Posting for Janifer Gatenby -- LED
-----Original Message-----
>>> The problem I'm having is that I no longer remember how we thought we
>>> could distinguish between "first-characters-in-field" and
>>> "first-words-in-field" searches in Bath. In other words, what would
>>>
>>> Were we specifying the "firstWords" search with proximity?
>>
>> My meeting notes say:
>> Left anchored is assumed.
>> Exact match is assumed.
>> Unanchored begins with a question mark.
>>
>> My notes do say that our prose would relate different SRW/U queries to
>> Bath searches. We did not want specific Bath elements in the query
>> syntax. I agree with Mike that we should not load the index names.
>
>I agree with a somewhat rambling caveat:
>
> Index Names are supposed to represent the attribute combinations in
> Z39.50. So we can say 'titleWord' not (1=4, 3=3, 4=6)
> So we're already loading the difference between structure 1, completeness
> 1 and structure 3 completeness 6 into 'title' vs 'titleWord'
>
> I think this is an acceptable level of semantics in index names, so long
> as there is a recommendation to use one particular naming scheme (foo,
> fooWord) Obviously this can be ignored, but so can attribute
> combinations, it just means that the searches will produce possibly
> unexpected results due to lousy configuration.
>
> titleFirstWordsIncludingLeadingArticles is a search, not an index.
> 'First' or any other description of the location of the term should be
> part of the query language, not the name of the index. And as we can
> express it with either proximity or truncation, there's no need for it
> to be in the index.
JG: I agree. I don't think that this changes anything though.
> This brings me to my next questions:
>
> 1.
> Why even fooWord? We can express it with an unanchored search with
> spaces.
> eg: foo="? term *"
>
> I don't think that this is sensible, but it's the logical conclusion and
> there needs to be an answer to give to it.
JG: We don't do it this way because it is harder to scan with the eye. I
think that the above would be legal, and therefore equivalent to
fooword="term".
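
To make the claimed equivalence concrete, here is a minimal Python sketch of how a server might read the anchoring markers in a term. It assumes the convention from the meeting notes quoted above (a bare term is left-anchored and exact, a leading "?" means unanchored) and treats a trailing "*" as "anything may follow", as in the foo="? term *" example. The names TermShape, classify_term and is_single_word_anywhere are hypothetical, and none of this is taken from a published specification.

```python
from typing import NamedTuple, Tuple


class TermShape(NamedTuple):
    left_anchored: bool      # match must start at the beginning of the field
    right_truncated: bool    # anything may follow the given words
    words: Tuple[str, ...]   # the literal words of the term


def classify_term(term: str) -> TermShape:
    """Split a term into anchoring flags and literal words.

    Assumed convention (from the meeting notes, not a spec): a leading
    '?' token means unanchored, a trailing '*' token means open-ended,
    and a bare term is left-anchored and exact.
    """
    tokens = term.split()
    left_anchored = True
    right_truncated = False
    if tokens and tokens[0] == "?":
        left_anchored = False
        tokens = tokens[1:]
    if tokens and tokens[-1] == "*":
        right_truncated = True
        tokens = tokens[:-1]
    return TermShape(left_anchored, right_truncated, tuple(tokens))


def is_single_word_anywhere(term: str) -> bool:
    """True when the term amounts to the 'fooWord' case: one word, any position."""
    shape = classify_term(term)
    return (not shape.left_anchored
            and shape.right_truncated
            and len(shape.words) == 1)
```

Under these assumptions, is_single_word_anywhere("? term *") is true while is_single_word_anywhere("term") is false, which is the sense in which foo="? term *" and fooword="term" would select the same records.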
> The index name describes what is in the index. fooWord contains the
> individual words from 'foo', which implies that the server best knows how
> to extract a 'word' from its data with respect to which normalisation and
> extraction routines to use. The search shouldn't have to know which
> normalisation/extraction routines are used, so there needs to be a 'word'
> index. For first words, it does know the extraction routine -- take the
> first words in the field. Right?
>
> 2.
> If someone searches with:
> titleWord="search term"
>
> The agreement was to fail it, if my memory serves? As there's no single
> word which matches 'search term'?
> The search should have been titleWord="search" and titleWord="term" ?
JG: Servers should fail it. I guess some kind servers will correct it, and
will also correct titleword="search and term". The latter is trickier
because a phrase may have been meant.
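
The two server behaviours discussed here (fail the query, or kindly correct it) might look something like the following Python sketch. The function name resolve_word_query, the lenient flag and the MultiWordTermError class are hypothetical, and splitting on whitespace stands in for whatever word extraction a real server uses.

```python
class MultiWordTermError(ValueError):
    """Raised when a word index is searched with more than one word."""


def resolve_word_query(index: str, term: str, lenient: bool = False) -> str:
    """Return the query a server would actually run against a word index.

    Strict mode (the default) fails a multi-word term, as suggested in
    the thread. Lenient mode rewrites it into one clause per word,
    joined with 'and'. Whitespace splitting is an assumption; a real
    server would apply its own normalisation and extraction rules.
    """
    words = term.split()
    if len(words) <= 1:
        return f'{index}="{term}"'
    if not lenient:
        raise MultiWordTermError(
            f'no single word in {index} can match "{term}"'
        )
    return " and ".join(f'{index}="{word}"' for word in words)
```

In lenient mode, resolve_word_query("titleWord", "search term", lenient=True) gives titleWord="search" and titleWord="term". The sketch deliberately does not special-case a literal "and" inside the term, since the sender may have meant a phrase; it would be treated as just another word.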
>Rob
--
,'/:. Rob Sanderson ([log in to unmask])
,'-/::::. http://www.o-r-g.org/~azaroth/
,'--/::(@)::. Special Collections and Archives, extension 3142
,'---/::::::::::. Twin Cathedrals: telnet: liverpool.o-r-g.org 7777
____/:::::::::::::. WWW: http://liverpool.o-r-g.org:8000/
I L L U M I N A T I