> >The problem I'm having is that I no longer remember how we thought we
> >could distinguish between "first-characters-in-field" and
> >"first-words-in-field" searches in Bath. In other words, what would
> >Were we specifying the "firstWords" search with proximity?
> My meeting notes say:
> Left anchored is assumed.
> Exact match is assumed.
> Unanchored begins with a question mark
> My notes do say that our prose would relate different SRW/U queries to Bath
> searches. We did not want specific Bath elements in the query syntax. I
> agree with Mike that we should not load the index names.
I agree with a somewhat rambling caveat:
Index names are supposed to represent the attribute combinations in
Z39.50, so we can say 'titleWord' rather than (1=4, 3=3, 4=6).
So we're already loading the difference between structure 1, completeness
1 and structure 3, completeness 6 into 'title' vs 'titleWord'.
I think this is an acceptable level of semantics in index names, so long
as there is a recommendation to use one particular naming scheme (foo,
fooWord). Obviously this can be ignored, but so can attribute
combinations; it just means that the searches will possibly produce
unexpected results due to lousy configuration.
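A minimal sketch of the idea that an index name stands in for an attribute combination, assuming a simple server-side lookup table. The table name, the function name, and the 'title' row are hypothetical and only for illustration; only the 'titleWord' combination is taken from the text above.

```python
# Hypothetical lookup table: index name -> Z39.50 attribute combination,
# keyed by attribute type. Only the titleWord row comes from the message;
# the title row is an assumed value used for illustration.
INDEX_ATTRIBUTES = {
    "titleWord": {1: 4, 3: 3, 4: 6},
    "title": {1: 4, 3: 1, 4: 1},  # assumption, not from the thread
}


def attributes_for(index_name):
    """Return the attribute combination as a string such as '1=4, 3=3, 4=6'."""
    try:
        combo = INDEX_ATTRIBUTES[index_name]
    except KeyError:
        # An unconfigured index is reported rather than silently guessed at.
        raise ValueError(f"unconfigured index: {index_name}")
    return ", ".join(f"{attr_type}={value}" for attr_type, value in sorted(combo.items()))


print(attributes_for("titleWord"))
```

A server that ignores the naming recommendation would simply have different rows in such a table, which is where the unexpected results would come from.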
titleFirstWordsIncludingLeadingArticles is a search, not an index.
'First' or any other description of the location of the term should be
part of the query language, not the name of the index. And as we can
express it with either proximity or truncation, there's no need for it to
be in the index.
This brings me to my next questions:
1.
Why even fooWord? We can express it with an unanchored search with
spaces.
eg: foo="? term *"
I don't think that this is sensible, but it's the logical conclusion, and
we need to have an answer for it.
The index name describes what is in the index. fooWord contains the
individual words from 'foo', which implies that the server best knows how
to extract a 'word' from its data with respect to which normalisation and
extraction routines to use. The search shouldn't have to know which
normalisation/extraction routines are used, so there needs to be a 'word'
index. For first words, it does know the extraction routine -- take the
first words in the field. Right?
2.
If someone searches with:
titleWord="search term"
The agreement was to fail it, if my memory serves, as there's no single
word which matches 'search term'?
The search should have been titleWord="search" and titleWord="term"?
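As a rough sketch of that rewrite, assuming words are separated by plain whitespace (a real server's word extraction and normalisation may well differ, which is the point made under question 1), a client could build the per-word query like this. The function name is hypothetical.

```python
def split_word_query(index, term):
    """Turn a multi-word term into one clause per word, joined with 'and'.

    Assumption: words are whatever str.split() yields on whitespace.
    """
    words = term.split()
    if not words:
        raise ValueError("empty search term")
    return " and ".join(f'{index}="{word}"' for word in words)


print(split_word_query("titleWord", "search term"))
```

This only shows the client-side reformulation; whether the server should fail the original single-clause query is the open question above.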
Rob
--
,'/:. Rob Sanderson
,'-/::::. http://www.o-r-g.org/~azaroth/
,'--/::(@)::. Special Collections and Archives, extension 3142
,'---/::::::::::. Twin Cathedrals: telnet: liverpool.o-r-g.org 7777
____/:::::::::::::. WWW: http://liverpool.o-r-g.org:8000/
I L L U M I N A T I