"LeVan,Ralph" wrote:
> We have a structure attribute in Z39.50. I thought that the definition of
> the index would specify its structure attribute. So, if I've declared the
> structure of Title to be "String" (avoiding the ambiguous "Phrase"), then
> the search Title="How to make money|" is illegal. If I declared the
> structure to be "adjacency word list", then that search makes sense. But,
> if the structure is "adjacency word list", then TitleWord="How to make
> money" is words-anywhere-in-field, not exact-title. The normal way to get
> exact title is TitleString (usually just Title) = "How to make money".
>
> If you want to support first-words-in-field in a word-list, then you
> definitely need a left-anchor character. So, TitleWords="$How to make
> money" says that the word "How" should be at the beginning of the field.
> With a right-anchor character, you can do exact-match with word-lists:
> TitleWords="$How to make money^". Of course, those anchor characters would
> be illegal (or at the very least, redundant) in a String search.
>
> So, could someone please explain to me the current state of the structure
> attribute in our indexes and how that relates to the truncation attribute?

I'll try. I think we've decoupled the structure attribute from SRW indexes.
The confusion may be due in part to the fact that the index definitions haven't
been updated to reflect the new proposal, because it isn't yet clear that the
proposal will be adopted. I think it would be a good idea for me to update the
index definitions as soon as possible to bring them in sync with the current
thinking, so I propose we have some focused discussion to ascertain what the
current thinking is. Following is my impression, based on discussion over the
last week or so.

We don't need four Author (Title, etc.) indexes for Bath; one, bath.author, is
sufficient. Similarly, we don't need two for DC; dc.author is sufficient. (And
if we have bath.author and dc.author, then leave aside for now the difference
between the two.)

So instead of bath.authorWord, search on bath.author where term is "|word|"
Instead of bath.authorFirstWords, search on bath.author where term is "word1
word2 ... wordN | "
Instead of bath.authorFirstCharacters, search on bath.author where term is
"string*"
Instead of bath.authorExact, search on bath.author where term is "string"

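To make the mapping above concrete, here is a rough sketch in Python of how a search against one of the old index names might be rewritten as a search on the single index with a masked term. The function name, the suffix table, and the exact spacing around the "|" mask are my own assumptions for illustration; none of this comes from a spec.

```python
# Hypothetical illustration only: rewrite_legacy_search and SUFFIX_TO_PATTERN
# are invented names, not part of SRW, CQL, or the Bath profile.

# Old index-name suffix -> term pattern on the single remaining index.
# The space before "|" in the FirstWords pattern follows the wording of the
# message above; whether that space is significant is an assumption.
SUFFIX_TO_PATTERN = [
    ("FirstCharacters", "{term}*"),   # first characters in field
    ("FirstWords", "{term} |"),       # first words in field
    ("Word", "|{term}|"),             # word(s) anywhere in field
    ("Exact", "{term}"),              # exact match, no masking
]


def rewrite_legacy_search(index, term):
    """Return (index, term) with a legacy suffix folded into the term."""
    for suffix, pattern in SUFFIX_TO_PATTERN:
        if index.endswith(suffix):
            base_index = index[: -len(suffix)]
            return base_index, pattern.format(term=term)
    # Not a legacy index name: leave the search untouched.
    return index, term


if __name__ == "__main__":
    for legacy in ("bath.authorWord", "bath.authorFirstWords",
                   "bath.authorFirstCharacters", "bath.authorExact"):
        print(legacy, "->", rewrite_legacy_search(legacy, "smith"))
```

The point of the sketch is only that the four index names carry no information that the term itself cannot carry once masking is available.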
Please let's clear this up. If anyone doesn't think that we've moved towards
this, speak up fast.
Aside from this, Ralph appears to be throwing in an additional issue,
suggesting that we need anchor characters, rather than masking at the beginning
or end of a term. (Thus the default would be unanchored as opposed to the
masking proposal where the default is anchored.) I hope we don't need to go
down this path too.
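
For what it's worth, the two conventions look interchangeable at the level of whole terms. The sketch below (Python, purely illustrative) converts a term written with Ralph's anchor characters into the equivalent masked term. The function name is made up, "$" and "^" are taken from Ralph's examples, and I am assuming that a single "|" mask at either end is the right counterpart of a missing anchor; it ignores the word-versus-character distinction between "|" and "*".

```python
# Hypothetical illustration only: anchors_to_masks is an invented helper.
# Anchor convention: default is unanchored; "$" / "^" pin the term to the
# start / end of the field.
# Mask convention: default is anchored; "|" at either end opens that end.

def anchors_to_masks(term, left_anchor="$", right_anchor="^", mask="|"):
    """Translate an anchor-style term into a mask-style term."""
    anchored_left = term.startswith(left_anchor)
    anchored_right = term.endswith(right_anchor)

    # Strip whichever anchor characters are present.
    start = len(left_anchor) if anchored_left else 0
    end = len(term) - len(right_anchor) if anchored_right else len(term)
    core = term[start:end]

    # An end that was NOT anchored must be masked, and vice versa.
    prefix = "" if anchored_left else mask
    suffix = "" if anchored_right else mask
    return prefix + core + suffix


if __name__ == "__main__":
    for t in ("$How to make money^", "$How to make money", "How to make money"):
        print(repr(t), "->", repr(anchors_to_masks(t)))
```

If that equivalence holds, the choice between the two is only a choice of default, which is why I would rather not reopen it.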
--Ray