So, how am I supposed to know if a string index is to be searched or a word
index? You seem to be inferring it from the truncation characters. Is
Title="word" a word search or a complete title search?

Just as importantly, when we move to support Scan, we aren't going to have
any truncation characters to help. So when I scan on Title="word", do I
expect to see single words from titles come back or entire titles?

I don't think we can just lump everything into a single title index.
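
To make the worry concrete, here is a rough Python sketch of how a server might have to guess the search type from the masking characters alone, following the four cases in the quoted message below. The function name and the returned labels are hypothetical, and the reading of "|" and "*" is only my understanding of the proposal, not an agreed definition.

```python
def infer_search_type(term: str) -> str:
    """Guess which of the four former Bath indexes a masked term implies.

    Hypothetical helper: the rules below are one reading of the masking
    proposal, not a settled specification.
    """
    stripped = term.strip()
    # "|word|": masked on both sides, so a word anywhere in the field.
    if len(stripped) >= 2 and stripped.startswith("|") and stripped.endswith("|"):
        return "word"
    # "word1 word2 ... wordN |": anchored left, words masked on the right.
    if stripped.endswith("|"):
        return "firstWords"
    # "string*": anchored left, characters masked on the right.
    if stripped.endswith("*"):
        return "firstCharacters"
    # No masking characters at all: treated as an exact match.
    return "exact"


for example in ['|word|', 'How to make money |', 'How to*', 'word']:
    print(example, '->', infer_search_type(example))
```

Under this reading the bare term "word" can only ever come out as "exact", and a Scan request carries no masking characters, so nothing in the term says whether words or whole titles should come back.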
Ralph
> -----Original Message-----
> From: Ray Denenberg [mailto:[log in to unmask]]
> Sent: Friday, September 06, 2002 11:26 AM
> To: [log in to unmask]
> Subject: indexes and masking
>
>
> "LeVan,Ralph" wrote:
>
> > We have a structure attribute in Z39.50. I thought that the definition of
> > the index would specify its structure attribute. So, if I've declared the
> > structure of Title to be "String" (avoiding the ambiguous "Phrase"), then
> > the search Title="How to make money|" is illegal. If I declared the
> > structure to be "adjacency word list", then that search makes sense. But,
> > if the structure is "adjacency word list", then TitleWord="How to make
> > money" is words-anywhere-in-field, not exact-title. The normal way to get
> > exact title is TitleString (usually just Title) = "How to make money".
> >
> > If you want to support first-words-in-field in a word-list, then you
> > definitely need a left-anchor character. So, TitleWords="$How to make
> > money" says that the word "How" should be at the beginning of the field.
> > With a right-anchor character, you can do exact-match with word-lists:
> > TitleWords="$How to make money^". Of course, those anchor characters would
> > be illegal (or at the very least, redundant) in a String search.
> >
> > So, could someone please explain to me the current state of the structure
> > attribute in our indexes and how that relates to the truncation attribute?
>
> I'll try. I think we've decoupled the structure attribute from srw indexes.
>
> The confusion may be due in part to the fact that the index definitions haven't
> been updated corresponding to the new proposal, and this is because it's not
> entirely clear yet that the proposal will be adopted. And I think it would be a
> good idea for me to update the index definitions as soon as possible to bring
> them in synch with the current thinking, so I propose we have some focused
> discussion to ascertain what the current thinking is. Following is my
> impression, based on discussion over the last week or so.
>
> We don't need four Author (Title, etc) indexes for Bath, one is sufficient,
> Bath.author. Similarly we don't need two for DC, dc.author is sufficient. (And
> if we have bath.author and dc.author, then leave aside for now the difference
> between the two.)
>
> So instead of bath.authorWord, search on bath.author where term is "|word|"
> Instead of bath.authorFirstWords, search on bath.author where term is
> "word1 word2 ... wordN |"
> Instead of bath.authorFirstCharacters, search on bath.author where term is
> "string*"
> Instead of bath.authorExact, search on bath.author where term is "string"
>
> Please let's clear this up. If anyone doesn't think that we've moved towards
> this, speak up fast.
>
>
> Aside from this, Ralph appears to be throwing in an additional issue,
> suggesting that we need anchor characters, rather than masking at the beginning
> or end of a term. (Thus the default would be unanchored as opposed to the
> masking proposal where the default is anchored.) I hope we don't need to go
> down this path too.
>
> --Ray
>