Okay, so I must have missed a decision while I was on vacation.
Are all searches now word proximity searches?
We have a structure attribute in Z39.50. I thought that the definition of
the index would specify its structure attribute. So, if I've declared the
structure of Title to be "String" (avoiding the ambiguous "Phrase"), then
the search Title="How to make money|" is illegal. If I declared the
structure to be "adjacency word list", then that search makes sense. But,
if the structure is "adjacency word list", then TitleWords="How to make
money" is words-anywhere-in-field, not exact-title. The normal way to get an
exact title is TitleString (usually just Title) = "How to make money".
If you want to support first-words-in-field in a word-list, then you
definitely need a left-anchor character. So, TitleWords="$How to make
money" says that the word "How" should be at the beginning of the field.
With a right-anchor character, you can do exact-match with word-lists:
TitleWords="$How to make money^". Of course, those anchor characters would
be illegal (or at the very least, redundant) in a String search.
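A minimal sketch of word-list matching with the two anchor characters proposed above ('$' pins the first word to the start of the field, '^' pins the last word to the end). The function name and normalization are hypothetical; this only illustrates the intended semantics.

```python
def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def match_anchored_words(field: str, term: str) -> bool:
    """Adjacent-word match; a leading '$' anchors to the start of the
    field and a trailing '^' anchors to the end of the field."""
    term = term.strip()
    left = term.startswith("$")
    right = term.endswith("^")
    start = 1 if left else 0
    end = len(term) - 1 if right else len(term)
    words = normalize(field).split()
    wanted = normalize(term[start:end]).split()
    n = len(wanted)
    for i in range(len(words) - n + 1):
        if words[i:i + n] != wanted:
            continue
        if left and i != 0:
            continue  # left anchor: must begin at the first word
        if right and i + n != len(words):
            continue  # right anchor: must end at the last word
        return True
    return False


print(match_anchored_words("How to Make Money", "$How to make money^"))  # True
print(match_anchored_words("How to Make Money Fast", "$How to make money^"))  # False
```

With both anchors the word-list search behaves like an exact-title match, which is also why the anchors add nothing to a String search.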
So, could someone please explain to me the current state of the structure
attribute in our indexes and how that relates to the truncation attribute?
> -----Original Message-----
> From: Robert Sanderson [mailto:[log in to unmask]]
> Sent: Thursday, September 05, 2002 8:09 AM
> To: [log in to unmask]
> Subject: Re: which masking character for words?
> On Thu, 5 Sep 2002, Alan Kent wrote:
> > On Wed, Sep 04, 2002 at 11:09:17AM -0400, Ray Denenberg wrote:
> > If I understand things, to find an exact title I would use:
> > Title="How to make money"
> > To do first-words-in-field I would say
> > Title="How to make money|"
> > To do words-anywhere-in-field I would say
> > Title="|How to make money|"
> Yes :)
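A minimal sketch of the '|' convention in the quoted exchange, assuming '|' stands for zero or more words and that matching is done on a lower-cased, whitespace-collapsed field. Each term becomes a regular expression that must match the whole field. The function names are illustrative only.

```python
import re


def mask_to_regex(term):
    """Translate a term using '|' as a zero-or-more-words mask into a
    regex that must match the whole (normalized) field."""
    term = term.strip()
    leading = term.startswith("|")
    trailing = term.endswith("|")
    # Escape each word so punctuation in titles is taken literally.
    body = " ".join(re.escape(w) for w in term.strip("|").lower().split())
    prefix = r"(?:\S+ )*" if leading else ""   # any words before
    suffix = r"(?: \S+)*" if trailing else ""  # any words after
    return re.compile(prefix + body + suffix)


def match_masked(field, term):
    normalized = " ".join(field.lower().split())
    return mask_to_regex(term).fullmatch(normalized) is not None


print(match_masked("How to Make Money Fast", "How to make money"))   # False: exact title
print(match_masked("How to Make Money Fast", "How to make money|"))  # True: first words
```

The leading-'|'-only form ("last words in field") is trivial to match in a toy like this; the difficulty raised next is expressing it as a Z39.50 attribute list.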
> > But I would assume the following probably would not be supported by
> > most people (I cannot turn it into an attribute list).
> > Title="|How to make money"
> > That is 'last words in field'. So it may be legal in CQL, but an
> > implementation may choose to say 'sorry, I cannot do that!'.
> Yes. Perhaps some implementations will, and perhaps they won't. It's
> just an unsupported search (though hopefully there's a diagnostic to say
> why in particular it's not working).
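The "cannot turn it into an attribute list" point can be sketched as a mapping from each '|' pattern onto Bib-1 position (type 3) and completeness (type 6) attributes. The values used (position 1 = first in field, 3 = any position in field; completeness 1 = incomplete subfield, 3 = complete field) come from the Bib-1 attribute set and are worth double-checking against it; the pairing of patterns to values, the function name, and the UnsupportedSearch exception (a stand-in for a real diagnostic) are assumptions for illustration.

```python
# Bib-1 attribute types used below (values should be checked against Bib-1).
POSITION = 3      # 1 = first in field, 3 = any position in field
COMPLETENESS = 6  # 1 = incomplete subfield, 3 = complete field


class UnsupportedSearch(Exception):
    """Hypothetical stand-in for the diagnostic a server would return."""


def mask_to_bib1(term):
    """Return (words, [(type, value), ...]) for a '|'-masked term, or raise
    UnsupportedSearch when no attribute combination expresses it."""
    term = term.strip()
    leading = term.startswith("|")
    trailing = term.endswith("|")
    words = term.strip("|").strip()
    if not leading and not trailing:   # exact title
        attrs = [(POSITION, 1), (COMPLETENESS, 3)]
    elif trailing and not leading:     # first words in field
        attrs = [(POSITION, 1), (COMPLETENESS, 1)]
    elif leading and trailing:         # words anywhere in field
        attrs = [(POSITION, 3), (COMPLETENESS, 1)]
    else:                              # last words in field
        raise UnsupportedSearch(
            "Bib-1 position has no 'last in field' value: " + repr(term))
    return words, attrs
```

Raising an explicit error, rather than silently falling back to "any position", is what lets a server send back a diagnostic explaining why the search was refused.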
> > Also, in the spec you posted, it said '|' was one or more words. Should
> > this be zero or more words? Otherwise the following means words in title
> > If it's zero or more words, it makes more sense, doesn't it? Or is the goal
> Yes. It should be zero or more, I think.
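The zero-or-more versus one-or-more question shows up with a title that starts with the search words. The sketch below parameterises the '|' translation on that choice (function names and normalization are assumptions). Under one-or-more, '|How to make money|' needs at least one extra word on each side, so it no longer means "words anywhere in field".

```python
import re


def mask_to_regex(term, zero_or_more=True):
    """'|' as a word mask; zero_or_more selects '*' versus '+' repetition."""
    quant = "*" if zero_or_more else "+"
    term = term.strip()
    body = " ".join(re.escape(w) for w in term.strip("|").lower().split())
    prefix = r"(?:\S+ )" + quant if term.startswith("|") else ""
    suffix = r"(?: \S+)" + quant if term.endswith("|") else ""
    return re.compile(prefix + body + suffix)


def match_masked(field, term, zero_or_more=True):
    normalized = " ".join(field.lower().split())
    return mask_to_regex(term, zero_or_more).fullmatch(normalized) is not None


title = "How to Make Money Fast"
print(match_masked(title, "|How to make money|", zero_or_more=True))   # True
print(match_masked(title, "|How to make money|", zero_or_more=False))  # False
```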
> ,'/:. Rob Sanderson ([log in to unmask])
> ,'-/::::. http://www.o-r-g.org/~azaroth/
> ,'--/::(@)::. Special Collections and Archives, extension 3142
> ,'---/::::::::::. Twin Cathedrals: telnet: liverpool.o-r-g.org 7777
> ____/:::::::::::::. WWW:
> I L L U M I N A T I