Okay, let's start all over.
When I extract keywords from records to build indexes, I do it two ways. I
either take the entire contents of a field and use that as an index term, or
I take the individual words from the field, remember their relative
positions, and use the words as index terms. I have always called the first
type of indexing "phrase" indexes and the second kind "word" indexes.
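To make the two extraction styles concrete, here is a minimal in-memory sketch. It is not how any particular server does it: the names `tokenize` and `build_indexes`, the dict-of-fields record layout, and the normalization rules (lowercase, split on non-alphanumerics) are all hypothetical choices for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical word normalization: lowercase, split on anything
    # that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(records, field):
    """Build a phrase index and a word index for one field.

    records: dict mapping record id -> dict of field name -> string
    Returns (phrase_index, word_index):
      phrase_index: whole normalized field value -> set of record ids
      word_index:   word -> list of (record id, position) postings
    """
    phrase_index = defaultdict(set)
    word_index = defaultdict(list)
    for rec_id, rec in records.items():
        value = rec.get(field)
        if not value:
            continue
        # "Phrase" indexing: the entire field (lowercased, whitespace
        # collapsed) is a single index term
        phrase_index[" ".join(value.lower().split())].add(rec_id)
        # "Word" indexing: each word is a term, kept with its position
        for pos, word in enumerate(tokenize(value)):
            word_index[word].append((rec_id, pos))
    return phrase_index, word_index


records = {
    1: {"title": "Gone with the Wind"},
    2: {"title": "Gone with the Wind: the dustbowl of the 30's"},
}
phrase, words = build_indexes(records, "title")
print(sorted(phrase))  # two phrase terms, one per title
print(words["wind"])   # [(1, 3), (2, 3)]
```

The phrase index ends up with one term per distinct field value, while the word index has one term per distinct word, each carrying the positions needed later for proximity checks.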
"Phrase" (or "String" now) indexing is extremely useful for browsing. It
allows me to show users entire titles or names. It also has some value for
exact phrase matches. (I want books titled "gone with the wind", not books
with "gone with the wind" anywhere in the title. TitlePhrase (or
TitleString) = "gone with the wind" will not retrieve books titled "gone
with the wind: the dustbowl of the 30's".) Phrase indexes are also slightly
useful, combined with right-hand truncation, for finding things that must
begin with a particular string.
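The three uses of a phrase index described above (browsing, exact match, and right-hand truncation) can all be served from one sorted list of whole-field terms. The sketch below assumes a simple in-memory structure; `PhraseIndex`, `exact`, `right_truncated`, and `browse` are hypothetical names, and the lowercase/whitespace normalization is an assumption.

```python
import bisect


def normalize(value):
    # Hypothetical normalization: lowercase and collapse whitespace
    return " ".join(value.lower().split())


class PhraseIndex:
    def __init__(self, values):
        # values: dict mapping record id -> full field value
        self.postings = {}
        for rec_id, value in values.items():
            self.postings.setdefault(normalize(value), set()).add(rec_id)
        # Sorted term list: supports browsing and prefix scans
        self.terms = sorted(self.postings)

    def exact(self, phrase):
        # Exact match: the whole field must equal the phrase
        return set(self.postings.get(normalize(phrase), set()))

    def right_truncated(self, prefix):
        # Right-hand truncation: terms sharing a prefix are contiguous in
        # sorted order, so start at bisect_left and scan until they stop matching
        prefix = normalize(prefix)
        hits = set()
        i = bisect.bisect_left(self.terms, prefix)
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            hits |= self.postings[self.terms[i]]
            i += 1
        return hits

    def browse(self, start, count=10):
        # Browsing: show whole terms starting from a position in the list
        i = bisect.bisect_left(self.terms, normalize(start))
        return self.terms[i:i + count]


idx = PhraseIndex({
    1: "Gone with the Wind",
    2: "Gone with the Wind: the dustbowl of the 30's",
    3: "The Wind in the Willows",
})
print(idx.exact("gone with the wind"))            # {1}
print(idx.right_truncated("gone with the wind"))  # {1, 2}
```

Note that the exact lookup does not retrieve record 2, matching the TitlePhrase example, while the truncated lookup does.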
"Word" indexes are where the searching really happens. When I search for
"with the" as a word search, I am doing a proximity search. This is not an
implicitly left- and right-truncated search that scans sequentially through
all index terms looking for matches. Instead, the terms "with" and "the" are
looked up individually and a Boolean "and" is applied to them. For each
record in which both occur, their positions are compared, and if they are
adjacent to each other, the record is added to the result set.
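The word-search steps above (look up each term, AND the record sets, then compare positions) can be sketched as follows. This is an illustration only: the in-memory index layout and the names `build_word_index` and `adjacent_search` are hypothetical, and the sketch assumes "adjacent" means the words appear in the order given, each one position after the last.

```python
import re
from collections import defaultdict


def build_word_index(records):
    # word -> {record id -> list of positions}; tokenization is an assumption
    index = defaultdict(lambda: defaultdict(list))
    for rec_id, text in records.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][rec_id].append(pos)
    return index


def adjacent_search(index, words):
    """Return ids of records where `words` occur in order at consecutive positions."""
    words = [w.lower() for w in words]
    if not words:
        return set()
    # 1. Look up each term individually (no scan of the whole term list)
    postings = [index.get(w, {}) for w in words]
    # 2. Boolean AND: keep only records that contain every term
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    # 3. Compare positions: term k must sit at start + k
    result = set()
    for rec_id in candidates:
        later = [set(p[rec_id]) for p in postings[1:]]
        for start in postings[0][rec_id]:
            if all(start + k + 1 in s for k, s in enumerate(later)):
                result.add(rec_id)
                break
    return result


index = build_word_index({
    1: "Gone with the Wind",
    2: "the wind went with them",
    3: "Gone with the Wind: the dustbowl of the 30's",
})
print(adjacent_search(index, ["with", "the"]))  # {1, 3}
```

Record 2 contains both "with" and "the", so it survives the Boolean "and", but the position check rejects it because the two words are not adjacent.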
Do we understand the difference between word and string indexes now?
> -----Original Message-----
> From: Robert Sanderson [mailto:[log in to unmask]]
> Sent: Thursday, September 19, 2002 7:47 PM
> To: [log in to unmask]
> Subject: Re: Need Feedback on Re: DC Index definitions
> > All, please comment; I don't think we can move
> > > Word indexes are indexes that support implicit
> > > proximity between a provided list of words.
> > Is this what we mean by "word index"??????????
> No this is an unanchored string search or a multiple term proximity
> search, not a word index.
> > Or do we mean an index that supports single-word
> > search, like Bath?
> This is what we agreed in July to my understanding for a word index --
> that you would only send one word per term.