Ralph wrote:
<snip>
>
> Do we understand the difference between word and string indexes now?
>
I agree completely with Ralph's summary.
The bottom line is that string indexes are "card catalog" style indexes.
They work very well for alphabetical browse lists but are mostly not well
suited to other uses.
Another point is that "string" indexes sometimes happen to work well, so it
would be unfriendly to throw them away altogether.
Mike wrote:
<snip>
>Indeed -- people who don't already know what "titleWord" means are not
>going to be able to find out by reading this document.
I agree, but at the same time the document is probably a great help when
SRU is mapped to Z39.50, though not otherwise.
I think I know what "word" and "string" mean. As I have no Z39.50 layer to
deal with, they are trivial to map to normal current indexes.
title string:
step 1: normalize query to index format
step 2: look for a perfect match (respecting wild cards)
titleWord string:
step 1: normalize the query
step 2: identify the words, deal with special cases, ignore words not
indexed.
step 3: look up the words in the index (respecting wild cards)
step 4: return the records containing the words in the right order.
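
The two procedures above can be sketched roughly as follows. This is only an
illustration under my own assumptions, not a definition: the names
(normalize, title_string_search, title_word_search), the stopword list and
the normalization rules are all hypothetical, "*" and "?" are assumed to be
the wild cards, "in the right order" is read as "in the same order, not
necessarily adjacent", and a plain scan over the records stands in for a real
index.

```python
import fnmatch
import re

# Hypothetical list of words that are not indexed (an assumption).
STOPWORDS = {"the", "a", "an", "of"}


def normalize(text, keep_wildcards=False):
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Wild cards (* and ?) survive only when keep_wildcards is true,
    i.e. when normalizing a query rather than an indexed title.
    """
    unwanted = r"[^\w\s*?]" if keep_wildcards else r"[^\w\s]"
    return " ".join(re.sub(unwanted, " ", text.lower()).split())


def title_string_search(query, records):
    """String index: the whole normalized title must match the query."""
    pattern = normalize(query, keep_wildcards=True)
    return [rid for rid, title in records.items()
            if fnmatch.fnmatchcase(normalize(title), pattern)]


def title_word_search(query, records):
    """Word index: every query word must occur in the title, in order."""
    wanted = [w for w in normalize(query, keep_wildcards=True).split()
              if w not in STOPWORDS]
    if not wanted:
        return []  # nothing left to look up once unindexed words are dropped
    hits = []
    for rid, title in records.items():
        words = [w for w in normalize(title).split() if w not in STOPWORDS]
        pos = 0
        for pattern in wanted:
            # advance to the next title word that matches this query word
            while pos < len(words) and not fnmatch.fnmatchcase(words[pos], pattern):
                pos += 1
            if pos == len(words):
                break  # this query word was not found after the previous one
            pos += 1
        else:
            hits.append(rid)
    return hits
```

With a title such as "The Name of the Rose", the string search for "rose"
does not find it (the whole title must match), while the word search for
"name rose" does.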
As soon as we overspecify things it gets very complicated. I would simply
state that the query "a b c" returns all records containing the string
"a b c" and as few other records as possible. If need be, the user can make
the query more specific.
Rob Koopman