(finally catching up on this...)
A while back, Rob Sanderson and Ray Denenberg had a conversation (see
below) about keyword searching and the usefulness of the proposed
excludeOriginInfo modifier on the cql.anywhere index.
In my opinion, the server should be the one making the decision on which
fields/indexes are included in a keyword search. The developer creating
the server has knowledge of the data being searched, and which portions
of the data are useful for keyword searching. In every collection I've
dealt with, we have held a meeting to go through the available fields and
determine which were useful as keywords. Sometimes, this includes
originInfo, but often it does not. I wouldn't want the people posing
queries to make this decision, because they do not have enough knowledge
of the structure of the data.
If cql.anywhere truly means "any field in the data", including fields
that wouldn't normally be searchable (like originInfo and notes), it is
guaranteed to produce a lot of noise. However, the current CQL context
set says it means "search all indexes", which I take to mean "search all
information that would normally be searchable", roughly equivalent to
a keyword search where keywords are defined by the server.
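To make that reading concrete, here is a rough sketch (in Python) of a server treating cql.anywhere as "the fields we decided are useful keywords". Everything in it is an assumption for illustration: the field names, the KEYWORD_FIELDS list, and both helper functions are made up, not taken from any real server or from the context set.

```python
# Hypothetical list of fields the server's developers agreed are useful as
# keywords. originInfo and notes are deliberately left out.
KEYWORD_FIELDS = ("title", "name", "subject", "abstract")


def expand_anywhere(term, keyword_fields=KEYWORD_FIELDS):
    """Rewrite a cql.anywhere search as an OR over the server-chosen indexes."""
    return " or ".join(f'{field} = "{term}"' for field in keyword_fields)


def matches_anywhere(record, term, keyword_fields=KEYWORD_FIELDS):
    """Return True if the term occurs in any server-chosen keyword field."""
    needle = term.lower()
    return any(needle in record.get(field, "").lower() for field in keyword_fields)


# A made-up record: the place of publication appears only in originInfo.
record = {
    "title": "Fishing on Lake Monroe",
    "originInfo": "Bloomington : Indiana University Press, 1998",
}

print(matches_anywhere(record, "fishing"))      # found in the title
print(matches_anywhere(record, "Bloomington"))  # only in originInfo, so no match
```

With this arrangement the person posing the query just asks for cql.anywhere, and whether originInfo is searched is settled once, on the server, by whoever knows the data.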
The bottom line, then, is that if we define cql.anywhere properly, we
don't need the excludeOriginInfo modifier.
----- Ryan Scherle
----- Digital Library Program
----- Indiana University
----------- Quoted conversation -------------
Rob: I don't see the need for excludeOriginInfo, and would suggest
removing it unless someone can speak as to its utility.
Ray: The idea is that there may be several different types of keywords
defined. One such keyword would be defined as "anywhere in the
record with the exception of originInfo", and someone else could define
one that excludes some other area of the record. originInfo is the only one
so far. The suggester posed this requirement, that excluding originInfo
dramatically improves precision of the search because so much
irrelevant junk is contained there.