On Wed, Sep 25, 2002 at 10:54:41AM -0400, Ray Denenberg wrote:
> =* all of these words
> =? any of these words
> =. adjacent words
> =~ relevant words
> == exact string match
> gives us a much cleaner mapping to AA, and means we don't need separate
> word and string indexes.
> Anyone else support this?
I can understand the niceness in mapping to AA, but I am wondering how
easy it will be for users to remember what all the symbols mean.
I think making the least typing do the most commonly expected thing is
good. I don't think the goal should be Google-like, if only because it's
not going to be completely Google-like anyway with AND, OR, NOT, fields,
etc. So let applications turn Google-like queries into CQL if they want to.
So based on my personal subjective opinion of what most people would
do most of the time (making it a little harder to do less common things),
my favorite mix this morning is:
- if you want all words in field, use AND-ed conditions
- if you want any words in field, use OR-ed conditions
- several words in a row means adjacency, unanchored
- use a symbol such as '|' to anchor at left or right (not suggesting
what symbol to use, just that a symbol is reasonable)
Note: for AND, I am not sure of operator precedence. In CCL
title=john and smith
needs to be entered as
title=(john and smith)
as by default it means
(title=john) and (smith)
We could maybe change this, as I think the title=(john and smith) reading
is more likely what people want. It does make parsing a little harder, though.
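
As a rough sketch of the two readings, in Python. This is a toy that only
splits on a lowercase " and "; the function name and the field_carries_over
flag are made up for illustration and are not part of CCL or CQL.

```python
def parse_and_query(query, field_carries_over):
    """Split a query on ' and ' and return a list of (field, term) clauses.

    field_carries_over=False mimics the CCL default described above:
    a bare term after 'and' has no field of its own.
    field_carries_over=True is the alternative reading, where the most
    recent field name applies to the bare terms that follow it.
    """
    clauses = []
    current_field = None
    for part in query.split(" and "):
        if "=" in part:
            # An explicit field=term clause always sets the current field.
            field, term = part.split("=", 1)
            current_field = field.strip()
            clauses.append((current_field, term.strip()))
        else:
            # A bare term: inherit the field only in the alternative reading.
            field = current_field if field_carries_over else None
            clauses.append((field, part.strip()))
    return clauses


ccl_style = parse_and_query("title=john and smith", field_carries_over=False)
proposed = parse_and_query("title=john and smith", field_carries_over=True)
```

Here ccl_style leaves "smith" without a field, while proposed attaches it to
title. The extra parsing work is the bookkeeping of which field is current.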
For string search, I think a different field name can be used. I don't
think it would be common that a field would be both word indexed and
string indexed. So if the field is string, use that attribute by default.
Also note, for '|' (or whatever character) being an anchor, I don't mean
this should be in the Z39.50 pattern operator. I think the CQL parser
should treat these as modifications to the attribute list:
title="dog" means anywhere in field
title="|dog" means anchored at beginning - put the firstWordsInField
attribute on, don't send the '|' through to the server,
title="|dog*" means firstWordsInField and send 'dog*' through with
the new masking attribute on too.
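
A minimal sketch of that translation step, in Python. Only firstWordsInField
comes from the text above; "lastWordsInField" for a right anchor and
"masking" for the new masking attribute are placeholder labels I am assuming
for illustration, as is the function name.

```python
def translate_term(field, term, anchor="|", mask="*"):
    """Turn a CQL term into (field, attributes, term_sent_to_server).

    The anchor character is consumed by the parser and never reaches the
    server; the masking character stays in the term.
    """
    attributes = []
    if term.startswith(anchor):
        # Left anchor: set the attribute and strip the symbol.
        attributes.append("firstWordsInField")
        term = term[len(anchor):]
    if term.endswith(anchor):
        # Right anchor: the attribute name is a hypothetical placeholder.
        attributes.append("lastWordsInField")
        term = term[:-len(anchor)]
    if mask in term:
        # Placeholder label for "the new masking attribute".
        attributes.append("masking")
    return field, attributes, term


plain = translate_term("title", "dog")
anchored = translate_term("title", "|dog")
anchored_masked = translate_term("title", "|dog*")
```

The point is just that '|' changes the attribute list and is dropped, while
'dog*' goes through to the server unchanged with the extra attribute set.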
Relevance? Hmmm. Not sure what syntax to suggest.
For proximity, if we wanted to avoid attributes, "john smith" could always
be sent through using the PROX operator (not an attribute list). Or it
could be up to a CQL implementation to choose whether to go for an
attribute list (eg: it could try to optimize 'title=john and smith' into
'allWordsInField: john smith' - the CQL syntax does not have to be
exactly the same as the attribute lists etc).
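
The kind of optimization meant here could look roughly like this in Python.
It assumes the query has already been parsed into AND-ed (field, word)
pairs; the function name and the tuple layout are made up for illustration,
and a real implementation would be free to do this differently or not at all.

```python
def merge_and_clauses(clauses):
    """Collapse AND-ed (field, word) pairs on the same field.

    Returns a list of (field, attribute, words) tuples, where attribute is
    "allWordsInField" when several words were merged and None otherwise.
    """
    words_by_field = {}
    for field, word in clauses:
        # dicts keep insertion order, so fields come out in query order.
        words_by_field.setdefault(field, []).append(word)
    merged = []
    for field, words in words_by_field.items():
        attribute = "allWordsInField" if len(words) > 1 else None
        merged.append((field, attribute, " ".join(words)))
    return merged


same_field = merge_and_clauses([("title", "john"), ("title", "smith")])
mixed = merge_and_clauses([("title", "john"), ("author", "smith")])
```

So 'title=john and smith' (with both words bound to title) would go out as a
single allWordsInField condition, while conditions on different fields are
left as separate operands.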
I guess it's the good old problem: do you try to do something fully
functional with consistent syntax for all the different things possible,
or do you make common things easy and less common things harder?