I propose the following for the cql relation.
Having read everyone's position, I know this won't
completely satisfy anyone, but it seems a
reasonable compromise to me.
dc.title matches (string)          means exact match
dc.title = (word1 word2 word3)     means adjacent words
dc.title ~ (word1 word2 word3)     means similar words
dc.title * (word1 word2 word3)     means all words
dc.title + (word1 word2 word3)     means any words
dc.title stem (word1 word2 word3)  means stem/any words
dc.title fuzzy (word1 word2 word3) means fuzzy/any words
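
To make the intended semantics concrete, here is a minimal
Python sketch of how a server might evaluate four of the
relations above against a single field value. Everything in it
is an assumption made for illustration: the function names
evaluate and tokenize, the whitespace/lowercase tokenizer, and
the case-sensitive reading of "matches" are not part of any cql
definition. The "~", "stem" and "fuzzy" relations are left out
because the proposal does not say which similarity or stemming
algorithm they would use.

```python
def tokenize(text):
    """Naive tokenizer (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def evaluate(relation, field_value, term):
    """Return True if field_value satisfies `relation` for `term`.

    Hypothetical helper covering only "matches", "=", "*" and "+".
    """
    field_words = tokenize(field_value)
    words = tokenize(term)

    if relation == "matches":
        # Exact match on the whole string (case-sensitive here, by assumption).
        return field_value == term
    if relation == "=":
        # Adjacent words: the term's words occur as one contiguous run.
        n = len(words)
        return any(
            field_words[i:i + n] == words
            for i in range(len(field_words) - n + 1)
        )
    if relation == "*":
        # All words present, in any order.
        return all(w in field_words for w in words)
    if relation == "+":
        # Any one of the words present.
        return any(w in field_words for w in words)

    # "~", "stem" and "fuzzy" need an algorithm the proposal does not specify.
    raise NotImplementedError("relation not covered by this sketch: " + relation)
```

For example, under these assumptions
evaluate("=", "War and Peace", "and peace") is true, while
evaluate("=", "War and Peace", "war peace") is false because
the two words are not adjacent in the field.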
Points
1. There's no reason to say, e.g.:
"dc.title =~ (word1 word2)" when
"dc.title ~ (word1 word2)" will do just fine.
That should solve the problem of what ">~"
means.
2. Further on the equal sign: the current cql
definition says:
relation ::= numeric-relation | "fuzzy" | "stem" | "relevance"
numeric-relation ::= "<" | ">" | "<=" | ">=" | "=" | "<>"
By this definition "=" is a numeric relation. I
strongly feel that it should either be a numeric
relation or used for strings/words, but not both.
In the proposal above it's the latter, which would
mean taking "=" out of the numeric-relation list.
Or can someone give an example where we need
mathematical equality? If so, then we should come
up with an alternative symbol to "=" for word
adjacency.
3. Implicit in this proposal is that we don't have
separate (abstract) word and string index names,
i.e. just dc.title, not dc.titleWord and
dc.titleString. This would apply to all dc
elements. Bath would remain as defined.
4. For stem and fuzzy, I don't know if it should
be any or all (we need to pick one). If it's any,
then to get all you have to construct booleans.
5.
Robert Sanderson wrote:
> Designing CQL such that it maps onto attributes
> rather than being easy to
> understand and construct is counter productive
> IMO.
I see no reason why we can't do both, giving
priority to the latter, and modifying AA if we
find it necessary. For example I don't believe
we need to tokenize an anchor (i.e. take it out of
the term) to be able to cite how it maps to
Z39.50. We simply need to have rules for turning
a cql query into a Z39.50 query. One of those
rules could be "if there is a left-anchor
character at the beginning of the field, turn that
into first-in-field".
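
Such a rule can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the anchor
character "^" and the function name position_attribute are
hypothetical, since no anchor character has been agreed on
here. The only Z39.50 detail relied on is that Bib-1 attribute
type 3 (Position) with value 1 means first-in-field.

```python
LEFT_ANCHOR = "^"        # assumption: the anchor character is not settled
FIRST_IN_FIELD = (3, 1)  # Bib-1 Position attribute (type 3), value 1


def position_attribute(term):
    """Split a term into (bare term, Bib-1 position attribute or None).

    The anchor stays part of the term in the query itself; it is
    only stripped when the Z39.50 query is built.
    """
    if term.startswith(LEFT_ANCHOR):
        return term[len(LEFT_ANCHOR):], FIRST_IN_FIELD
    # No anchor: this sketch emits no position attribute at all.
    return term, None
```

With these assumptions, position_attribute("^war and peace")
yields the bare term "war and peace" together with the
attribute pair (3, 1), and an unanchored term comes back
unchanged with no position attribute.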
--Ray