On Fri, 3 Dec 2004, Hedzer Westra wrote:
>>> - how are words separated? The description hints at splitting on
>>> (white)space only.
> I looked it up in the HTML documentation:
> CQL tutorial, Section 2:
> [space] (separates words of a CQL expression)
Which is a fine description for an introductory tutorial, but not
complete. Especially as it lacks (as discussed) any mention of relation
modifiers and has the version 1.0 proximity syntax. Implementers might
use the tutorial as a guide, but in the end it's the specification which
matters. (barring thinkos like the lack of / in the special characters)
However, it makes very little difference how the server splits a string
into words so long as it does it consistently.
For example, a server might think that 'middle-age' is two words, yet it
does not have a space. So long as the records have middle-age indexed as
two adjacent words, an adjacency search (foo = "middle-age") will still
work as expected.
Of course, an any/all search will not work as expected, as it will match
'middle' and 'age' even when they are not adjacent.
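
To make the point concrete, here is a minimal sketch of a server that splits
on any non-alphanumeric character. The splitting rule and the function names
(tokenize, adjacent, match_all, match_any) are assumptions made up for
illustration; they are not taken from the CQL specification or from any
particular implementation. The only requirement shown is that records and
query terms go through the same tokenizer.

```python
import re

def tokenize(text):
    # Hypothetical server rule: split on anything that is not a letter or
    # digit, so "middle-age" becomes two words although it has no space.
    return [w for w in re.split(r"[^0-9A-Za-z]+", text.lower()) if w]

def adjacent(record, term):
    # Adjacency: the term's words appear in order with no others intervening.
    words, wanted = tokenize(record), tokenize(term)
    if not wanted:
        return False
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))

def match_all(record, term):
    # "all": every word of the term occurs somewhere in the record.
    words = set(tokenize(record))
    return all(w in words for w in tokenize(term))

def match_any(record, term):
    # "any": at least one word of the term occurs somewhere in the record.
    words = set(tokenize(record))
    return any(w in words for w in tokenize(term))
```

With this tokenizer, adjacent("the middle-age crisis", "middle-age") holds,
while match_all("middle of the bronze age", "middle-age") also holds even
though the two words are far apart, which is the surprise described above.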
> a. operator = with a multi-word term (word separation implementation
> dependent, should be described in the implementation profile) as well as
> cql.all and cql.any operators -> default modifier is cql.word
Yup.
> b. operator cql.exact -> default modifier is cql.string. Question: does
> this refer to
> 1. exact searching w.r.t. splitting of words (which would imply that
> cql.word and cql.string are mutually
> exclusive), or
They are mutually exclusive. A cql.string is an opaque set of characters
that the server should not try to interpret.
> 2. exact searching w.r.t. pattern matching (which would imply that
> cql.masked and cql.string are mutually
> exclusive), or
I believe so. An exact term is treated as anchored at both ends, and may
not have any masking characters.
=/cql.word is adjacency.
=/cql.string is exact.
> c. operator = with a single term and all other operators -> default
> modifier is cql.masked
Yes.
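
The defaults agreed in points a. to c. can be summarised as a small lookup.
This is only a sketch of the reading given in this thread: the function name
default_modifier is hypothetical, and counting words by splitting on
whitespace is an assumption, since word separation is implementation
dependent.

```python
def default_modifier(relation, term):
    # Assumption: whitespace splitting stands in for the server's own
    # (implementation-dependent) notion of a word.
    multi_word = len(term.split()) > 1
    if relation in ("all", "any") or (relation == "=" and multi_word):
        return "cql.word"      # point a.
    if relation == "exact":
        return "cql.string"    # point b.
    return "cql.masked"        # point c.
```

For instance, default_modifier("=", "middle age") gives "cql.word", while
default_modifier("=", "middle") gives "cql.masked".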
> d. cql.masked implies ??: cql.word or cql.string or none? Maybe this is
> orthogonal, i.e., cql.masked can be
> supplied *together* with one of the other five (word, string,
> isoDate, number, uri) - assuming b.1. is true.
I think that it only applies by default to word, but that should probably
be further discussed :) For example, I would not want it to be applied to
number, date, or uri.
> But then you'd also need to be able to specify cql.unmasked or
> something to disable pattern matching.
You can escape the pattern characters, or define a new modifier that
overrides the masking -- for example you might want foo.regexp as a
different set of masking rules.
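
As an illustration of escaping, here is a rough sketch that turns a masked
term into a regular expression. It assumes * matches any run of characters,
? matches a single character, and backslash escapes the next character; the
anchor character is left out to keep it short. The function name
masked_to_regex is made up for this example and is not part of any CQL
toolkit.

```python
import re

def masked_to_regex(term):
    # Assumed masking rules: '*' = any run of characters, '?' = exactly one
    # character, '\' = take the next character literally.
    out = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\" and i + 1 < len(term):
            out.append(re.escape(term[i + 1]))  # escaped: no masking meaning
            i += 2
            continue
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))
```

Used with fullmatch, masked_to_regex("colo*r") accepts "colour", whereas
masked_to_regex(r"what\?") accepts only the literal string "what?".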
> e. only one of word, string, isoDate, number and uri can be set at the
> same time for one searchClause
Yes.
> = is used for word adjacency, when the term is a list of words. That
> is to say that the words
> appear in that order with no others intervening.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
That's the definition of adjacency.
> Maybe the problem (for me) is that the SRW Base profile doesn't mention
> relation modifiers, default modifiers and modifier behaviour at all. A
> text like 'semantics of (default) relation modifiers is implementation
> dependent and should be further defined in a profile by server
> implementors' would do the trick for me..
That's because the SRW Base Profile doesn't require them to be
implemented.
Rob
,'/:. Dr Robert Sanderson
,'-/::::. http://www.o-r-g.org/~azaroth/
,'--/::(@)::. Dept. of Computer Science, Room 805
,'---/::::::::::. University of Liverpool
____/:::::::::::::. L5R Shop: http://www.cardsnotwords.com/
I L L U M I N A T I