> Date: Mon, 13 Dec 2004 22:01:57 +0000
> From: Dr Robert Sanderson <[log in to unmask]>
> These exceptions are why we need the word/string distinction, and
> why it's important that the term in the query is processed the same
> as the data from the records.
Absolutely right. We do need some exposition of the word/string
distinction somewhere in the documentation, don't we?
>> One can imagine situations where you'd want to search only for an
>> exact, complete, URI, and others where you want to do keyword
>> searching on the URI (e.g. to discover all the URIs from a
>> specified domain).
> Wouldn't that be pattern matching, rather than keyword?
Not necessarily -- both make sense, and I might want to implement one
rather than (or as well as) the other.
> Secondly, if you do want to split URIs up by punctuation, you would
> search as a keyword in an index of uris, not as a combined uri/word?
Ahhhh ... I think you've got something there. An index that
_contains_ URIs -- for any kind of searching, string or word -- is not
necessarily related to the _term-structure_ URI.
But it then follows that URIs are search-compatible with either string
or word searching. So they can't be a subclass of either.
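To make that concrete, here is a minimal sketch, assuming a toy in-memory index. The helper names (as_string, as_words, build_index, search) and the example URIs are invented for illustration and come from no real server. It indexes the same URIs once as strings and once as words, and runs the query term through the same extractor as the record data:

```python
import re

def as_string(value):
    """String structure: the whole value is a single key (whitespace collapsed)."""
    return [" ".join(value.split())]

def as_words(value):
    """Word structure: split on punctuation and whitespace, lower-cased."""
    return re.findall(r"\w+", value.lower())

def build_index(records, extract):
    """Map each extracted key to the set of record ids that contain it."""
    index = {}
    for rec_id, value in records.items():
        for key in extract(value):
            index.setdefault(key, set()).add(rec_id)
    return index

def search(index, term, extract):
    """Process the query term with the same extractor used for the records."""
    hits = [index.get(key, set()) for key in extract(term)]
    return set.intersection(*hits) if hits else set()

# Hypothetical records: an index that merely *contains* URIs.
uris = {
    1: "http://www.example.org/docs/cql",
    2: "http://www.example.org/about",
    3: "http://other.example.net/docs",
}

string_index = build_index(uris, as_string)
word_index = build_index(uris, as_words)

print(search(string_index, "http://www.example.org/about", as_string))
print(search(word_index, "example.org", as_words))
print(search(string_index, "example.org", as_string))
```

The same URI data supports both kinds of search; only the extractor differs, and it has to be the same one on the query side and the record side.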
Conclusion: the URIness or otherwise of a term's structure tells us
nothing about whether that term is to be interpreted as a string or a
word. URIness is orthogonal.
Corollary: URIness may actually tell us _nothing_ useful about the
term at all.
Can anyone postulate a situation in which a server might run different
code for a query that has a /cql.uri relation-modifier than for one
that does not?
> > "exact" induces the "word" structure (unless overridden by an
> > explicit relation modifier). Similarly, "=" induces the "string"
> > structure (unless overridden by an explicit relation modifier).
> Other way round, Mike :) Exact has a default of string, = is word
Arrgh! Arrgh! Mea culpa. Rob is exactly right.
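As a sketch of the corrected rule, assuming modifier names cql.string and cql.word by analogy with the /cql.uri modifier mentioned above (the function and table names are made up for illustration), the lookup might go something like this:

```python
# Default structure induced by each relation, per Rob's correction.
DEFAULT_STRUCTURE = {
    "exact": "string",
    "=": "word",
}

# Assumed modifier names; an explicit modifier overrides the default.
STRUCTURE_MODIFIERS = {
    "cql.string": "string",
    "cql.word": "word",
}

def term_structure(relation, modifiers=()):
    """Return 'string', 'word', or None when nothing has been agreed."""
    for modifier in modifiers:
        if modifier in STRUCTURE_MODIFIERS:
            return STRUCTURE_MODIFIERS[modifier]
    return DEFAULT_STRUCTURE.get(relation)

print(term_structure("exact"))
print(term_structure("="))
print(term_structure("=", ["cql.string"]))
print(term_structure("<"))
```

The inequality relations are deliberately left out of the table, so they come back as None: that is the open question.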
> [...] unless the server thinks that it should be numeric
> equality. (eg if the term is numeric and the index is numeric)
Eh? I certainly don't remember agreeing this. It seems dangerously
error-prone to me. I don't think the server can recognise what is and
isn't a "number" lexicographically.
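A small illustration of the worry, using Python's own float() parsing purely as a stand-in for whatever guess a server might make (no claim that any server works this way; the function names are hypothetical):

```python
def looks_numeric(term):
    """Guess 'numeric' whenever the term happens to parse as a float."""
    try:
        float(term)
    except ValueError:
        return False
    return True

def numeric_equal(a, b):
    """Compare two terms as numbers rather than as character strings."""
    return float(a) == float(b)

for term in ["007", "1e3", "Infinity", "fruit"]:
    print(term, looks_numeric(term))

# Terms that are equal as numbers but different as strings.
print(numeric_equal("007", "7"), "007" == "7")
print(numeric_equal("1e3", "1000"), "1e3" == "1000")
```

Under this guess "Infinity" counts as a number, and "007" equals "7" numerically while differing as a string, so silently switching to numeric equality changes the result set.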
So back to my question:
> > A better question would be this: what structure should "<" and the
> > other inequality relations induce on their terms?
Let me re-state it this way: when I search for
    foo < fruit
is "fruit" to be interpreted as a string or a word?
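To show that the choice matters, here is a sketch of two possible readings. Neither is specified anywhere, and both functions are hypothetical: the string reading compares the whole field value with the term, the word reading matches if any single word in the value sorts before the term.

```python
import re

def less_than_as_string(value, term):
    """String reading: compare the whole field value with the term."""
    return value < term

def less_than_as_word(value, term):
    """Word reading: match if any word in the value sorts before the term."""
    return any(word < term for word in re.findall(r"\w+", value))

value = "zebra apple"  # hypothetical content of index foo in one record
print(less_than_as_string(value, "fruit"))
print(less_than_as_word(value, "fruit"))
```

For this record the two readings disagree: as a string, "zebra apple" sorts after "fruit", but as words, "apple" sorts before it.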
> It's more that I in fact think the sort definition needs completely
> overhauling, but that's a lot of stuff to think about right before
> the holidays and didn't think that anyone else would care :)
I agree with you both that it needs reworking, and also that there's
not much point starting that process at this stage. Let's wait until
/o ) \/ Mike Taylor <[log in to unmask]> http://www.miketaylor.org.uk
)_v__/\ "Things happen to you, and you don't know why. Your mental
model is a mess because you are trying to model a mess" --
Jakob Nielsen on the parlous ubiquity of buggy software.