On Mon, 13 Dec 2004, Mike Taylor wrote:
>> From: Hedzer Westra
>> Then, is there any difference between
>> a. idx = term1, and
>> b. idx =/cql.string term1
>> where term1 only contains 1 word
>
> Yes, absolutely! The former would find records where idx has the
> value "term0 term1 term2", but the latter wouldn't.
Because string (and exact) assume that the term is anchored at both ends.
foo =/string "* word *"
foo =/word "word"
Would work the same, so long as the server splits words only on
whitespace and 'word' never appears with trailing punctuation (and so on).
These exceptions are why we need the word/string distinction, and why it's
important that the term in the query is processed the same as the data
from the records.
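To make the distinction concrete, here is a minimal sketch in Python. It is
not taken from any real server: the function names are hypothetical, and it
assumes the naive tokenizer described above (split on whitespace only, no
punctuation handling).

```python
def tokenize(text):
    # Assumption: words are split on whitespace only; punctuation stays attached.
    return text.split()

def matches_word(field_value, term):
    # "word" structure: the term matches if it equals any single token.
    # The term goes through the same tokenizer as the record data.
    term_tokens = tokenize(term)
    return len(term_tokens) == 1 and term_tokens[0] in tokenize(field_value)

def matches_string(field_value, term):
    # "string" structure: the term is anchored at both ends of the field.
    return field_value == term

record = "term0 term1 term2"
print(matches_word(record, "term1"))          # True
print(matches_string(record, "term1"))        # False
print(matches_word("a word, here", "word"))   # False: the token is "word,"
```

The last line shows the punctuation exception: with whitespace-only splitting
the record token is "word," and so a search for "word" misses it.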
>> The context set currently defines five 'data types' (word, string,
>> number, isoDate, uri). Should all terms be assigned exactly one of
>> those?
>
> Hoo, that's a tricky one!
>
> I offer the following "answer" for discussion, not as a
> definitive statement: I think the way to think about this is that
> "string" and "word" structures are fundamentally different from each
> other in that the former should not be broken into words, and the
> latter should. The others seem to me to be either subtypes, or
> orthogonal to this key dichotomy. More likely the latter: one can
> imagine situations where you'd want to search only for an exact,
> complete, URI, and others where you want to do keyword searching on
> the URI (e.g. to discover all the URIs from a specified domain).
Wouldn't that be pattern matching, rather than keyword? Unless you want
to split URIs up by / and then find foo in http://a.b.c/foo/bar ?
Secondly, if you do want to split URIs up by punctuation, you would search
as a keyword in an index of uris, not as a combined uri/word?
I think that they're all mutually exclusive, but that string is subclassed
into URI. (everything that is true of string is true of URI, plus it has
a defined format)
>> Is there a distinction between terms that are *not* assigned any
>> type (either in the search query or by the server), and terms that
>> are typed 'string' (except for multi-word '=' searches without any
>> modifiers?)
>
> "exact" induces the "word" structure (unless overridden by an explicit
> relation modifier). Similarly, "=" induces the "string" structure
> (unless overridden by an explicit relation modifier). A better
> question would be this: what structure should "<" and the other
> inequality relations induce on their terms?
Other way round, Mike :) Exact has a default of string; = is word unless
the server thinks that it should be numeric equality (e.g. if the term is
numeric and the index is numeric).
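As a rough sketch of those defaults in Python (the function and parameter
names are hypothetical, and how a real server decides that an index is
numeric is left as an assumption):

```python
def is_number(term):
    # Assumption: "numeric" means the term parses as a float.
    try:
        float(term)
        return True
    except ValueError:
        return False

def default_structure(relation, term, index_is_numeric=False, modifier=None):
    # An explicit relation modifier always overrides the default.
    if modifier is not None:
        return modifier
    if relation == "exact":
        return "string"
    if relation == "=":
        # Numeric equality only when both the term and the index are numeric.
        if index_is_numeric and is_number(term):
            return "number"
        return "word"
    # The inequality relations are the open question in this thread.
    return None

print(default_structure("exact", "foo"))                    # string
print(default_structure("=", "foo"))                        # word
print(default_structure("=", "42", index_is_numeric=True))  # number
```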
>>> To expand upon Mike's typical one-liner, the problem is that then
>>> you have to include the entire search clause
>>> (or attribute combination for Z) and the only thing that you can
>>> search by are indexes, rather than relatively arbitrary data. I'm
>>> not (personally) averse to reworking the sort definition for 1.2,
>>> so if you have any concrete ideas, put them forwards :)
I think this is worth expanding upon, in light of the above discussion.
If you want to sort by a numeric index alphabetically, rather than
numerically, then you need to know more than just the index, you need to
have the index and relation/relationModifier.
Of course, we can't do that now either, and we have problems with
namespace resolution in the XPath. So to be a little less vague, when I
say that I'm not averse, it's more that I in fact think the sort
definition needs completely overhauling, but that's a lot of stuff to
think about right before the holidays and I didn't think that anyone else
would care :)
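A small illustration of why the index alone is not enough for sorting,
sketched in Python. The helper name and the "structure" argument are
hypothetical stand-ins for the relation/relationModifier information; nothing
like this exists in the current sort definition.

```python
def sort_values(values, structure):
    # structure plays the role of the relation modifier: it says how the
    # same index values should be compared.
    if structure == "number":
        return sorted(values, key=int)
    if structure == "string":
        return sorted(values)
    raise ValueError("unsupported structure: " + structure)

values = ["10", "9", "100", "25"]
print(sort_values(values, "number"))  # ['9', '10', '25', '100']
print(sort_values(values, "string"))  # ['10', '100', '25', '9']
```

Same index, same data, two different orderings, and only the extra piece of
information tells you which one was wanted.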
Rob
,'/:. Dr Robert Sanderson
,'-/::::. http://www.o-r-g.org/~azaroth/
,'--/::(@)::. Dept. of Computer Science, Room 805
,'---/::::::::::. University of Liverpool
____/:::::::::::::. L5R Shop: http://www.cardsnotwords.com/
I L L U M I N A T I