> Date: Mon, 13 Dec 2004 14:45:40 +0100
> From: Hedzer Westra <[log in to unmask]>
> Then, is there any difference between
> a. idx = term1, and
> b. idx =/cql.string term1
> where term1 only contains 1 word

Yes, absolutely!  The former would find records where idx has the
value "term0 term1 term2", but the latter wouldn't.

Or did you mean to ask about the situation where the _record_ only
contains one word (in the field that's indexed by idx)?  If so, then,
roughly, no: there's no difference.  Although one could imagine
servers that treat them differently if, for example, the one-word
field has whitespace at the start or the end, or funny punctuation, or
something like that.
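
To make the difference concrete, here is a minimal Python sketch, not
taken from any real server.  The function names are hypothetical, and
it assumes that words are split on whitespace and that a multi-word
term under "=" must match as adjacent words in order; real servers may
well tokenise and normalise differently.

```python
def matches_word(field_value, term):
    """Word structure (assumed behaviour of plain "="): the term's words
    must occur as adjacent words, in order, somewhere in the field."""
    field_words = field_value.split()
    term_words = term.split()
    n = len(term_words)
    if n == 0:
        return False
    return any(field_words[i:i + n] == term_words
               for i in range(len(field_words) - n + 1))


def matches_string(field_value, term):
    """String structure (assumed behaviour of "=/cql.string"): the term
    is opaque and must equal the whole field value."""
    return field_value == term


# Multi-word field: only the word structure finds the term.
print(matches_word("term0 term1 term2", "term1"))    # True
print(matches_string("term0 term1 term2", "term1"))  # False

# One-word field with stray whitespace: the two can still differ.
print(matches_word(" term1 ", "term1"))    # True
print(matches_string(" term1 ", "term1"))  # False
```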

> [Is there any difference between]
> a. idx = term2, and
> b. idx =/cql.word term2
> where term2 contains multiple words.

None at all: recall that the "word" structure is the default for the
relation "=", so specifying "/cql.word" on this relation is
redundant.  Same applies for "exact/cql.string": the modifier there is
also redundant, and for the same reason.

To say the same thing another way:
        "=" means "=/cql.word"
        "exact" means "exact/cql.string"
which means that you could (if you were perverse) use:
        "exact/cql.word" when you mean "="
        "=/cql.string" when you mean "exact"

(Actually, the latter of these is not particularly perverse.  Another
way to think about this is that CQL has only one core equality
relation, "=", which does word matching, but also provides "exact" as
a convenient shorthand for "=/cql.string".)
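
As a sketch of that rule (the names below are hypothetical, and only
the two relations and two structure modifiers discussed here are
handled), a server could work out the effective term structure like
this:

```python
# Default structure induced by each relation, as described above.
DEFAULT_STRUCTURE = {"=": "word", "exact": "string"}

# Relation modifiers that override the default structure.
STRUCTURE_MODIFIERS = {"cql.word": "word", "cql.string": "string"}


def effective_structure(relation):
    """Return "word" or "string" for a relation such as "=/cql.string"."""
    base, *modifiers = relation.split("/")
    structure = DEFAULT_STRUCTURE[base]
    for modifier in modifiers:
        # Modifiers unrelated to structure are ignored in this sketch.
        if modifier in STRUCTURE_MODIFIERS:
            structure = STRUCTURE_MODIFIERS[modifier]
    return structure


for relation in ("=", "=/cql.word", "exact", "exact/cql.string",
                 "exact/cql.word", "=/cql.string"):
    print(relation, "->", effective_structure(relation))
```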

And remember: all of this is entirely to do with the interpretation of
the term _structure_.  It's orthogonal to the issue of whether pattern
matching is done (and, if so, what kind).

>> A cql.string is an opaque set of characters that the server should
>> not try to interpret.
> Does this *only* refer to word separation, and nothing else?


> The context set currently defines five 'data types' (word, string,
> number, isoDate, uri). Should all terms be assigned exactly one of
> those?

Hoo, that's a tricky one!

I offer the following "answer" for discussion, not as a
definitive statement: I think the way to think about this is that
"string" and "word" structures are fundamentally different from each
other in that the former should not be broken into words, and the
latter should.  The others seem to me to be either subtypes of, or
orthogonal to, this key dichotomy.  More likely the latter: one can
imagine situations where you'd want to search only for an exact,
complete, URI, and others where you want to do keyword searching on
the URI (e.g. to discover all the URIs from a specified domain).
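
For illustration only, here is one way the two kinds of URI search
might look in Python.  How a URI is broken into "words" is entirely an
assumption here (split on anything other than letters, digits, dots
and hyphens), and the function names are hypothetical.

```python
import re


def uri_words(uri):
    """Break a URI into 'words' under the assumed tokenisation rule."""
    return [w for w in re.split(r"[^A-Za-z0-9.-]+", uri) if w]


def uri_matches(field_value, term, structure):
    """Match a single-word term against a URI field under either structure."""
    if structure == "string":
        return field_value == term         # exact, complete URI only
    return term in uri_words(field_value)  # keyword search within the URI


uri = "http://www.loc.gov/standards/sru/"
print(uri_words(uri))  # ['http', 'www.loc.gov', 'standards', 'sru']
print(uri_matches(uri, "www.loc.gov", "word"))    # True
print(uri_matches(uri, "www.loc.gov", "string"))  # False
```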

> Is there a distinction between terms that are *not* assigned any
> type (either in the search query or by the server), and terms that
> are typed 'string' (except for multi-word '=' searches without any
> modifiers?)

"exact" induces the "string" structure (unless overridden by an explicit
relation modifier).  Similarly, "=" induces the "word" structure
(unless overridden by an explicit relation modifier).  A better
question would be this: what structure should "<" and the other
inequality relations induce on their terms?

>>> Too bad there isn't a separate spec for sorting on context set
>>> indexes.
>> That way, Z39.50 lies :-)
> You mean SRW being Z39.50 all over - something like a bulky,
> difficult to implement protocol?

Well, I reject that description of Z39.50.  But what I meant was that
one of the ways in which Z39.50 is perceived to have failed is in
allowing lots of different ways to express things.  SRW and CQL try on
the whole to give you just One True Way.  At present, for specifying
sort keys, that's XPath; but for whatever it may be worth, I share
your disquiet about that choice.  (It's stupid that you can find
records matching "author=lewis" without needing to know where the
"author" field is in the XML records, but you can't then sort that set
on title without knowing where the "title" field is.)

>> To expand upon Mike's typical one-liner, the problem is that then
>> you have to include the entire search clause
> What do you mean by that? I hardly know anything 'bout Z39.50, maybe
> that doesn't help here..

Then just forget it -- really.  That comment was really just a
throwaway for other Z39.50 holdovers such as myself.  If you're new to
this stuff, do yourself a favour and just think about SRW and CQL.

> > (or attribute combination for Z) and the only thing that you can
> > search by are indexes, rather than relatively arbitrary data.  I'm
> > not (personally) averse to reworking the sort definition for 1.2,
> > so if you have any concrete ideas, put them forwards :)
> The only thing that comes to my mind right now is starting the
> sortXPath with an escaping character (preferably an XPath-illegal
> char, making the distinction clear) and then follow with an index
> name.

What you're trying to do here makes sense, but the _way_ you're
suggesting here seems unnecessarily hacky.  I think we can do better.

I have a thought on this, but will float it in a separate message so
as to avoid thread-congestion.

>>> I retrieved the msg from the archive and got CQLJava which
>>> contained a set of XSLs which turn IE into a SRU browser.
>> CQL-Java contains that?  Really? I would actually like to find
>> these XSLTs.  Where did you get them?
> I retrieved a ZIP from the OCLC website (don't know the location
> anymore) with a lot of JAR and Java files, and some XSLs in the
> basedir.

Ha!  Ralph, is OCLC distributing a derived work of CQL-Java now?

(It's perfectly entitled to do so, of course, but it would have been
nice to know.)

> See the attachment for my updated XSLs.


 _/|_    _______________________________________________________________
/o ) \/  Mike Taylor  <[log in to unmask]>
)_v__/\  "``user-friendliness'' is over-rated.  If you're not willing
         to learn anything new, you can never use the computer to
         its full potential" -- attributed to Douglas Engelbart,
         inventor of the GUI.
