Hi Mike & Rob (and potentially other interested human beings),
On Fri, 3 Dec 2004, Hedzer Westra wrote:
>>>> - how are words separated? The description hints at splitting on
>>>> (white)space only.
>> CQL tutorial, Section 2
[Rob]
> Which is a fine description for an introductory tutorial, but not complete.
> Especially as it lacks (as discussed) any mention of relation modifiers and
> has the version 1.0 proximity syntax. Implementers might use the tutorial
> as a guide, but in the end it's the specification which matters.
Okay, I'll just ignore what I read there then :-)
[Mike]
>> [space] (separates words of a CQL expression)
> Yes; but this refers to the words that make up the entire query, not those
> embedded within a term. So this is talking about breaking up the query
>     dc.author any "kernighan ritchie"
> into the three tokens
>     index-name: dc.author
>     relation: any
>     term: "kernighan ritchie"
> and not at all about how that term "kernighan ritchie" is to be
> interpreted.
Ah oww. Sorry 'bout that mixup.. My bad.
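To check that I now read Mike's point correctly, here is a minimal Python sketch of that query-level split. It is not a CQL parser: the function name split_search_clause and the regular expression are my own assumptions, and it only handles a single "index relation term" clause without boolean operators. The point is that whitespace separates the three tokens while the quoted term stays in one piece.

```python
import re

# One search clause: index, relation, then a quoted or bare term.
# Assumption: no booleans, no parentheses, no prefix assignments.
_CLAUSE = re.compile(r'^\s*(\S+)\s+(\S+)\s+("(?:[^"\\]|\\.)*"|\S+)\s*$')


def split_search_clause(clause):
    """Split 'index relation term' into its three tokens.

    Whitespace inside a double-quoted term is left alone; how that term
    is later broken into words is a separate, server-side decision.
    """
    match = _CLAUSE.match(clause)
    if match is None:
        raise ValueError("not a simple search clause: %r" % clause)
    index, relation, term = match.groups()
    if term.startswith('"'):
        term = term[1:-1]  # drop the surrounding quotes only
    return index, relation, term


if __name__ == "__main__":
    print(split_search_clause('dc.author any "kernighan ritchie"'))
```

What happens to the term after this split (words versus string) is the server's business, as long as it is done consistently.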
[Rob]
> However, it makes very little difference how the server splits a string
> into words so long as it does it consistently.
I'll make sure our server does that.
Here are Rob and Mike's replies to my implied modifier behaviour:
>> a. operator = with a multi-word term (word separation implementation
>> dependent, should be described in the implementation profile) as well
>> as cql.all and cql.any operators -> default modifier is cql.word
[Rob]
>Yup.
>> b. operator cql.exact -> default modifier is cql.string.
[Mike]
> Well. We're not talking about it in these terms. To say "default modifier"
> is misleading as there may legitimately be zero, one or more modifiers on
> a relation. But, yes, the term _structure_ implied by the cql.exact
> relation is indeed "string".
Noted.
>> Question: does this refer to
>> 1. exact searching w.r.t. splitting of words (which would imply that
>> cql.word and cql.string are mutually exclusive)
[Mike]
> Yes, they are. String vs. Words is a fundamental dichotomy that we've
> thrashed out neatly on this list and which should be described in both
> the official documentation and the tutorial.
[Rob]
> They are mutually exclusive. A cql.string is an opaque set of characters
> that the server should not try to interpret.
Good, those two answers both say the same thing ;-)
>> 2. exact searching w.r.t. pattern matching (which would imply that
>> cql.masked and cql.string are mutually exclusive)
[Rob]
>I believe so. exact is treated as anchored at both ends, and may not have
>any masking characters.
>=/cql.word is adjacency.
>=/cql.string is exact.
[Mike]
> No, a masked string is just fine. (Why would we prohibit such a useful
> thing?)
>     dc.title exact "the adventures of *"
> will find
>     The Adventures of Hulk
>     The Adventures of Baron Munchausen
>     The Adventures of the Famous Five
> but _not_
>     The Amazing Adventures of Captain Gladys Stoatpamphlet and her
>     Intrepid Spaniel Stig.
> because the extra word "amazing" breaks the "exact" condition.
Hmm, that is a different reading from Rob's. I'd prefer Rob's description,
if nobody minds; coincidentally, that is how I've already implemented it :-)
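To make the difference between the two readings concrete, here is a small Python sketch. Everything in it is an assumption of mine rather than spec text: the function names, the case-insensitive comparison, and the choice of "*" for any run of characters, "?" for a single character and backslash as the escape. With allow_masking=True it behaves like Mike's example; with allow_masking=False it treats the term as an opaque string, as in Rob's description.

```python
import re


def mask_to_regex(term):
    """Translate a masked term into a regex (assumed masking: * ? and \\)."""
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\" and i + 1 < len(term):
            # Escaped character: take the next one literally.
            parts.append(re.escape(term[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any run of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def exact_match(term, value, allow_masking):
    """'exact' is anchored at both ends in either reading."""
    if allow_masking:
        # Mike's reading: masking characters are honoured.
        return mask_to_regex(term).fullmatch(value) is not None
    # Rob's reading: the term is an opaque string, '*' is just a character.
    return term.casefold() == value.casefold()


if __name__ == "__main__":
    for title in ("The Adventures of Hulk",
                  "The Amazing Adventures of Captain Gladys Stoatpamphlet"):
        print(title, exact_match("the adventures of *", title, True))
```

Because fullmatch anchors the pattern at both ends, the extra leading word in the second title is enough to reject it, which is the behaviour Mike describes.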
>> c. operator = with a single term and all other operators -> default
>> modifier is cql.masked
[Mike]
> The masked-vs.-unmasked dichotomy is orthogonal to string-vs.-words.
Good!
[Rob]
> Yes.
>> d. cql.masked implies ??: cql.word or cql.string or none? Maybe this
>> is orthogonal, i.e., cql.masked can be
>> supplied *together* with one of the other five (word, string,
>> isoDate, number, uri) - assuming b.1. is true.
[Rob]
>I think that it only applies by default to word, but that should probably
>be further discussed :) For example, I would not want it to be applied to
>number, date, or uri.
Makes sense.
>> But then you'd also need to be able to specify cql.unmasked or something
>> to disable pattern matching.
[Mike]
>Yes; there should be a cql.unmasked relation modifier.
[Rob]
>You can escape the pattern characters, or define a new modifier that
>overrides the masking -- for example you might want foo.regexp as a
>different set of masking rules.
If I understand correctly, Mike suggests extending the CQL context set,
while Rob suggests defining the modifier in our own context set. Let's have
the SRW 1.2 people decide on this!
>> e. only one of word, string, isoDate, number and uri can be set at the
>> same time for one searchClause
[Rob]
>Yes.
[Mike]
>Correct, because these particular modifiers all represent alternative
>points along the same axis.
Same answer: good!
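For the record, this is roughly how I would enforce rule e. in a server; a minimal Python sketch, not anything taken from the spec. The names STRUCTURE_MODIFIERS and structure_of are hypothetical, and I assume modifier names are compared case-insensitively. Other modifiers, such as cql.masked, are simply ignored by this check because they sit on a different axis.

```python
# The five term-structure modifiers from point e., lowercased for comparison.
STRUCTURE_MODIFIERS = frozenset(
    name.lower()
    for name in ("cql.word", "cql.string", "cql.isoDate", "cql.number", "cql.uri")
)


def structure_of(modifiers):
    """Return the single structure modifier on a relation, or None.

    Raises ValueError when more than one is present, since they are
    alternative points along the same axis.
    """
    chosen = [m for m in modifiers if m.lower() in STRUCTURE_MODIFIERS]
    if len(chosen) > 1:
        raise ValueError("conflicting structure modifiers: " + ", ".join(chosen))
    return chosen[0] if chosen else None


if __name__ == "__main__":
    print(structure_of(["cql.string", "cql.masked"]))
    print(structure_of(["cql.masked"]))
```

When structure_of returns None, the server falls back to whatever structure the relation itself implies (for example "string" for exact).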
>> - why is sorting defined on XPaths?
[Rob]
> Mostly because it was an easy, existing specification to use to specify a
> path to some data in a structured document. This doesn't mean that the
> server has to actually -do- XPath, just that it should accept them and
> respond appropriately. For example, if you can sort by exact title, author
> and date, then you might hard wire /record/title, /record/creator and
> /record/date to these sort routines. Then you could just respond with
> unsupported tag path to all other requests.
I've done exactly this. Too bad there isn't a separate spec for sorting
on context set indexes.
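In case it helps other implementers, the hard-wiring Rob describes can be sketched in a few lines of Python. This is an illustration only, not our actual server code: the record dictionaries, the field names and the UnsupportedSortPath exception are all hypothetical, and a real server would return the appropriate diagnostic instead of raising.

```python
class UnsupportedSortPath(Exception):
    """Raised for any XPath the server has not hard-wired to a sort routine."""


# The three paths from Rob's example, mapped to (hypothetical) record fields.
SORT_FIELDS = {
    "/record/title": "title",
    "/record/creator": "author",
    "/record/date": "date",
}


def sort_records(records, xpath, ascending=True):
    """Sort records by a hard-wired path; no XPath evaluation takes place."""
    try:
        field = SORT_FIELDS[xpath.strip()]
    except KeyError:
        raise UnsupportedSortPath(xpath) from None
    return sorted(records, key=lambda rec: rec.get(field, ""),
                  reverse=not ascending)


if __name__ == "__main__":
    records = [
        {"title": "Unix", "author": "Ritchie", "date": "1974"},
        {"title": "C", "author": "Kernighan", "date": "1978"},
    ]
    print([r["title"] for r in sort_records(records, "/record/title")])
```

The lookup is an exact string comparison, so any path outside the table is rejected without the server ever interpreting XPath.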
>> - is there an Open Source SRU/SRW tester (like
>> http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai for OAI) or a
[Rob]
> Not yet, but it's on my list of things to do.
See below:
[Mike]
>>> I see that Marc has already answered your questions about open source
>>> clients.
>> Did he?
>Yes. He recommended the fine YAZ command-line client ("yaz-client") for
>SRW, and the web-browser of your choice, or wget, for SRU.
Very good! I guess his e-mail got lost in my spam filter. I retrieved the
message from the archive and got CQLJava, which contains a set of XSL
stylesheets that turn IE into an SRU browser. They needed a few tweaks, but
work great! I didn't implement SRW, so there was no need for yaz.
BTW, I haven't tried writing a unit testing program yet, but I expect it to
be quite simple; Marc sent me a Unix shell script that will do that. I
haven't tested it under Cygwin yet.
>> - the ZeeRex documentation is a bit concise on configInfo. What
>> settings exactly are 'setting', 'default' and 'supports'?
[Rob]
> setting: Something which cannot be changed. You might have a setting of
> 'maximumRecords' -- the maximum records you can retrieve at once.
> default: Something which can be changed but has a default value. For
> example 'retrievalSchema' -- the default schema you'll get your records in
> unless you specify one.
> supports: Some feature of the protocol which the server supports. For
> example sorting, proximity or the scan operation.
Yes, I understand that from the description; I had just hoped for a
definitive list saying which configInfo @type goes with which element. For
now I've guessed, and I hope SRU client implementors made the same guesses.
BTW: if this is implementation dependent, I would have chosen to express
those three kinds as an attribute rather than as element names. But that's
another discussion..
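To show what I mean by guessing, here is a Python sketch of how a client might read such a block. The sample XML is my own guess based on Rob's three examples: the @type values "sort", "proximity" and "scan", the number 50 and the schema name "dc" are placeholders, not values from the ZeeRex documentation, and I have left out the XML namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Guessed pairing of element name and @type; not an authoritative list.
SAMPLE_CONFIG_INFO = """
<configInfo>
  <setting type="maximumRecords">50</setting>
  <default type="retrievalSchema">dc</default>
  <supports type="sort"/>
  <supports type="proximity"/>
  <supports type="scan"/>
</configInfo>
"""


def read_config_info(xml_text):
    """Group configInfo children by element name, keyed on their @type."""
    info = {"setting": {}, "default": {}, "supports": {}}
    root = ET.fromstring(xml_text)
    for child in root:
        if child.tag in info:
            info[child.tag][child.get("type")] = (child.text or "").strip()
    return info


if __name__ == "__main__":
    info = read_config_info(SAMPLE_CONFIG_INFO)
    print(info["setting"], info["default"], sorted(info["supports"]))
```

A client written this way breaks as soon as a server files the same @type under a different element, which is exactly why a definitive list would help.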
Best regards,
Hedzer Westra, Systems Developer
Adlib | Information Systems
Reactorweg 291
3542 AD Utrecht
Postbus 1436
3600 BK Maarssen
tel: +31-30-241 1885
www: http://www.adlibsoft.com