Hello all,
Mike Taylor answered some of the questions about CQL/SRW implementation
details that I sent in my previous message. Here is my reply, along with
some more questions.
>> - How are words separated? The description hints at splitting on
>> (white)space only.
I looked it up in the HTML documentation:
CQL tutorial, Section 2:
[space] (separates words of a CQL expression)
..
In general, multi-word terms are interpreted as requesting records in
which a single field contains all the
specified words, in the specified order, with no other words in
between. This is a proximity search.
CQL tutorial, Section 5:
For word indexes, three more relations are supported:
  any    The search succeeds if one or more of the words in the term can
         be found in the record. So, for example,
         title any "ocean sea lake" is a convenient shorthand for
         title="ocean" or title="sea" or title="lake".
  all    The search succeeds if every one of the words in the term can be
         found in the record. So, for example,
         title all "old man sea" is a convenient shorthand for
         title="old" and title="man" and title="sea".
  exact  A query like title exact "the complete dinosaur" indicates a
         string search rather than a word search. It succeeds only for
         records whose title field consists exactly of the characters
         "the complete dinosaur".
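To check that I read Section 5 correctly, here is the any/all shorthand as a minimal Python sketch. The function names are my own, and the whitespace splitter is an assumption: the CQL context set leaves the definition of a "word" to the server, so the splitter is passed in as a parameter.

```python
from typing import Callable


def whitespace_words(term: str) -> list[str]:
    # Hypothetical default splitter: whitespace only, as the tutorial suggests.
    # A real server would plug in its own definition of a "word" here.
    return term.split()


def expand(index: str, relation: str, term: str,
           split: Callable[[str], list[str]] = whitespace_words) -> str:
    """Rewrite an 'any' or 'all' clause into the equivalent '=' clauses
    joined by 'or' / 'and', as described in CQL tutorial Section 5."""
    joiners = {"any": " or ", "all": " and "}
    if relation not in joiners:
        # 'exact' is a string search, so it is deliberately not expanded.
        raise ValueError(f"only 'any' and 'all' can be expanded, not {relation!r}")
    return joiners[relation].join(f'{index}="{word}"' for word in split(term))


if __name__ == "__main__":
    print(expand("title", "any", "ocean sea lake"))
    # title="ocean" or title="sea" or title="lake"
    print(expand("title", "all", "old man sea"))
    # title="old" and title="man" and title="sea"
```

Passing the splitter in keeps the server-specific word rule in one place; the expansion itself does not change.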
BTW: section 4 of the CQL tutorial doesn't mention cql.anywhere
> I think you're talking about the words _within_ a single,
> multiple-word, search term, right? A query like:
> dc.creator all "kernighan ritchie"
Yes.
> The CQL specification itself says _nothing_ on how such strings as
> "kernighan ritchie" above should be broken down into individual
> tokens. This is a matter for application profiles.
You're right; indeed, the CQL context set description says:
"The term should be broken into words, according to the server's
definition of a 'word'"
but the CQL tutorial says something different, and I'm afraid that
people writing client code will assume that whitespace splitting is part
of the default CQL semantics, and not (as I now understand)
implementation dependent, i.e. defined by the profile. Could that be
mentioned in the tutorial and in the CQL language description?
> The way the term should be treated is specified by a "structure
> attribute" -- that is, a special relation modifier that, although it
> is physically attached to the relation, actually talks about the
> structure of the term that the relation relates. One of these is
> cql.word.
> The phrasing around these modifiers is a little vague, so it's not
> made explicit that this also applies in the case of the default
> structure, cql.masked; but I believe that this is what people intended.
I'd like to have that documented as well, if possible, to prevent
further confusion in the future. My analysis of the description:
a. relation = with a multi-word term (word separation is implementation
   dependent and should be described in the implementation profile), as
   well as the cql.all and cql.any relations -> default modifier is
   cql.word
b. relation cql.exact -> default modifier is cql.string. Question: does
   "exact" refer to
   1. exact searching w.r.t. splitting of words (which would imply that
      cql.word and cql.string are mutually exclusive), or
   2. exact searching w.r.t. pattern matching (which would imply that
      cql.masked and cql.string are mutually exclusive), or
   3. both?
c. relation = with a single-word term, and all other relations ->
   default modifier is cql.masked
d. cql.masked implies what: cql.word, cql.string, or neither? Maybe this
   is orthogonal, i.e. cql.masked can be supplied *together* with one of
   the other five (word, string, isoDate, number, uri), assuming b.1. is
   true. But then you'd also need to be able to specify cql.unmasked or
   something similar to disable pattern matching.
e. only one of word, string, isoDate, number and uri can be set at the
   same time for one searchClause (unless b.2. is true).
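To make sure I'm asking the right question, here is my reading of a. to e. as a small Python sketch. It encodes only my interpretation, not the CQL specification; the function names are hypothetical, whitespace splitting is assumed for a., and for the open point d. cql.masked is treated as combinable with the other five.

```python
# My reading of points a. to e.; an interpretation, NOT the CQL spec.

# e.: at most one of these may appear on a single searchClause.
EXCLUSIVE_STRUCTURES = {"cql.word", "cql.string", "cql.isoDate",
                        "cql.number", "cql.uri"}


def default_structure(relation: str, term: str) -> str:
    """Structure attribute assumed when the client supplies none (a. to c.)."""
    relation = relation.removeprefix("cql.")   # accept "exact" or "cql.exact"
    if relation in ("any", "all"):
        return "cql.word"                      # a.
    if relation == "exact":
        return "cql.string"                    # b.
    if relation == "=" and len(term.split()) > 1:
        return "cql.word"                      # a. (whitespace splitting assumed)
    return "cql.masked"                        # c.


def check_modifiers(modifiers: set[str]) -> None:
    """Reject a searchClause that combines two structure attributes (e.).
    d. is still open: cql.masked is not in the exclusive set, so it may be
    combined with any one of the others."""
    clash = modifiers & EXCLUSIVE_STRUCTURES
    if len(clash) > 1:
        raise ValueError(f"mutually exclusive modifiers: {sorted(clash)}")


if __name__ == "__main__":
    print(default_structure("=", "Rembrandt van Rijn"))               # cql.word
    print(default_structure("=", "Rembrandt"))                        # cql.masked
    print(default_structure("cql.exact", "the complete dinosaur"))    # cql.string
    check_modifiers({"cql.masked", "cql.word"})                       # accepted
```

If b.2. turns out to be the intended meaning, cql.masked and cql.string would have to be checked against each other instead.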
So the whole word-splitting and pattern-matching issue is a bit unclear
to me; please help me clear it up. If this is intentionally undefined
(read: vague), please say so in the description, so that server and
client implementers know they have to define it in their profile
description.
> The upshot is that your server is at liberty to break up multi-word
> terms however it likes.
> You should probably use whatever your server already uses.
I'll do that, then.
>> - Is word proximity search required even for basic searches like
>> 'author = "Rembrandt van Rijn"'?
> The SRW/U "base profile" includes specifications for the minimum level
> of CQL support.
--- 8< ---
> So, no, you are not obliged to implement proximity.
But:
CQL tutorial, Section 2:
In general, multi-word terms are interpreted as requesting records in
which a single field contains all the
specified words, in the specified order, with no other words in
between. This is a proximity search.
CQL context set
(http://www.loc.gov/z3950/agency/zing/cql/context-sets/cql.html):
= is used for word adjacency, when the term is a list of words. That
is to say that the words
appear in that order with no others intervening.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Is this a mistake, or just one possible implementation? Either way,
please clarify it in both documents.
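The difference matters for my 'author = "Rembrandt van Rijn"' example. As a minimal sketch of how I read the two definitions, applied to a single field (case-insensitive whitespace splitting is my assumption, and the function names are hypothetical), here is '=' as word adjacency compared with 'all':

```python
def words(text: str) -> list[str]:
    # ASSUMPTION: case-insensitive, whitespace-only word splitting.
    return text.lower().split()


def adjacent(field_text: str, term: str) -> bool:
    """'=' as the CQL context set describes it: the words of `term` appear
    in the field in that order, with no other words intervening."""
    field, wanted = words(field_text), words(term)
    n = len(wanted)
    if n == 0:
        return False
    return any(field[i:i + n] == wanted for i in range(len(field) - n + 1))


def all_words(field_text: str, term: str) -> bool:
    """'all': every word of `term` occurs somewhere in the field, in any order."""
    return set(words(term)) <= set(words(field_text))


if __name__ == "__main__":
    name = "Rembrandt Harmenszoon van Rijn"
    print(adjacent(name, "Rembrandt van Rijn"))   # False: "Harmenszoon" intervenes
    print(all_words(name, "Rembrandt van Rijn"))  # True
```

So under the adjacency reading, a record whose author field is "Rembrandt Harmenszoon van Rijn" would not be found by author = "Rembrandt van Rijn", while under an 'all'-like reading it would.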
> Since CQL has its own existence independent of SRW, I think these
> base-profile requirements ought perhaps to be in a separate CQL
> base-profile document instead of, or as well as, in the SRW base
> profile.
Maybe the problem (for me) is that the SRW base profile doesn't mention
relation modifiers, default modifiers or modifier behaviour at all. A
sentence like 'the semantics of (default) relation modifiers are
implementation dependent and should be further defined in a profile by
server implementors' would do the trick for me.
>> I will pass on your XPath-sorting question
Just as a reminder, here it is again:
> Why is sorting defined on XPaths? This completely bypasses any
> possible built-in sorting capability of any database server except one
> delivering XML records natively. What I expect to have to do is:
> + retrieve every record
> + convert them to adlibXML
> + translate to the right schema using an XSL
> + call the XPath on each record
> + sort on the outcome of this XPath
> + print the records in the resulting order (maybe even convert them
>   again if the retrieveSchema is different)
> .. pretty slow! Would it be possible within the SRW standard to just
> supply an index name (which in Adlib, as well as in relational database
> systems, is searchable as well as sortable) instead of the XPath?
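For comparison, a minimal Python sketch of the two approaches. The record layout, the `records` table and the `title` column are hypothetical; the XSL translation step is left out (Python's standard library has no XSLT), and ElementTree's `findtext` only understands a subset of XPath. The point is that the XPath route has to parse every record before it can sort, while an index name lets the database do the sorting itself.

```python
import sqlite3
import xml.etree.ElementTree as ET


def sort_by_xpath(records: list[str], path: str) -> list[str]:
    """Sort XML records on the value found at `path` (ElementTree's XPath
    subset, e.g. "title" or "./meta/title"). Every record must be parsed
    first; records without a value get "" and therefore sort first."""
    def sort_key(xml_text: str) -> str:
        return ET.fromstring(xml_text).findtext(path, default="")
    return sorted(records, key=sort_key)


def sort_by_index(conn: sqlite3.Connection, index_column: str) -> list[int]:
    """Let the database sort on an indexed column and return record ids.
    index_column must be a trusted, known column name, because it is
    interpolated into the SQL text."""
    query = f"SELECT id FROM records ORDER BY {index_column}"
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    xml_records = ["<record><title>Zebra</title></record>",
                   "<record><title>Apple</title></record>"]
    print([ET.fromstring(r).findtext("title")
           for r in sort_by_xpath(xml_records, "title")])  # ['Apple', 'Zebra']

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE INDEX records_title ON records (title)")
    conn.executemany("INSERT INTO records (id, title) VALUES (?, ?)",
                     [(1, "Zebra"), (2, "Apple")])
    print(sort_by_index(conn, "title"))  # [2, 1]
```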
>> I see that Marc has already answered your questions about open source
>> clients.
Did he? I haven't seen that e-mail yet. My question was:
> Is there an Open Source SRU/SRW tester or a client that can be used to
> test our implementation? I only found servers on the SRW website.
>> I will leave Rob to answer the ZeeRex questions
Just as a reminder, here they are again:
> The ZeeRex documentation is a bit concise on configInfo. Which settings
> exactly are meant by 'setting', 'default' and 'supports'? And why would
> you want to have a default stylesheet? Doesn't that disable XML
> retrieval altogether, or am I misinterpreting something?
Many thanks in advance for your time & answers!
Best regards,
Hedzer Westra, Systems Developer
Adlib | Information Systems
Reactorweg 291
3542 AD Utrecht
Postbus 1436
3600 BK Maarssen
tel: +31-30-241 1885
www: http://www.adlibsoft.com