Hello.
My name is Martin Malmsten. I am implementing an SRU server for The
Royal Library of Sweden, which will probably also be used by the ONE
Association. I'd like to share my thoughts on SRU and CQL. I realize
that there are reasons for some of the things I criticize, but I thought
that my first impressions might be valuable, since I guess most of you
have implemented and created the protocol in parallel.
Praise:
Thank you, thank you, thank you. I think that SRU might be the killer
protocol that finally makes people outside the library bubble interested
in implementing "our" standards. It might even be possible to hire
people fresh from the university and have them implement it, and not
receive that glazed-over look that they get when you tell them about the
wonders of ASN.1, BER, ISO2709 ... :)
Questions:
* caret/hat in word lists (from the examples)
I would say that 'dc.title any "^cat ^dog eats rat"' matches "cat eats
dog". "cat" and "eats" match part of the string; "dog" is not at the
beginning, but since the relation is "any" it's still a match. Right?
(But honestly, do we really, really need left-anchored words in word
lists? It's neat, but is it necessary?)
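The reading above can be sketched in Python. This is only an illustration
under assumptions that are not taken from the CQL specification: "any"
succeeds when at least one term matches, a term starting with "^" must be
the first word of the field, and any other term may occur anywhere as a
whole word. The function name matches_any is hypothetical.

```python
def matches_any(term_list: str, field_value: str) -> bool:
    """Return True if at least one term in term_list matches field_value.

    Assumed semantics (an illustration, not the CQL spec): a term that
    starts with '^' must equal the first word of the field; any other
    term may appear anywhere in the field as a whole word.
    """
    words = field_value.lower().split()
    for term in term_list.lower().split():
        if term.startswith("^"):
            # Left-anchored term: compare against the first word only.
            if words and words[0] == term[1:]:
                return True
        elif term in words:
            return True
    return False


print(matches_any("^cat ^dog eats rat", "cat eats dog"))  # True
print(matches_any("^dog", "cat eats dog"))                # False
```

Under these assumptions the failed "^dog" anchor does not matter, because
"^cat" and "eats" already satisfy the "any" relation.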
* the CQL BNF
According to the current BNF, "cat prox/>/2//ordered hat" is not a valid
query, since (1) a modifier cannot consist of only "/>" and (2) two
consecutive slashes are not allowed. Is it supposed to be:
modifier ::= '/' modifierName | '/' comparitorSymbol | '/' modifierValue
| '/' modifierName | '/'
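To make the two problems concrete, here is a small Python sketch that
naively splits the boolean from the example on "/". It is not a CQL
parser, and the helper name split_modifiers is hypothetical; it only
shows which pieces a grammar would have to accept.

```python
def split_modifiers(boolean_with_modifiers: str):
    """Split e.g. 'prox/>/2//ordered' into the boolean name and its modifiers.

    Naive illustration only: every '/' starts a new modifier, so an
    empty string in the result corresponds to two consecutive slashes.
    """
    name, *modifiers = boolean_with_modifiers.split("/")
    return name, modifiers


name, modifiers = split_modifiers("prox/>/2//ordered")
print(name)       # prox
print(modifiers)  # ['>', '2', '', 'ordered']
```

The first modifier is a bare comparison symbol (problem 1) and the third
is empty (problem 2), so the grammar has to allow both for this query to
be valid.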
Critique:
The problem with Z39.50 was(/is) that you need a lot of toolkits
implementing standards that are not widely used. If you want to write a
server from scratch, it's quite a lot of work. Which is silly for just a
search/retrieve protocol. And that is *before* you start worrying about
profiles and indices. Z39.50 failed to *keep it simple*.
In my opinion, the following features add to the perceived complexity
of the protocol:
* lists
Confusing syntax: sometimes a string is a string, sometimes it's a list
of (unordered) words. I would prefer an explicit list syntax. For
example:
"A B C" --> [A, B, C]
"\"A B\" C" --> ["A B", C]
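The two mappings above happen to be what shell-style word splitting
produces, so a rough Python sketch can lean on the standard library's
shlex module. This only illustrates the intended list semantics; it is
not a proposal for the actual CQL syntax, and the function name
term_to_list is hypothetical.

```python
import shlex


def term_to_list(term: str) -> list:
    """Split a search term into list items, keeping quoted phrases whole."""
    # shlex.split treats double-quoted runs as a single item and
    # drops the quotes themselves.
    return shlex.split(term)


print(term_to_list("A B C"))        # ['A', 'B', 'C']
print(term_to_list("\"A B\" C"))    # ['A B', 'C']
```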
* encloses and within (and partial)
It seems to me that what you are trying to achieve is to do tests on
server-side n-dimensional objects. Although a geometrical search engine
would be totally awesome, I doubt that it is something that will be
widely implemented (across different communities), and therefore should
not be in the cql context set.
If you want to access a two-dimensional object's members, couldn't you just
use different index names? Like this:
A encloses X -> (X >= A::x1 and X <= A::x2)
Again, it's kinda neat, but is it really necessary?
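As a sketch of the rewrite above, the check reduces to two ordinary
comparisons against the object's bounds. The Interval class and its
fields x1 and x2 are assumptions that mirror the A::x1 / A::x2 notation;
they are not part of any context set.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical server-side object with a lower and an upper bound."""
    x1: float
    x2: float


def encloses(a: Interval, x: float) -> bool:
    # Same test as: X >= A::x1 and X <= A::x2
    return a.x1 <= x <= a.x2


period = Interval(x1=1900, x2=1950)
print(encloses(period, 1925))  # True
print(encloses(period, 1960))  # False
```

Nothing here needs a dedicated relation; two range comparisons on
separately named indexes do the same job.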
* the "relevant" term function
How can the term function order the result set? What happens if two
terms have the "relevant" term function?
* XCQL
Why, oh why is this necessary? If it's only used for debugging, then put
it somewhere else, like in the diagnostics.
* prefix maps
I honestly do not see the need for them. They make the query harder to
read, and since the server tells the client what, for example,
"bath" in "bath.title" means, there is no need for the client to specify
it. They only make sense when the server supports more than one context
set with the same name, forcing the client to explicitly choose the one
that is not used by default. Forcing server implementors to *not* use
context sets with the same prefix seems better than forcing everyone to
handle the prefix map syntax.
Regards,
Martin Malmsten
______________________________________________________________________
Martin Malmsten, Systems Architect, Royal Library of Sweden/LIBRIS
+46-8-4634258
http://www.libris.kb.se/