ZNG Archives

ZNG@C4VLPLISTSERV01.LOC.GOV


ZNG December 2004

Subject: CQL implementation details
From: Hedzer Westra <[log in to unmask]>
Reply-To: Z39.50 Next-Generation Initiative
Date: Fri, 3 Dec 2004 14:31:42 +0100

Hello all,

Mike Taylor answered some of the questions about CQL/SRW implementation
details that I sent in my previous message. Here is my reply, along with
some more questions.

>> - how are words separated? The description hints at splitting on
>> (white)space only.
I looked it up in the HTML documentation:
 CQL tutorial, Section 2:
  [space] (separates words of a CQL expression)
  ..
  In general, multi-word terms are interpreted as requesting records in
  which a single field contains all the specified words, in the specified
  order, with no other words in between. This is a proximity search.
 CQL tutorial, Section 5:
  For word indexes, three more relations are supported:
  any   The search succeeds if one or more of the words in the term can
        be found in the record. So, for example,
        title any "ocean sea lake" is a convenient shorthand for
        title="ocean" or title="sea" or title="lake".
  all   The search succeeds if every one of the words in the term can be
        found in the record. So, for example,
        title all "old man sea" is a convenient shorthand for
        title="old" and title="man" and title="sea".
  exact A query like title exact "the complete dinosaur" indicates a
        string search rather than a word search. It succeeds only for
        records whose title field consists exactly of the characters
        ``the complete dinosaur''.
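
The any/all shorthand quoted above can be sketched as a small rewrite
step. This is only an illustration, not part of CQL itself: the function
name is hypothetical, and splitting on whitespace is an assumption, since
the definition of a "word" is left to the server.

```python
def expand_word_relation(index, relation, term, split=str.split):
    """Expand `index any|all "w1 w2 ..."` into its boolean long form.

    `split` is the word splitter; whitespace splitting is an assumption.
    """
    booleans = {"any": "or", "all": "and"}
    if relation not in booleans:
        raise ValueError(f"unsupported relation: {relation!r}")
    # One `index="word"` clause per word, joined by the matching boolean.
    clauses = [f'{index}="{word}"' for word in split(term)]
    return f" {booleans[relation]} ".join(clauses)


print(expand_word_relation("title", "any", "ocean sea lake"))
print(expand_word_relation("title", "all", "old man sea"))
```

The two calls reproduce the long forms given in Section 5 of the tutorial.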

BTW: section 4 of the CQL tutorial doesn't mention cql.anywhere

>I think you're talking about the words _within_ a single,
> multiple-word, search term, right?  A query like:
>        dc.creator all "kernighan ritchie"
Yes.

>The CQL specification itself says _nothing_ on how such strings as
> "kernighan ritchie" above should be broken down into individual tokens.
> This is a matter for application profiles.
You're right; indeed, the CQL context set description says:
        "The term should be broken into words, according to the server's
        definition of a 'word'"
but the CQL tutorial says something else, and I'm afraid that people
writing client code will assume it's part of the default CQL semantics
and not (as I understand now) implementation (i.e. profile) dependent.
Perhaps that could be mentioned in the tutorial and the CQL language
description?
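
To make the profile dependence concrete, here is a minimal sketch of two
plausible server-side definitions of a "word". Both function names and the
sample term are hypothetical; neither splitter is prescribed by CQL.

```python
import re


def split_on_whitespace(term):
    """Treat every run of whitespace as a word boundary."""
    return term.split()


def split_on_non_alphanumerics(term):
    """Treat every run of non-alphanumeric characters as a word boundary."""
    return [word for word in re.split(r"[^0-9A-Za-z]+", term) if word]


term = "o'reilly jean-paul"
print(split_on_whitespace(term))         # two words
print(split_on_non_alphanumerics(term))  # four words
```

A client that assumes the first definition will build different queries
from one that assumes the second, which is why the profile has to say
which applies.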

> The way the term should be treated is specified by a "structure
> attribute" -- that is, a special relation modifier that, although it is
> physically attached to the relation actually talks about the structure
> of the term that the relation relates. One of these is cql.word.
> The phrasing around these modifiers is a little vague, so it's not made
> explicit that this also applies in the case of the default structure,
> cql.masked; but I believe that this is what people intended.
I'd like to have that documented as well, if possible, to prevent more
confusion in the future. My analysis of the description:
a. operator = with a multi-word term (word separation is implementation
   dependent and should be described in the implementation profile), as
   well as the cql.all and cql.any operators -> default modifier is
   cql.word
b. operator cql.exact -> default modifier is cql.string. Question: does
   this refer to
   1. exact searching w.r.t. splitting of words (which would imply that
      cql.word and cql.string are mutually exclusive), or
   2. exact searching w.r.t. pattern matching (which would imply that
      cql.masked and cql.string are mutually exclusive), or
   3. both?
c. operator = with a single term, and all other operators -> default
   modifier is cql.masked
d. cql.masked implies ??: cql.word, cql.string or neither? Maybe this is
   orthogonal, i.e., cql.masked can be supplied *together* with one of
   the other five (word, string, isoDate, number, uri) - assuming b.1. is
   true. But then you'd also need to be able to specify cql.unmasked or
   something similar to disable pattern matching.
e. only one of word, string, isoDate, number and uri can be set at the
   same time for one searchClause (unless b.2. is true..)
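
Points a to c can be written down as a lookup, which may help when
comparing readings. This is a sketch of the interpretation given in the
list above, which is itself uncertain; the function name is hypothetical
and whitespace splitting is an assumption.

```python
def default_structure_modifier(relation, term, split=str.split):
    """Return the default structure modifier under reading a-c above."""
    relation = relation.lower()
    # a. any/all always work on words.
    if relation in ("any", "all", "cql.any", "cql.all"):
        return "cql.word"
    # b. exact treats the term as one string.
    if relation in ("exact", "cql.exact"):
        return "cql.string"
    # a. "=" with a multi-word term also works on words.
    if relation == "=" and len(split(term)) > 1:
        return "cql.word"
    # c. "=" with a single term, and every other operator.
    return "cql.masked"


for relation, term in [("=", "kernighan ritchie"), ("=", "ritchie"),
                       ("exact", "the complete dinosaur"), ("<", "1990")]:
    print(relation, repr(term), "->", default_structure_modifier(relation, term))
```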

So: the whole word & pattern matching thing is a bit unclear to me.
Please help me clear this up. If this is intentionally undefined (read:
vague) please say so in the description, so that server & client
implementers know they have to make this definite in their profile
description.
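
The difference between masked and unmasked matching of a single word can
be sketched as follows. The assumptions are that `*` stands for any run of
characters and `?` for exactly one character; anchoring and escaping are
ignored, and the function names are hypothetical.

```python
import re


def masked_term_to_regex(term):
    """Translate a masked term into a compiled regular expression."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")   # assumed: any run of characters
        elif ch == "?":
            parts.append(".")    # assumed: exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def word_matches(term, word, masked=True):
    """Match one word, with or without pattern matching."""
    if masked:
        return masked_term_to_regex(term).fullmatch(word) is not None
    return term == word


print(word_matches("dino*", "dinosaur"))                # pattern match
print(word_matches("dino*", "dinosaur", masked=False))  # literal comparison
```

With `masked=False` the `*` is an ordinary character, which is the
behaviour a cql.unmasked-style modifier would have to request.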

> The upshot is that your server is at liberty to break up multi-word
> terms however it likes.
> You should probably use whatever your server already uses.
I'll do that, then..

>> - is word proximity search required even for basic searches like 
>> 'author = "Rembrandt van Rijn"' ?
> The SRW/U "base profile" includes specifications for the minimum level
> of CQL support.
--- 8< ---
> So, no, you are not obliged to implement proximity.
But:  
 CQL tutorial, Section 2:
  In general, multi-word terms are interpreted as requesting records in
  which a single field contains all the specified words, in the specified
  order, with no other words in between. This is a proximity search.
 CQL context set
 (http://www.loc.gov/z3950/agency/zing/cql/context-sets/cql.html):
  = is used for word adjacency, when the term is a list of words. That
    is to say that the words
    appear in that order with no others intervening. 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Is this a mistake, or just one possible implementation? Either way,
please clear it up in both documents.
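
For reference, the adjacency reading quoted from the context set amounts
to a contiguous sub-sequence test on the words of a field. A minimal
sketch, assuming both the field and the term have already been split into
words by the server (the function name is hypothetical):

```python
def words_adjacent(field_words, term_words):
    """True if term_words occur in field_words in order, with no gaps."""
    n = len(term_words)
    if n == 0:
        return False
    # Slide a window of n words over the field and compare.
    return any(field_words[i:i + n] == term_words
               for i in range(len(field_words) - n + 1))


term = "rembrandt van rijn".split()
print(words_adjacent("paintings by rembrandt van rijn".split(), term))
print(words_adjacent("rembrandt harmenszoon van rijn".split(), term))
```

The second field contains all three words, so an `all` search would
succeed on it, but the adjacency test fails because another word
intervenes.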

> Since CQL has its own existence independent of SRW, I think these
> base-profile requirements ought perhaps to be in a separate CQL
> base-profile document instead of, or as well as, in the SRW base
> profile.
Maybe the problem (for me) is that the SRW Base profile doesn't mention
relation modifiers, default modifiers and modifier behaviour at all. A
text like 'the semantics of (default) relation modifiers are
implementation dependent and should be further defined in a profile by
server implementors' would do the trick for me..

>> I will pass on your XPath-sorting question
Just as a reminder, here it is again:
> why is sorting defined on XPaths? This completely bypasses any possible
> built-in sorting capability of any database server except one
> delivering XML records natively. What I expect to have to do is:
> + retrieve every record
> + convert them to adlibXML
> + translate to the right schema using an XSL
> + call the XPath on each record
> + sort on the outcome of this XPath
> + print the records in the resulting order (maybe even convert them
>   again if the retrieveSchema is different)
> .. pretty slow! Would it be possible within the SRW standard to just
> supply an indexname (which in Adlib as well as in relational database
> systems is searchable as well as sortable..) instead of the XPath?
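
The core of the steps listed above (evaluate a path on every record, then
sort on the result) can be sketched as follows. The records, element names
and function name are hypothetical; the adlibXML conversion and the XSL
step are left out, and Python's ElementTree only supports a subset of
XPath.

```python
import xml.etree.ElementTree as ET


def sort_by_xpath(xml_records, xpath):
    """Parse every record, evaluate the path, and sort on the outcome."""
    def sort_key(xml_text):
        # Each record has to be parsed in full just to obtain its key.
        value = ET.fromstring(xml_text).findtext(xpath)
        return value or ""
    return sorted(xml_records, key=sort_key)


records = [
    "<record><title>Night Watch</title></record>",
    "<record><title>Anatomy Lesson</title></record>",
]
for record in sort_by_xpath(records, "./title"):
    print(record)
```

Every record in the result set is materialised and parsed before the first
one can be returned, whereas sorting on an index name could be delegated
to the database.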

>> I see that Marc has already answered your questions about open source
>> clients.
Did he? I didn't see that e-mail come by yet.. My question was:
> is there an Open Source SRU/SRW tester or a client that can be used to
> test our implementation? I only found servers on the SRW website.
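
In the absence of a ready-made tester, an SRU searchRetrieve request is
just a URL and can be assembled by hand. A minimal sketch: the base URL is
a placeholder, the function name is hypothetical, and version 1.1 is
assumed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url, cql_query, maximum_records=10, version="1.1"):
    """Build an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    # urlencode takes care of escaping quotes and spaces in the query.
    return f"{base_url}?{urlencode(params)}"


url = build_sru_url("http://example.org/sru",
                    'dc.creator all "kernighan ritchie"')
print(url)
# Decoding the query string gives back the original CQL query.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Fetching such a URL with a browser or any HTTP client is enough for a
first check of a server's responses.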

>> I will leave Rob to answer the ZeeRex questions
Just as a reminder, here they are again:
> the ZeeRex documentation is a bit concise on configInfo. What settings
> exactly are 'setting', 'default' and 'supports'? And why would you want
> to have a default stylesheet? That disables XML retrieval altogether,
> or do I misinterpret something?

Many thanks in advance for your time & answers!

Best regards,

Hedzer Westra, Systems Developer

Adlib | Information Systems
Reactorweg 291
3542 AD Utrecht
Postbus 1436
3600 BK Maarssen
tel: +31-30-241 1885
www: http://www.adlibsoft.com
