> Is a goal to make it reasonably human readable?
Yes. More than that, it needs to be reasonable for a human to enter. CQL
is a potential future area of disagreement between SRW and SRU. I think the
SRW community assumes that their client software can turn a human-entered
query into CQL. The SRU community has no such advantage; their
users must type in a CQL query. As much as possible, we need to make CQL
easy to enter.
> Is a goal to make it very internationally friendly (eg: by using numbers
> in preference to symbolic identifiers that mean something in English)?
English is preferable to numbers. Remember, humans will have to type this.
More importantly, non-Z39.50-geeks will have to type this. People who won't
remember the difference between 1016 and 1035. (What was the difference
between them?)
We're going to have a small set of keywords. I'd be pleased to hear a
suggestion on how we'd internationalize them, but I don't have any
myself.
> Do people want to be able to use multiple attribute sets in one query?
Absolutely! But I've been calling them Index Sets. Multiple Z39.50
attributes will roll up into a single Index ID. Multiple Index IDs roll up
into an Index Set.
> Do people want to be able to take one query and issue it against
> multiple collections without change?
I don't think so. All our queries are explicitly against a single database.
> Is a goal to have a direct mapping to Z39.50 constructs?
I've tried very hard to make this so. I want a trivial gateway between
SRW/SRU and classic Z39.50.
> My personal preference (influenced no doubt by being English-speaking)
> is to make it human readable instead of numbers. The field names for
> searching on would be symbolic names (not full text) and would relate,
> I guess, to metadata standards.
Yes!
> Note, there is a CQL page up already under ZiNG, but it's pretty terse.
> And I would prefer a few things to be different (as always! :-)
>
> To make things concrete, here are some queries:
>
> dc.Title = "Power and Fame"
> dc.Title = ("Power" AND "Fame")
> dc.Contributor = "LOC" AND dc.Subject = "Standards"
> dc.Contributor = "LOC" AND agls.Identifier = "xyzzy"
> bib1.Author, dc.Contributor = "Smith"
>
> Basically, I suggest:
> - Queries are Unicode text (UTF-8 or whatever).
> - All text to be searched to always be inside quotes. This allows new
> reserved words to be added later without breaking old queries.
Sorry, but the "easily human enterable" (EHE) goal argues against this one.
Quotes should certainly be optional, but not mandatory.
> - All reserved words to be upper case only (debatable).
No, not "EHE".
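
To make the two "EHE" points concrete, here is a minimal tokenizer sketch in
Python that accepts both quoted and bare terms and treats boolean keywords
case-insensitively. The keyword set, the token labels, and the function name
are assumptions made for illustration; nothing here is agreed CQL syntax.

```python
import re

# Assumed keyword set, for illustration only.
KEYWORDS = {"and", "or", "not"}

# A token is a double-quoted string, a single punctuation character,
# or a run of characters containing no whitespace, parens, "=" or quotes.
TOKEN = re.compile(
    r'"(?P<quoted>(?:[^"\\]|\\.)*)"'
    r'|(?P<punct>[()=])'
    r'|(?P<bare>[^\s()="]+)'
)

def tokenize(query):
    """Return a list of (kind, text) pairs for a human-typed query."""
    tokens = []
    for match in TOKEN.finditer(query):
        if match.lastgroup == "quoted":
            # Quoted text is always a search term, even if it says "and".
            tokens.append(("TERM", match.group("quoted")))
        elif match.lastgroup == "punct":
            tokens.append(("PUNCT", match.group("punct")))
        else:
            word = match.group("bare")
            if word.lower() in KEYWORDS:
                # Keywords are recognized in any case.
                tokens.append(("BOOL", word.lower()))
            else:
                tokens.append(("TERM", word))
    return tokens
```

With this approach a user who needs to search for a literal keyword can still
quote it, so optional quoting costs nothing for the careful user. The sketch
silently skips a stray unmatched quote; a real parser would report it.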
> - Fields to be searched to be identified by a two-part identifier
> where the first part identifies the scope for the second part.
> Eg: dc.title.
Yes. But for EHE, I'd like to propose that a database might have a default
Index Set which would allow the omission of the Index Set pre-qualifier.
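
A rough sketch of how a server might resolve an index name under that
proposal, in Python. The function name is hypothetical, and folding names to
lower case is an assumption of this sketch, not something decided here.

```python
def resolve_index(name, default_set):
    """Split 'set.index' into (set, index).

    If the name has no Index Set pre-qualifier, fall back to the
    database's default Index Set.
    """
    prefix, dot, index = name.partition(".")
    if dot:
        return prefix.lower(), index.lower()
    return default_set.lower(), name.lower()
```

For a database whose default Index Set is "dc", both "dc.Title" and plain
"title" would resolve to the same index.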
> What I don't know is how to define a set of scope names (are they
> attribute sets? Or just a logical grouping for names? Eg: dublin core
> attributes are defined in the Bib-1 attribute set at present)
Index Sets.
> I am not sure if I would want to use the current exact Z39.50
> attribute sets etc for mapping onto CQL field names. (opinions?)
Absolutely not. What the creators of Index Sets will need to do is describe
each index in terms of Z39.50 attributes. Then the mapping is available for
gateways.
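
As an illustration of what a gateway could do with such a description, here
is a small Python sketch. The table contents and the function name are
hypothetical; the only facts relied on are the Bib-1 Use attribute values
(type 1: 4 is Title, 1003 is Author, 21 is Subject heading). The output is
written in YAZ-style prefix notation ("@attr type=value") purely as a
convenient way to show the attributes; that choice is an assumption too.

```python
# Hypothetical Index Set description: index name -> Z39.50 (type, value) pairs.
# Which Bib-1 attributes each "dc" index should carry is illustrative only.
INDEX_SETS = {
    "dc": {
        "title": [(1, 4)],       # Bib-1 Use: Title
        "creator": [(1, 1003)],  # Bib-1 Use: Author
        "subject": [(1, 21)],    # Bib-1 Use: Subject heading
    },
}

def to_attribute_query(index_set, index, term):
    """Render one index/term pair as attributes plus a quoted term.

    Assumes the term contains no double quotes; escaping is left out
    of this sketch.
    """
    attrs = INDEX_SETS[index_set][index]
    prefix = " ".join(f"@attr {atype}={value}" for atype, value in attrs)
    return f'{prefix} "{term}"'
```

An index that rolls up several attributes would simply list more pairs, and
the gateway would emit them all in front of the term.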
> Then how to manage the population of field-set names? Should there
> be a central CQL registry of such names? If it can change per server,
> then reusing a query against multiple servers seems doomed.
> Should sites be able to define their own new, local sets without
> going to the global registry? Instead of 'dc.Title', should it be
> a URL? That is, dublin core XML namespace URI + DC element name?
> Or should queries be CQL text plus a set of definitions for mapping
> "dc" to "Dublin Core URI" etc.
It goes into Explain. You provide a URL in Explain that points the
user/application to the Index-to-Attribute-Set mapping.
> The pattern match chars don't seem to follow any existing standards.
> (To be more precise, it mixes several existing standards). I would
> stick to CCL (which is the # and ?) and drop '*'. My rationale
> is I want to map it to Z39.50 easily. Z39.50 has got a CCL regex
> attribute already. I don't mind using a different one - but I think
> it's important to be able to map the patterns through to some existing
> syntax in Z39.50.
Fine by me.
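
For anyone implementing this locally rather than passing the pattern through
to Z39.50, a minimal Python sketch of the CCL-style characters follows. It
assumes "#" stands for exactly one character and "?" for any number of
characters (including none), and that matching ignores case; the function
name is made up for the example.

```python
import re

def ccl_mask_to_regex(term):
    """Translate a term using CCL-style '#' and '?' into a compiled regex."""
    parts = []
    for ch in term:
        if ch == "#":
            parts.append(".")    # assumed: exactly one character
        elif ch == "?":
            parts.append(".*")   # assumed: zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)
```

Use `fullmatch` on the result so that the pattern has to cover the whole
word, e.g. `ccl_mask_to_regex("wom#n").fullmatch("women")`.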
> Enough to spark off some conversation?
It's a great start! Now we need some implementation and interoperability!
Ralph