I snuck some questions about CQL into a separate mail, but got no response
yet. So I thought I would try a more direct route. What are the goals for
CQL in terms of syntax?
Is a goal to make it reasonably human readable?
Is a goal to make it very internationally friendly (e.g. by using numbers
in preference to symbolic identifiers that mean something in English)?
Do people want to be able to use multiple attribute sets in one query?
Do people want to be able to take one query and issue it against
multiple collections without change?
Is a goal to have a direct mapping to Z39.50 constructs?
My personal preference (influenced no doubt by being English speaking)
is to make it human readable, using names instead of numbers. The field
names for searching on would be symbolic names (not full text) and would
relate, I guess, to metadata standards.
Note, there is a CQL page up already under ZiNG, but it's pretty terse.
And I would prefer a few things to be different (as always! :-)
To make things concrete, here are some queries:
dc.Title = "Power and Fame"
dc.Title = ("Power" AND "Fame")
dc.Contributor = "LOC" AND dc.Subject = "Standards"
dc.Contributor = "LOC" AND agls.Identifier = "xyzzy"
bib1.Author, dc.Contributor = "Smith"
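
As a rough sketch only (not any agreed CQL grammar), queries of this
shape could be tokenized as below. It assumes the rules suggested in
this mail: search text always in quotes, reserved words in upper case,
and field names of the form scope.name. The token names and the
tokenize() function are made up for illustration.

```python
import re

# Hypothetical token grammar: quoted search text, two-part field names,
# upper-case reserved words, and a little punctuation.
TOKEN_RE = re.compile(r'''
    \s*(?:
        "(?P<text>[^"]*)"                                      # quoted search text
      | (?P<field>[A-Za-z][A-Za-z0-9]*\.[A-Za-z][A-Za-z0-9]*)  # scope.name
      | (?P<word>[A-Z]+)\b                                     # reserved word
      | (?P<punct>[=(),])                                      # operators, grouping
    )''', re.VERBOSE)


def tokenize(query):
    """Split a query into (kind, value) pairs, rejecting unquoted text."""
    query = query.rstrip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {query[pos:]!r}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for kind, value in tokenize('dc.Contributor = "LOC" AND dc.Subject = "Standards"'):
        print(kind, value)
```

Because bare words are only ever reserved words, a query such as
dc.Title = Power is rejected rather than guessed at, which is what
leaves room to add reserved words later.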
Basically, I suggest:
- Queries are Unicode text (UTF-8 or whatever).
- All text to be searched to always be inside quotes. This allows new
reserved words to be added later without breaking old queries.
- All reserved words to be upper case only (debatable).
- Fields to be searched to be identified by a two-part identifier
where the first part identifies the scope for the second part.
Eg: dc.title.
What I don't know is how to define a set of scope names (are they
attribute sets? Or just a logical grouping for names? E.g. Dublin Core
attributes are defined in the Bib-1 attribute set at present).
I am not sure if I would want to use the current exact Z39.50
attribute sets etc for mapping onto CQL field names. (opinions?)
Then how to manage the population of field-set names? Should there
be a central CQL registry of such names? If it can change per server,
then reusing a query against multiple servers seems doomed.
Should sites be able to define their own new, local sets without
going to the global registry? Instead of 'dc.Title', should it be
a URL? That is, Dublin Core XML namespace URI + DC element name?
Or should queries be CQL text plus a set of definitions for mapping
"dc" to "Dublin Core URI" etc.?
The pattern match chars don't seem to follow any existing standard.
(To be more precise, they mix several existing standards.) I would
stick to CCL (which is the # and ?) and drop '*'. My rationale
is that I want to map it to Z39.50 easily. Z39.50 has got a CCL regex
attribute already. I don't mind using a different one - but I think
it's important to be able to map the patterns through to some existing
syntax in Z39.50.
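
To show what sticking to the two CCL characters would mean in practice,
here is a small sketch that turns a masked term into a Python regular
expression. It assumes '#' stands for exactly one character and '?' for
any run of characters (possibly empty); that reading, and the function
name, are my assumptions rather than anything a server is known to
implement. '*' gets no special meaning.

```python
import re


def ccl_mask_to_regex(term):
    """Translate a term using '#' and '?' masks into a compiled regex."""
    parts = []
    for ch in term:
        if ch == "#":
            parts.append(".")            # assumed: exactly one character
        elif ch == "?":
            parts.append(".*")           # assumed: zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else, '*' included, is literal
    return re.compile("".join(parts), re.IGNORECASE)


if __name__ == "__main__":
    pattern = ccl_mask_to_regex("wom#n")
    for word in ("woman", "women", "womn"):
        print(word, bool(pattern.fullmatch(word)))
```

The same walk over the term could emit whatever pattern syntax the
Z39.50 side expects instead of a regex; the point is only that two mask
characters keep that mapping trivial.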
Enough to spark off some conversation?
Alan
--
Alan Kent (mailto:[log in to unmask], http://www.mds.rmit.edu.au)
Postal: Multimedia Database Systems, RMIT, GPO Box 2476V, Melbourne 3001.
Where: RMIT MDS, Bld 91, Level 3, 110 Victoria St, Carlton 3053, VIC Australia.
Phone: +61 3 9925 4114 Reception: +61 3 9925 4099 Fax: +61 3 9925 4098