Some general remarks after reading the other discussions on CQL.
Identifying search fields by a two-part identifier, where the first part identifies the scope of the second part, carries the danger that there will be many scopes and that it will become very difficult to make sure that "one search fits all". I have a very strong preference for forcing search terms into a single scope (a default attribute set).
Maybe I see it wrong. Are there examples where it makes sense to distinguish between title and dc.title in searching?
We will use SRU to query multiple databases and I would like to send a query unchanged to all these targets. That is the main reason for standardisation and the main reason for us to use SRU.
A very important scenario will be: users with a web browser that supports XSL/XML and an HTML page in which the user enters one query that is sent to multiple servers. The results are caught in multiple browser windows that do the XSL transformation, and this is monitored by the main search window. We have this scenario working, but the main problem now is indeed that different (external) targets require a different query.
NB. This scenario is called the "personal portal" because it does not require any central portal for querying different SRU targets and every user can have his own portal (being just an HTML page and one or more XSL stylesheets) as long as this portal speaks SRU.
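To illustrate the "one query, many targets" idea, here is a minimal Python sketch that builds one search URL per target from a single unchanged query. The target addresses are hypothetical, and the parameter names (`operation`, `query`, `maximumRecords`) are an assumption that may not match the SRU draft under discussion.

```python
from urllib.parse import urlencode

# Hypothetical SRU targets; real base URLs will differ.
TARGETS = [
    "http://sru.example.org/catalogue",
    "http://sru.example.net/articles",
]

def build_search_urls(query, targets=TARGETS, max_records=10):
    """Return one URL per target, all carrying the identical query string."""
    # Parameter names are assumed, not taken from the draft specification.
    params = urlencode({
        "operation": "searchRetrieve",
        "query": query,
        "maximumRecords": max_records,
    })
    return [f"{base}?{params}" for base in targets]

urls = build_search_urls('title:"power and fame"')
for url in urls:
    print(url)
```

The point of the sketch is that only the base URL varies per target; the query part is byte-for-byte the same everywhere.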
The idea that in SRU users must type a query exactly as it is sent to the targets is not correct. In the HTML search page all the required processing can be done to convert a user query to CQL. I would like, however, CQL and what the user types to be the same, or as much alike as possible.
The preferred language for index types is English rather than numbers. We use lots (>100) of index types, among which are the most conventional types such as title, author, keyword, subject, ISBN and ISSN. Some of the other index types are specific to particular databases, and it does not hurt when an index type that does not exist in all databases is sent to multiple targets. The others just do not give any hits. This seems to be a very attractive approach for most people (users, developers, database owners) who are involved in the development of websites for different projects/databases.
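A small Python sketch of the "unknown index types just give no hits" behaviour, assuming each target is a simple in-memory mapping from index type to terms. The two targets and all their data are made up for illustration.

```python
# Two toy targets: index type -> {term -> record ids}. All data is invented.
CATALOGUE = {"title": {"power": [1, 2]}, "isbn": {"0123456789": [2]}}
ARTICLES = {"title": {"power": [7]}, "issn": {"1234-5678": [7]}}

def search(target, index_type, term):
    """Look up a term; an index type the target does not know yields no hits."""
    return target.get(index_type, {}).get(term.lower(), [])

def search_all(targets, index_type, term):
    """Send the same (index type, term) pair to every target."""
    return {name: search(db, index_type, term) for name, db in targets.items()}

results = search_all(
    {"catalogue": CATALOGUE, "articles": ARTICLES}, "issn", "1234-5678"
)
print(results)
```

Here the catalogue has no `issn` index, so it simply contributes an empty result instead of raising an error.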
I prefer queries like:
title:power and fame (boolean)
title:"power and fame" (phrase)
creator:xyz and subject:standards
author:smith (it is up to the target to convert this to creator:smith)
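As a rough sketch of how queries like the ones above could be processed, the following Python function turns them into an explicit `field = "term"` form with upper-case operators. The output syntax is only an assumption modelled on the examples in the quoted message below, not actual CQL. The `ALIASES` table, the default field name and the rule that a bare term inherits the last field named are likewise illustrative choices.

```python
import re

ALIASES = {"author": "creator"}   # hypothetical alias table
OPERATORS = {"and", "or", "not"}
# Optional "field:" prefix, then either a quoted phrase or a bare word.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')

def to_cql(user_query, default_field="keyword"):
    """Convert 'title:power and fame' style input to 'field = "term"' clauses."""
    parts = []
    field = default_field
    for match in TOKEN.finditer(user_query):
        prefix, phrase, word = match.groups()
        # A bare and/or/not is a boolean operator, not a search term.
        if prefix is None and phrase is None and word.lower() in OPERATORS:
            parts.append(word.upper())
            continue
        if prefix:
            name = prefix.lower()
            field = ALIASES.get(name, name)
        # Two terms in a row get an implicit AND between them.
        if parts and parts[-1] not in {"AND", "OR", "NOT"}:
            parts.append("AND")
        term = phrase if phrase is not None else word
        parts.append(f'{field} = "{term}"')
    return " ".join(parts)

print(to_cql('title:power and fame'))
print(to_cql('title:"power and fame"'))
print(to_cql('author:smith'))
```

The same alias table could live at the target instead of in the search page; the sketch only shows that the mapping itself is trivial.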
I think it makes sense to use the Dublin Core fields as index types (so we do not need the "dc." prefix). This can be complemented with namespaces from relevant Application Profiles (like the Library Application Profile).
Everybody is free to use exotic index types in the query as well as in his databases, but it is obviously in everyone's own interest to conform to a single standard for the conventional index types.
>>> [log in to unmask] 13-02-02 08:02 >>>
I snuck some questions about CQL into a separate mail, but got no response
yet. So I thought I would try a more direct route. What are the goals for
CQL in terms of syntax?
Is a goal to make it reasonably human readable?
Is a goal to make it very internationally friendly (e.g. by using numbers
in preference to symbolic identifiers that mean something in English)?
Do people want to be able to use multiple attribute sets in one query?
Do people want to be able to take one query and issue it against
multiple collections without change?
Is a goal to have a direct mapping to Z39.50 constructs?
My personal preference (influenced no doubt by being English speaking)
is to make it human readable instead of using numbers. The field names for
searching on would be symbolic names (not full text) and would relate,
I guess, to metadata standards.
Note, there is a CQL page up already under ZiNG, but it's pretty terse.
And I would prefer a few things to be different (as always! :-)
To make things concrete, here are some queries:
dc.Title = "Power and Fame"
dc.Title = ("Power" AND "Fame")
dc.Contributor = "LOC" AND dc.Subject = "Standards"
dc.Contributor = "LOC" AND agls.Identifier = "xyzzy"
bib1.Author, dc.Contributor = "Smith"
Basically, I suggest:
- Queries are Unicode text (UTF-8 or whatever).
- All text to be searched to always be inside quotes. This allows new
reserved words to be added later without breaking old queries.
- All reserved words to be upper case only (debatable).
- Fields to be searched to be identified by a two-part identifier
where the first part identifies the scope for the second part.
What I don't know is how to define a set of scope names (are they
attribute sets? Or just a logical grouping for names? E.g. Dublin Core
attributes are defined in the Bib-1 attribute set at present).
I am not sure if I would want to use the current exact Z39.50
attribute sets etc for mapping onto CQL field names. (opinions?)
Then how to manage the population of field-set names? Should there
be a central CQL registry of such names? If it can change per server,
then reusing a query against multiple servers seems doomed.
Should sites be able to define their own new, local sets without
going to the global registry? Instead of 'dc.Title', should it be
a URL? That is, Dublin Core XML namespace URI + DC element name?
Or should queries be CQL text plus a set of definitions for mapping
"dc" to "Dublin Core URI" etc.
The pattern match chars don't seem to follow any existing standard.
(To be more precise, they mix several existing standards.) I would
stick to CCL (which is the # and ?) and drop '*'. My rationale
is I want to map it to Z39.50 easily. Z39.50 has got a CCL regex
attribute already. I don't mind using a different one - but I think
it's important to be able to map the patterns through to some existing
syntax in Z39.50.
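To make the earlier question about scope names concrete (a query plus a set of definitions mapping "dc" to a URI), here is a minimal Python sketch of such a mapping. The `PREFIXES` table and the `expand` helper are hypothetical. The URI is the Dublin Core element set namespace, and lower-casing the element name is an assumption of this sketch.

```python
# Scope name -> namespace URI. Only "dc" is defined here; others are left out.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

def expand(field, prefixes=PREFIXES):
    """Expand a two-part name such as 'dc.Title' to namespace URI + element."""
    scope, sep, name = field.partition(".")
    if not sep or not name:
        raise ValueError(f"not a two-part field name: {field!r}")
    if scope not in prefixes:
        raise KeyError(f"unknown scope: {scope!r}")
    # Assumption: element names are matched case-insensitively.
    return prefixes[scope] + name.lower()

print(expand("dc.Title"))
```

A locally defined set would then only need its own entry in the table, which is roughly the "CQL text plus definitions" option raised above.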
Enough to spark off some conversation?
Alan Kent (mailto:[log in to unmask], http://www.mds.rmit.edu.au)
Postal: Multimedia Database Systems, RMIT, GPO Box 2476V, Melbourne 3001.
Where: RMIT MDS, Bld 91, Level 3, 110 Victoria St, Carlton 3053, VIC Australia.
Phone: +61 3 9925 4114 Reception: +61 3 9925 4099 Fax: +61 3 9925 4098