On Wed, Feb 13, 2002 at 07:44:25AM -0500, LeVan,Ralph wrote:
In general I agree with what Ralph has said, so I am only replying
to areas I don't necessarily agree with.
> > Do people want to be able to take one query and issue it against
> > multiple collections without change?
>
> I don't think so. All our queries are explicitly against a single database.
Personally I think it is useful to be able to take one query and fire
it against multiple servers, as some others have also mentioned.
SRW does not address distribution, but I think a client (as now with
Z39.50) will certainly want to be able to do distribution.
The only ramification of this on CQL is that the approach of profiles
is a good thing. Having consistent naming is good. In one way, I see
this as a chance to "get it right" compared to Z39.50. It is
unfortunate that different Z39.50 servers interpret the same
query in different ways. I would hope with CQL that there would
be a single interpretation of all the query operators. There may
be different mappings made by servers when mapping to Z39.50
attribute lists, but a profile at the CQL level (which may be
sharable with CCL by the way) would be great.
> > - All text to be searched to always be inside quotes. This allows new
> > reserved words to be added later without breaking old queries.
>
> Sorry, but the "easily human enterable" argues against this one. Certainly
> optional, but not mandatory.
The reason I propose this is based on our experience over the past
years using CCL. There may be other good solutions, so I will expand
on the problem and see if anyone else has a good suggested solution.
With CCL, there is a set of reserved words. But the CCL standard I
have is *very* vague about grammar. There is no BNF-style grammar.
It just gives examples. So you can refer to old result sets using S1,
S2 etc. But Z39.50 allows set names to be any string (not just numbers).
So we allowed S=default or S1. But set names can contain spaces,
so we allowed S="this or that", S=default, S1. But then there is
punctuation (comma, parenthesis, etc). So what is an identifier?
I dislike having context-sensitive tokenization rules. It makes
life much easier to implement (well, more options anyway) if tokens
are not context sensitive. So, instead of saying 'set name after S=
is any char to next non-whitespace' you say "S" is a reserved word
and set names are a sequence of alphanumerics; for anything more you
have to quote them.
CCL fails badly in this area because of S1 by the way. If it was
instead S=1 it would have been better. But note, this looks like
a query for a index field called "S" equaling the value "1".
And so on.
My point is really that I think a tight grammar is needed.
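
A minimal sketch of context-free tokenization along these lines, in
Python. The reserved word list, the token labels and the punctuation
set are assumptions made up for illustration; they are not taken from
CCL or from any CQL draft.

```python
import re

# Hypothetical reserved words, chosen only for this example.
RESERVED = {"AND", "OR", "NOT", "S"}

# One fixed set of token shapes, applied the same way everywhere in
# the query: no rule depends on which token came before.
TOKEN_RE = re.compile(r'''
    \s*(?:
        "(?P<quoted>(?:[^"\\]|\\.)*)"   # quoted string; escapes kept verbatim
      | (?P<word>[A-Za-z0-9]+)          # bare alphanumeric word
      | (?P<punct>[=(),])               # single punctuation character
    )''', re.VERBOSE)


def tokenize(query):
    """Split a query into (kind, text) pairs without any context."""
    query = query.rstrip()
    tokens = []
    pos = 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        if m.group("quoted") is not None:
            tokens.append(("STRING", m.group("quoted")))
        elif m.group("word") is not None:
            word = m.group("word")
            if word.upper() in RESERVED:
                # Reserved words are matched case-insensitively.
                tokens.append(("RESERVED", word.upper()))
            else:
                tokens.append(("WORD", word))
        else:
            tokens.append(("PUNCT", m.group("punct")))
        pos = m.end()
    return tokens


print(tokenize('S="this or that"'))
print(tokenize("S=default"))
print(tokenize("S1"))
```

Note that under these rules S1 comes out as one ordinary word rather
than "S" followed by a set number, which is the CCL problem described
above; S=1 tokenizes cleanly but then needs the grammar to treat "S"
as reserved rather than as an index field name.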
> > - All reserved words to be upper case only (debatable).
>
> No...
Ok, so reserved words are case insensitive.
A problem we had with CCL was that we later wanted to add new
local extensions. For example, there was no within-sentence
operator in CCL, so we added SAME. But this broke any old queries
that contained the word 'same' unquoted.
Local extensions might not be an issue, but we may want to extend
CQL ourselves later. In that case, I think it's a worthy goal
to ensure that such extensions do not break old queries.
One approach is to only use punctuation for operators (| = or,
& = and, ! = not, +#@ = same sentence, $*# = same paragraph, etc).
But it does not feel very extensible. Another way is to use a special
symbol in front of operators: @and, @or, @not, etc.
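
To illustrate the breakage, here is a small Python sketch that labels
each word of a query as an operator or a search term, first with bare
reserved words and then with an "@" prefix. The function name, the
operator sets and the sample queries are all hypothetical.

```python
def classify(query, operators, prefix=""):
    """Label each whitespace-separated token as ("OP", tok) or ("TERM", tok)."""
    result = []
    for tok in query.split():
        if prefix:
            # Only prefixed words can ever be operators.
            is_op = (tok.startswith(prefix)
                     and tok[len(prefix):].upper() in operators)
        else:
            # Any bare word matching a reserved word is an operator.
            is_op = tok.upper() in operators
        result.append(("OP" if is_op else "TERM", tok))
    return result


V1 = {"AND", "OR", "NOT"}   # operators before the extension
V2 = V1 | {"SAME"}          # operators after adding SAME

bare = "cats and same day delivery"
marked = "cats @and same day delivery"

# Bare reserved words: the old query changes meaning once SAME is added.
print(classify(bare, V1) == classify(bare, V2))
# Prefixed operators: the old query is read identically before and after.
print(classify(marked, V1, "@") == classify(marked, V2, "@"))
```

Quoting all search text gives the same guarantee from the other
direction: either the terms or the operators have to be marked.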
> > I am not sure if I would want to use the current exact Z39.50
> > attribute sets etc for mapping onto CQL field names. (opinions?)
>
> Absolutely not. What the creators of Index Sets will need to do is describe
> each index in terms of Z39.50 attributes. Then the mapping is available for
> gateways.
This is exactly what we do now with our private CCLInfo explain
category. So I guess I will agree with this idea :-)
To make things compatible with CCL (to get double value out of
new explain categories), it could instead be that dc.title is a
single identifier (so you cannot omit the prefix, but a database
could just define 'title'). That way the same field names could be
used with CCL as well as CQL. Explain would then just list a set
of available index field names - index sets would not be a necessary
concept. Just a thought.
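
A rough sketch of what "field names as single identifiers" could
look like on the server side, in Python. The table contents are
purely illustrative: the names and the function are assumptions, and
the attribute pairs use Bib-1 Use attributes (type 1), where 4 is
Title and 1003 is Author. A real server would publish its own list.

```python
# Field names are opaque keys: "dc.title" is not split into a set
# name plus a field name, so "title" is simply a different key that
# a database may also choose to define.
FIELD_TO_ATTRS = {
    "dc.title":   [(1, 4)],     # Bib-1 Use = Title
    "dc.creator": [(1, 1003)],  # Bib-1 Use = Author
    "title":      [(1, 4)],     # local short name, same mapping
}


def attrs_for(field):
    """Return the Z39.50 (type, value) attribute pairs for a field name."""
    try:
        return FIELD_TO_ATTRS[field.lower()]
    except KeyError:
        raise ValueError(f"unsupported index field: {field}") from None


def available_fields():
    """What an Explain-style listing would report: just the names."""
    return sorted(FIELD_TO_ATTRS)


print(attrs_for("dc.title"))
print(available_fields())
```

The same table could serve a CCL parser and a CQL parser, since
neither needs to know that "dc" means anything on its own.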
> > Then how to manage the population of field-set names? Should there
> > be a central CQL registry of such names? If it can change per server,
> > then reusing a query against multiple servers seems doomed.
> > Should sites be able to define their own new, local sets without
> > going to the global registry? Instead of 'dc.Title', should it be
> > a URL? That is, dublin core XML namespace URI + DC element name?
> > Or should queries be CQL text plus a set of definitions for mapping
> > "dc" to "Dublin Core URI" etc.
>
> It goes into Explain. You provide the URL in Explain that points the
> user/application to the Index to Attribute Set mapping.
I am wary about putting too much hope in Explain. I think it's a bit
of a failure really with Z39.50. To support one query against multiple
targets easily (my goal), the CQL query itself needs to be
standardized as much as possible (via profiles etc). Explain is useful,
but I think more for understanding what subset a server supports.
I would rather humans be able to type in a single CQL query and have
it sent to multiple targets verbatim. I dislike the client having
to tweak the query (using Explain) for each target. It's going back
to RPN for Z39.50. So Explain is useful, but I think standardization
of index field names should be an encouraged thing independent of
Explain. Explain then does not have to define as much in terms
of semantics - just what is supported.
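
A small Python sketch of that division of labour: the query text goes
to every target verbatim, and Explain data is used only to decide
whether a target supports the fields involved. The function names,
the shape of the target table and the sample query are assumptions;
the send callable stands in for whatever actually talks to a server.

```python
def unsupported_fields(query_fields, explain_fields):
    """Fields the query uses that a target does not list in Explain."""
    return sorted(set(query_fields) - set(explain_fields))


def broadcast(query, query_fields, targets, send):
    """Send one query, unchanged, to every target that supports it.

    targets maps a target name to the field names its Explain lists.
    Returns (results, skipped), where skipped maps a target name to
    the fields it was missing.
    """
    results = {}
    skipped = {}
    for name, explain_fields in targets.items():
        missing = unsupported_fields(query_fields, explain_fields)
        if missing:
            skipped[name] = missing
        else:
            # The query string is never rewritten per target.
            results[name] = send(name, query)
    return results, skipped


targets = {
    "server-a": {"dc.title", "dc.creator"},
    "server-b": {"dc.title"},
}
query = 'dc.creator = "kernighan"'   # hypothetical query text
results, skipped = broadcast(query, ["dc.creator"], targets,
                             send=lambda name, q: (name, q))
print(results)
print(skipped)
```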
Alan