Eric Lease Morgan wrote:
> Please advise me on how my SRU server should handle multi-word queries.
> Should I:
>
> * do nothing and just throw an error,
Your server should throw an error if it receives an invalid request,
and that includes an invalid query.
> * munge these queries into valid CQL on the client side, or
Your client may offer some query language (not CQL) which must then be
converted to CQL on the client side.
> * munge these queries into valid CQL on the server side
No, no.
>
> will do Boolean logic and pattern matching. Whether we like it or not
> people's expectations are being driven by Google.
Sure.
>
> If the above is more or less true, then I expect to get the following
> sorts of queries as input from users, and the majority of the queries
> will be like numbers 1, 2, 3, and 4:
>
> 1. foo
> 2. bar
> 3. foo bar
This is invalid CQL as it is defined today.
> 4. "foo bar"
> 5. title=foo bar
> (Even if Query #3 is not valid CQL, I think it should be, but that is
> beside the point.)
If CQL is meant to be typed in by regular users, I agree with you.
OTOH, if CQL is supposed to be an easy-to-use language for applications
(and programmers!) there is no immediate problem as I see it. The focus
is that CQL should be easy to _generate_ (and parse) from input forms,
other command line languages, etc.
For example, a Danish user interface might use a different reserved word
for and ("og"), or the user interface might want to use operators like +
instead of "and".
CQL does not use the TOKEN ALONE to decide whether it is an operator or a
search term. Instead, it is the context (position) that makes them what
they are. So
a b c
is treated as term=a, booleanop=b, term=c, whereas
a b
is treated as term=a, booleanop=b, term missing.
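To make the positional rule concrete, here is a minimal Python sketch. It is
only an illustration of the description above, not a CQL parser: the
function name classify_by_position is made up, and it assumes plain
whitespace-separated tokens with no quoting, index, relation or parentheses.

```python
def classify_by_position(query):
    """Label tokens purely by position: term, booleanop, term, ...

    Hypothetical helper for illustration only. It ignores quoting,
    index/relation prefixes and parentheses.
    """
    tokens = query.split()
    labels = []
    for i, token in enumerate(tokens):
        # Even positions are search terms, odd positions are boolean operators.
        kind = "term" if i % 2 == 0 else "booleanop"
        labels.append((kind, token))
    # The sequence is only complete if it ends on a term, i.e. has an
    # odd number of tokens.
    complete = len(tokens) % 2 == 1
    return labels, complete
```

With this, classify_by_position("a b") labels b as a boolean operator and
reports the query as incomplete, which is why a server rejects it.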
You have to convert your Google type language to CQL. Consider this query:
a "b c" d
Tokenize the query. You have 3 tokens. Put AND or OR between them and
precede each with the proper index+relation. So it could be converted to
dc.title = a AND dc.title = "b c" AND dc.title = d
and you're set. Unfortunately it is not (yet) possible to use:
dc.title = (a AND "b c" AND d)
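The client-side conversion described above could be sketched in Python
roughly as follows. The function names tokenize and google_to_cql are
hypothetical, and the sketch assumes the user input consists only of bare
words and double-quoted phrases; it does no escaping of special characters.

```python
import re

# A token is either a double-quoted phrase or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def tokenize(query):
    """Split a Google-style query into words and quoted phrases."""
    tokens = []
    for phrase, word in _TOKEN.findall(query):
        token = phrase or word
        if token:  # skip empty phrases such as ""
            tokens.append(token)
    return tokens

def google_to_cql(query, index="dc.title", relation="=", boolean="AND"):
    """Prefix every token with index+relation and join with one boolean.

    The defaults (dc.title, =, AND) are just the ones used in the
    example above; a real client would pick its own.
    """
    clauses = []
    for token in tokenize(query):
        # Phrases containing whitespace must be quoted again in the output.
        needs_quotes = any(ch.isspace() for ch in token)
        term = f'"{token}"' if needs_quotes else token
        clauses.append(f"{index} {relation} {term}")
    return f" {boolean} ".join(clauses)

print(google_to_cql('a "b c" d'))
```

Whether the tokens are joined with AND or OR, and which index is used, is a
decision for the client; the server only ever sees the generated query.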
If CQL had been designed differently, queries like
a b
could have been allowed. That is NOT how CQL stands right now, and what
I'm going to write was already written more than a year ago (IIRC),
when CQL was "young". So old-timers can ignore it.
1. The operators could be escaped, something like:
a @and b
a <and> b
a \and b
Bad: looks ugly. A programming language, not a user-friendly language.
Good: Extremely easy to generate and parse (even for user-friendly
interfaces). For example, it is trivial to convert
a and b
to
a @and b
if the client uses English "reserved" words.
2. Reserved words could be introduced. Names must be quoted to avoid
them being treated as operators, e.g.
a "and" b
is OK and is a query consisting of three terms, whereas
a and b
is a AND b.
Good: looks good. Close to Google.
Bad: as things progress, more words will be introduced. So people might
be in for "surprises" (some words suddenly becoming reserved).
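As a rough Python illustration of how a client could bridge the two ideas,
the sketch below rewrites the client's own reserved words into the escaped
form of method 1, while leaving quoted words alone as in method 2. Note that
@and is only the hypothetical syntax from this mail, not valid CQL, and that
the names CLIENT_OPERATORS and escape_operators are invented for the example.

```python
import re

# A token is a double-quoted string (quotes kept) or a run of non-space chars.
_TOKEN = re.compile(r'"[^"]*"|\S+')

# Hypothetical English vocabulary of a client; the "@" forms are the
# made-up escaped operators from method 1, not real CQL.
CLIENT_OPERATORS = {"and": "@and", "or": "@or", "not": "@not"}

def escape_operators(query, operators=CLIENT_OPERATORS):
    """Replace bare reserved words with escaped operators.

    Quoted tokens keep their quotes, never match the table, and
    therefore always stay search terms.
    """
    out = []
    for token in _TOKEN.findall(query):
        out.append(operators.get(token.lower(), token))
    return " ".join(out)

print(escape_operators("a and b"))
print(escape_operators('a "and" b'))
print(escape_operators("a og b", {"og": "@and"}))
```

The last call shows the point about localized interfaces: a Danish client
can map "og" to the same escaped operator without the server knowing.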
> Another question, "Are Queries #3 and #4 intended to be equivalent?"
Since query #3 is invalid, you won't get an answer from me. Note that the
"a b" in
dc.title any "a b"
is not a phrase. So it depends on the relation.
When you just write
"a b"
without field+relation it is serverChoice, which by definition is
undefined.
My favorite method is 1.
> Again, I think the answer is yes, but I am not able to put my finger
> on any documentation explicitly stating this.
>
> My SRU client interface WILL receive queries such as the following --
> people WILL enter queries such as these:
You have to change your interface.
> * foo bar
> * repetitive task
> * virtual libraries
> * International Technology Education Association
>
> My interface needs to gracefully accept such queries, process them, and
> return meaningful results; I do not intend to throw back to the user
> errors such as "Bad syntax. Read the documentation and try again."
Of course not:-)
> My Perl-based CQL parser (beautifully written by Ed S.) is heavily
> based on the cql-java parser. In both cases, queries such as the ones
> above output this error:
>
> unknown first class relation
That's what they should do. Sorry.
>
> Furthermore, a number of the test SRU servers also output errors of
> various flavors for multi-word searches:
>
> Illegal or unsupported boolean found.
> unknown first-class relation
> Query syntax error
At least they all throw some error!
I have deleted some text because the same question seems to be raised
multiple times ;-)
> Whew! What do you think? Is the multi-word query, repetitive task, a
> valid CQL query?
No.
>
> --
> Eric "Why Am I Working At This Time Of Year?!" Morgan
> University Libraries of Notre Dame
Of course:-)
/ Adam