On Tue, Dec 09, 2003 at 01:27:40PM +0000, Robert Sanderson wrote:
> > Finally, exact, any, and all are interesting cases. There is no 'any'
> > in Bib-1 (unless I have an old printout). 'all' I guess is 'word list'.
>
> For any, I turn the clause into a tree of OR clauses in bib1, AND clauses
> for all. Yes, this isn't quite correct, as the client will likely have
> different 'word' extraction to the server. Exact I use 2=3,5=100,6=3.

While debatable, this approach has advantages. An existing Bib-1 system
won't support the new pattern-match form at present, nor will it
support 'any' and 'all'. So the best practical solution available at
present seems to be:
For Bib-1:
  If a word-based term (using 'all' or 'any', not '=' or 'exact')
    Split terms on whitespace
    Add first-in-field if '^' at start
    Add left/right/both truncation if '*' at start/end/both
    Error on anything more complex (could try a regexp format)
  else
    Leave term as a single string and apply the same '^' and '*' rules
  end-if
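
A minimal Python sketch of the Bib-1 rules above, assuming a toy RPN tree
(the Apt/Op classes, term_attrs and to_bib1 are hypothetical names, not from
any Z39.50 toolkit). It uses the Bib-1 attribute values as I read the
attribute set: 3=1 first in field, 4=2 word, 5=1/2/3 right/left/both
truncation, 5=100 do not truncate, plus Robert's 2=3,6=3 for 'exact'.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Attr = Tuple[int, int]  # (Bib-1 attribute type, value)

# Bib-1 attribute types used below
RELATION, POSITION, STRUCTURE, TRUNCATION, COMPLETENESS = 2, 3, 4, 5, 6


@dataclass
class Apt:
    """Leaf of a hypothetical RPN tree: a term plus its attribute list."""
    term: str
    attrs: List[Attr]


@dataclass
class Op:
    """Boolean node combining two subtrees ("and" or "or")."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Apt, Op]


class UnsupportedPattern(ValueError):
    """Raised for patterns too complex to express with these attributes."""


def term_attrs(term: str) -> Tuple[str, List[Attr]]:
    """Apply the '^' and '*' rules to one term."""
    attrs: List[Attr] = []
    if term.startswith("^"):
        attrs.append((POSITION, 1))  # 3=1: first in field
        term = term[1:]
    left = term.startswith("*")
    if left:
        term = term[1:]
    right = term.endswith("*")
    if right:
        term = term[:-1]
    # Anything still containing a pattern character is "more complex".
    if not term or any(ch in term for ch in "^*?\\"):
        raise UnsupportedPattern(term)
    if left and right:
        trunc = 3    # left and right truncation
    elif left:
        trunc = 2    # left truncation
    elif right:
        trunc = 1    # right truncation
    else:
        trunc = 100  # do not truncate
    attrs.append((TRUNCATION, trunc))
    return term, attrs


def to_bib1(relation: str, value: str) -> Node:
    """Translate one CQL clause (relation, term) into the toy RPN tree."""
    if relation in ("any", "all"):
        words = value.split()  # client-side whitespace split
        if not words:
            raise ValueError("empty term")
        leaves: List[Node] = []
        for word in words:
            text, attrs = term_attrs(word)
            leaves.append(Apt(text, [(STRUCTURE, 2)] + attrs))  # 4=2: word
        op = "or" if relation == "any" else "and"
        tree = leaves[0]
        for leaf in leaves[1:]:
            tree = Op(op, tree, leaf)  # left-deep OR/AND chain
        return tree
    if relation in ("=", "exact"):
        text, attrs = term_attrs(value)  # whole value stays one string
        if relation == "exact":
            attrs = [(RELATION, 3), (COMPLETENESS, 3)] + attrs  # 2=3, 6=3
        return Apt(text, attrs)
    raise ValueError(f"unsupported relation: {relation}")
```

For example, to_bib1("all", "cat dog*") gives an AND of 'cat' (4=2,5=100)
and 'dog' (4=2,5=1). It inherits the caveat Robert raised: words are split
on the client, so the server's own word extraction may disagree.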

For AA (if trying not to do it better than Bib-1 above):
  Wait for any/all to be fixed in the standard
  Wait for the new CQL regexp support to be added (if not added already)
  Add CQL regexp attribute if query contains any of '^', '*', '?', or '\'
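
The last AA step reduces to a character test. A minimal sketch; the name
needs_regexp_attribute is hypothetical, and what the attribute itself is
called depends on how the CQL regexp support ends up being defined.

```python
# Characters that, per the AA rule above, mark a term as a pattern.
PATTERN_CHARS = frozenset("^*?\\")


def needs_regexp_attribute(term: str) -> bool:
    """True if the AA translation should flag this term as a pattern."""
    return any(ch in PATTERN_CHARS for ch in term)
```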

For Bib-1, there are a couple of regexp formats that could be tried, but
translating patterns from one format to another can be tricky to get
right and is not necessarily portable.

I almost wonder if CQL should say that for 'all' and 'any', terms are
whitespace separated. If you say 'book-case' and the system indexes it
as two words, then treat it as a phrase query (adjacent words). This
seems to be what all the existing implementations do, so I wonder if the
spec should be changed to be compatible with them.
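
A minimal sketch of that proposed rule, assuming a hypothetical server-side
word extractor that breaks on any run of non-word characters (server_words,
plan_all_any, Word and Phrase are illustrative names only; real servers
tokenize differently, which is the whole problem).

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Word:
    text: str


@dataclass
class Phrase:
    words: List[str]  # must match adjacent words, in this order


def server_words(token: str) -> List[str]:
    # Hypothetical server word extraction: split on runs of non-word chars.
    return [w for w in re.split(r"\W+", token) if w]


def plan_all_any(value: str) -> List[Union[Word, Phrase]]:
    """Whitespace-split a CQL 'all'/'any' term; a token the server sees as
    several words becomes a phrase (adjacency) clause."""
    clauses: List[Union[Word, Phrase]] = []
    for token in value.split():
        words = server_words(token)
        if len(words) == 1:
            clauses.append(Word(words[0]))
        elif words:
            clauses.append(Phrase(words))
        # tokens with no word characters (e.g. "---") are dropped
    return clauses
```

The resulting clauses would then be ORed for 'any' and ANDed for 'all'.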
Alan