> Date: Tue, 14 Dec 2004 14:12:40 +0100
> From: Hedzer Westra <[log in to unmask]>
>
> attached is the preliminary version of the Adlib Base Profile.
> [...]
> I'd appreciate it if you could take some time to read it and check
> if I didn't do or write anything stupid...
OK. Thanks for running this past us.
> Also, I'm not a native speaker, nor is our technical writer, so if
> I made some errors in my English you can correct me if you want to..
Your English is superb; so is your writer's.
--
> - The meta-index cql.anywhere searches all indexes defined in the
> Adlib database at once. It does not search all indexes in all
> context sets, as the CQL context set suggests. This might be a slow
> search if there are a lot of indexes.
It is at best inadvisable, and probably just wrong, to _re_define the
meaning of an existing index like this -- especially such a core one.
If cql.anywhere doesn't meet your needs, it would be better to define
your own index that does (or ask to have it added to the CQL set if
you think it's of general interest).
> - The adlib.record meta-index searches the whole record. The
> operator doesn't matter. This is a slow search since no index can
> be used.
Perhaps we should consider adding cql.record for whole-record
searching (where supported).
You can't really say "the operator doesn't matter" as this is
overriding established semantics of CQL and the CQL context set. It
would be much better to say "the operator must be '=': all others will
be rejected".
(And by the way, it is conventional in CQL to talk of "relations"
rather than "operators". Unless you have a compelling reason, you
should probably stick to this convention.)
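
To make the "must be '='" suggestion concrete, here is a minimal Python sketch of a server-side check. Everything in it is hypothetical: the function name check_relation, the exception class, and the assumption that adlib.record is the only index with this restriction. It is not Adlib's actual implementation.

```python
class UnsupportedRelation(ValueError):
    """Raised when a relation is not accepted for the given index."""


def check_relation(index, relation):
    """Reject any relation other than '=' on the whole-record index.

    Assumption: only adlib.record carries this restriction; every other
    index is passed through unchecked in this sketch.
    """
    if index == "adlib.record" and relation != "=":
        raise UnsupportedRelation(
            f"index {index!r} supports only the '=' relation, got {relation!r}"
        )
```

Rejecting the query outright tells the client that its request was not honoured, instead of silently returning results for a different search.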
> - The Adlib thesaurus operators 'adlib.generic', 'adlib.broader',
> 'adlib.narrower', 'adlib.related', 'adlib.topterm' and
> 'adlib.parents' do thesaurus-enabled searches. These only work
> correctly on indexes with thesaurus links defined. Otherwise, they
> fall back on '=' searching.
Are these relations or relation modifiers? If you don't already have
this nailed down, I would recommend the latter, as they are all
refinements on the general relation of equality.
There really should be a thesaurus-use context set defined outside of
Adlib, for use in this and other profiles (or the relevant elements
should be added to the existing Zthes context set). We actually
started this process a month or two back, but got sidetracked -- or
maybe mired in excess complexity.
Depending on the urgency of your Adlib work, you might try to restart
that process and use the resulting "official" thesaurus-expansion
support. Otherwise, if you need to push on with the Adlib-specific
approach, I hope you will change this in version x.y of your profile,
when the official version comes out, as this will promote
interoperability between Adlib and other SRU implementations.
> - The 'encloses' and 'within' operators are implemented using the
> Adlib WHEN operator. Some examples:
> 'term encloses "2000 2004"' translates to 'term >= 2000 WHEN term
> <= 2004'
Nope -- "encloses" is the converse of "within", so
    term encloses "2000 2004"
translates to
    term <= 2000 WHEN term >= 2004
> 'term within "2001 2005"' translates to 'term > 2001 WHEN term <
> 2005'.
My reading of the CQL context set indicates that this relation is
inclusive of endpoints, so you should translate to
    term >= 2001 WHEN term <= 2005
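
The two translations can be sketched together in Python. This is an illustration only: the function name range_to_when is made up, and the WHEN syntax is copied from the examples quoted in this message, not from Adlib documentation.

```python
def range_to_when(index, relation, term):
    """Translate a CQL range relation into an Adlib-style WHEN expression."""
    parts = term.split()
    if len(parts) != 2:
        raise ValueError(f"expected two endpoints, got {term!r}")
    low, high = parts
    if relation == "within":
        # The indexed value lies inside the search range, endpoints included.
        return f"{index} >= {low} WHEN {index} <= {high}"
    if relation == "encloses":
        # The indexed range surrounds the search range: converse of 'within'.
        return f"{index} <= {low} WHEN {index} >= {high}"
    raise ValueError(f"not a range relation: {relation!r}")
```

With these definitions, range_to_when("term", "within", "2001 2005") gives the inclusive form shown above, and the "encloses" case simply swaps the direction of both comparisons.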
> - there are two types of modifiers: data type modifiers and pattern
> modifiers [...]
This whole section belongs in the CQL context-set document.
> The pattern modifiers are:
> cql.masked
> cql.unmasked (not defined in CQL context set)
We should fix that!
> Note that the CQL context set is not required by the SRW Base
> Profile!
That's not really true, as the CQL context set provides some of the
key elements used in pretty much all CQL queries, e.g. the meaning of all
the relations. Probably the base profile should make this explicit.
> The modifiers cql.word and cql.string can not relate directly to
> Adlib term or word matching because this is defined per index by
> the user; in Adlib each index can be either word or term
> indexed. If required, a field can be indexed by term as well as by
> word.
[I don't understand this fully, I think because it assumes you know
something about Adlib. Is Adlib's "term" searching similar to what we
mean by "string"?]
> These two indexes can be reflected using two separate CQL
> indexes. It is not possible to use modifiers to switch from one to
> the other.
Why not? It seems an eminently sensible way of expressing the
difference.
> Adlib interprets terms in the following manner:
> + operator 'exact': implied modifiers are cql.unmasked [...]
No, we all agreed that "exact" does _not_ imply unmasked.
> + operator '=': implied modifiers are cql.masked and either cql.word
> or cql.string, depending on the index type. This cannot be seen
> in the explain information but must be described in a profile.
This is _not_ what "=" means in CQL. It means that the term is
word-structured, irrespective of the index being searched, unless
overridden by a relation modifier.
> + operators 'any' and 'all': implied operators are cql.word and cql.masked.
> The words are combined using OR (for 'any') or AND (for 'all').
Yup. This was always the intent and should probably be explicit in
the CQL context-set document.
> + adlib.record meta-index: implied operators are cql.string and
> cql.unmasked.
There is nothing in CQL that allows you to infer different
term-structure and masking semantics from an index name.
> Implied modifier cql.word means:
> Words are split and then re-combined using the Adlib separators and concatenators rule.
> Separator characters are: [];,!@()|{}<>? carriagereturn newline space tab
> Concatenator characters are: `-=\./~#$%^&_+:"'*
> Please note that the CQL context set says nothing about how words are to be split.
... and that therefore what you specify here is a perfectly good
refinement of what the CQL context set says.
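
As a rough illustration of such a refinement, the following Python sketch splits a term on the separator characters listed in the profile. The profile excerpt does not spell out the re-combination rule, so the sketch assumes the simplest reading: separators end a word, and concatenator characters stay inside it. The function name split_words is hypothetical.

```python
# Separator characters as listed in the profile, plus CR, LF, space and tab.
SEPARATORS = set("[];,!@()|{}<>?\r\n \t")


def split_words(text):
    """Split text into words at separator characters.

    Assumption: concatenator characters (such as '-', '.', '_') are kept
    as ordinary parts of a word; the real Adlib rule may differ.
    """
    words = []
    current = []
    for ch in text:
        if ch in SEPARATORS:
            # A separator closes the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

Under that assumption, "state-of-the-art (2004)" yields the two words "state-of-the-art" and "2004".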
> This implied behaviour will remain intact in future versions, even
> if modifiers will be supported then.
Aha! Finally, I spot a tiny, tiny error in the English :-) That
sentence should say "... even if modifiers ARE supported then".
> - operator 'exact' does not imply cql.string, since cql.string or
> cql.word is index dependent on Adlib.
The correct way to handle this is to have a single CQL index be mapped
to either one of two different underlying Adlib indexes, dependent on
whether string or word structure is used.
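
A minimal Python sketch of that mapping follows. All names are hypothetical: the CQL index dc.title, the two Adlib index names, and the function resolve_index are invented for illustration. The default structure per relation ('exact' treated as string, everything else as word) is also an assumption of this sketch, not something the profile states.

```python
# Hypothetical mapping: one CQL index, two underlying Adlib indexes.
INDEX_MAP = {
    ("dc.title", "word"): "title_word_index",
    ("dc.title", "string"): "title_term_index",
}


def resolve_index(cql_index, relation, modifiers=()):
    """Pick the Adlib index matching the requested term structure."""
    if "cql.string" in modifiers:
        structure = "string"  # explicit relation modifier wins
    elif "cql.word" in modifiers:
        structure = "word"
    else:
        # Assumed defaults: 'exact' is string-structured, others are words.
        structure = "string" if relation == "exact" else "word"
    return INDEX_MAP[(cql_index, structure)]
```

The client then sees a single index name, and the choice between word and term indexing stays an internal detail that relation modifiers can steer.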
I think that's everything. Despite my having complained about so many
things, I think this is really nice work, and the document is very
clear about what it's saying.
_/|_ _______________________________________________________________
/o ) \/ Mike Taylor <[log in to unmask]> http://www.miketaylor.org.uk
)_v__/\ "In the Sixties people took acid to make the world weird.
Now the world is weird, people take Prozac to make it normal"
-- Damon Albarn.