On Tue, May 21, 2002 at 07:51:53AM -0400, LeVan,Ralph wrote:
> > 1. bath.authorWord (can be used with relation(?) and
> > truncation operators)
> >
> > Attribute Type Attribute Values Attribute Names
> > -------------- ---------------- ---------------
> > Use (1) 1003 author
> > Relation (2) 3 equals
> > Position (3) 3 any position in field
> > Structure (4) 2 word
> > Truncation (5) 100 do not truncate
> > Completeness (6) 1 incomplete subfield
>
> I understand that this is much more in line with the explicit intent of the
> Bath Profile folks, but I don't like it. Specifically, I don't want the
> truncation rules to change from index to index.
>
> Besides, I don't believe the intent of the Bath Profile was to prohibit
> truncation, just to define a type of search that did not require it.
>
> Ralph
I agree with you 100%. I want to support right truncation on bath.authorWord.
My mail must not have been clear enough, so I will expand with motivation etc.

I want CQL to support the Bath profile.
I want CQL to support the Bath profile without any special rules
built into the definition of CQL that are just for Bath.
I don't want CQL to have Bib-1 knowledge (I would like it to be generic).

To achieve this, a bath.authorWord definition as follows is not sufficient:
Attribute Type    Attribute Values  Attribute Names
--------------    ----------------  ---------------
Use (1)           1003              author
Position (3)      3                 any position in field
Structure (4)     2                 word
Completeness (6)  1                 incomplete subfield
(Note: "Bib-1" should really have been included as a column in the table
above, because the type/value pairs should really be attrset/type/value
triples. I have added that column in all the following tables.)
With this definition, CQL needs to know Bib-1 and/or Bath in order to
insert the truncation and relation type/values. Otherwise the query
    bath.authorWord=smith
will not be a valid Bath query because it is missing the relation and
truncation attribute values. Since I am trying to avoid special Bib-1
and Bath knowledge, I dislike CQL having to know that it must insert
missing attribute values. So instead, I proposed that the full
definition be used for index names.
Attribute Set  Attribute Type    Attribute Values  Attribute Names
-------------  --------------    ----------------  ---------------
Bib-1          Use (1)           1003              author
Bib-1          Relation (2)      3                 equals
Bib-1          Position (3)      3                 any position in field
Bib-1          Structure (4)     2                 word
Bib-1          Truncation (5)    100               do not truncate
Bib-1          Completeness (6)  1                 incomplete subfield
If you specify a query such as
    bath.authorWord=smith
then to me the '=' symbol means nothing (it's just a separator between
the index name and the term to search on). So just grab the full attribute
list above and search on it. This is Bath conformant, and no special
knowledge is required in CQL.
So how to introduce truncation? To avoid Bib-1 knowledge (because there
are truncation attributes defined in other attribute sets such as GEO),
I proposed that CQL have the concept of index names and operator names.
Index names are names such as bath.authorWord. Operator names are
symbolic names that CQL uses to map the concepts it implements (such as
'?' meaning truncation, '>' meaning greater-than) onto attribute lists.
So I proposed that operator definitions have attribute lists (just like
index names), such as
Operator: Right Truncation
Attribute Set  Attribute Type    Attribute Values  Attribute Names
-------------  --------------    ----------------  ---------------
Bib-1          Truncation (5)    1                 right-truncation
So a CQL query such as
    bath.authorWord=smith?
is turned into an attribute list for the word "smith" by first taking
the attribute list for "bath.authorWord", then adding/overlaying the
attribute list for "operator: right truncation". The add/overlay rule
is: if the same attribute-set/type already has a value, replace it
with the new value from the operator; otherwise append a new triple
to the end of the attribute list. For the above definitions, you end
up with
Attribute Set  Attribute Type    Attribute Values  Attribute Names
-------------  --------------    ----------------  ---------------
Bib-1          Use (1)           1003              author
Bib-1          Relation (2)      3                 equals
Bib-1          Position (3)      3                 any position in field
Bib-1          Structure (4)     2                 word
Bib-1          Truncation (5)    1                 right-truncation
Bib-1          Completeness (6)  1                 incomplete subfield
As a comparison, *if* we define "dc.title" to be just a USE attribute
(that is, the other types are not defined - we are not doing the Bath
thing of mandating that all type/values be specified):
Attribute Set  Attribute Type    Attribute Values  Attribute Names
-------------  --------------    ----------------  ---------------
Bib-1          Use (1)           4                 title
Then the query
    dc.title=smith?
would map to the attribute list
Attribute Set  Attribute Type    Attribute Values  Attribute Names
-------------  --------------    ----------------  ---------------
Bib-1          Use (1)           4                 title
Bib-1          Truncation (5)    1                 right-truncation
(Note: I am not proposing what dc.title should be, just using it as
an example of how operators either add or replace attribute values to
form the full attribute list - in this example a new type/value is
added because it was not there before).
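
The add/overlay rule can be sketched in a few lines of Python. This is
only an illustration of the rule as described in this mail, not a
proposed implementation: the function name overlay() and the constant
names are made up, and attributes are modelled as plain
(attribute set, type, value) tuples.

```python
# Each attribute is an (attribute set, attribute type, value) triple.
BATH_AUTHOR_WORD = [
    ("Bib-1", 1, 1003),  # Use: author
    ("Bib-1", 2, 3),     # Relation: equals
    ("Bib-1", 3, 3),     # Position: any position in field
    ("Bib-1", 4, 2),     # Structure: word
    ("Bib-1", 5, 100),   # Truncation: do not truncate
    ("Bib-1", 6, 1),     # Completeness: incomplete subfield
]
DC_TITLE = [("Bib-1", 1, 4)]          # Use: title (example definition only)
RIGHT_TRUNCATION = [("Bib-1", 5, 1)]  # operator: right truncation


def overlay(index_attrs, operator_attrs):
    """Return a copy of index_attrs with operator_attrs added or overlaid."""
    result = list(index_attrs)  # never modify the index definition itself
    for attrset, attrtype, value in operator_attrs:
        for i, (s, t, _) in enumerate(result):
            if (s, t) == (attrset, attrtype):
                # Same attribute-set/type already has a value: replace it.
                result[i] = (attrset, attrtype, value)
                break
        else:
            # No value for this attribute-set/type yet: append a new triple.
            result.append((attrset, attrtype, value))
    return result


for triple in overlay(BATH_AUTHOR_WORD, RIGHT_TRUNCATION):
    print(triple)
for triple in overlay(DC_TITLE, RIGHT_TRUNCATION):
    print(triple)
```

The first call exercises the "replace" branch (truncation 100 becomes 1
in place), the second the "append" branch (a truncation triple is added
after the Use attribute). Because the match is on the set/type pair and
not on the type alone, nothing in the function is specific to Bib-1.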
The same thing is done for other operators such as '>' etc. In my previous
mail, I proposed that '=' NOT be mapped onto an operator. Instead, it
is just syntactic sugar to separate the index name from the term.
In practice this is not a problem. Most systems default to 'equals'
if that attribute value is not specified. So I consider '=' to mean
nothing special at all. However, other symbols '>', '<', '>=' etc
DO have special meaning. They map onto operator names (greater-than etc).
The 'greater-than' operator would be defined as:
Attribute Set  Attribute Type    Attribute Values  Attribute Names
-------------  --------------    ----------------  ---------------
Bib-1          Relation (2)      5                 greater-than
So a query such as
    dc.title>smith
would map on to
Attribute Set  Attribute Type    Attribute Values  Attribute Names
-------------  --------------    ----------------  ---------------
Bib-1          Use (1)           4                 title
Bib-1          Relation (2)      5                 greater-than
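
To show how the symbols, operator names and index names fit together,
here is a rough Python sketch of the whole mapping for single-clause
queries. Everything in it is an assumption made for illustration: the
table names INDEXES, OPERATORS and SYMBOLS, the function
to_attributes(), and the very simplified query pattern (it is not a CQL
grammar). Only the greater-than and right-truncation definitions come
from this mail; the other relation values are the standard Bib-1
relation attribute values.

```python
import re

# Index and operator definitions: lists of (attribute set, type, value).
INDEXES = {
    "bath.authorWord": [
        ("Bib-1", 1, 1003), ("Bib-1", 2, 3), ("Bib-1", 3, 3),
        ("Bib-1", 4, 2), ("Bib-1", 5, 100), ("Bib-1", 6, 1),
    ],
    "dc.title": [("Bib-1", 1, 4)],  # example definition only
}
OPERATORS = {
    "right-truncation": [("Bib-1", 5, 1)],
    "less-than": [("Bib-1", 2, 1)],
    "less-than-or-equal": [("Bib-1", 2, 2)],
    "greater-than-or-equal": [("Bib-1", 2, 4)],
    "greater-than": [("Bib-1", 2, 5)],
}
# CQL symbols mapped to operator names. '=' is deliberately absent:
# it only separates the index name from the term.
SYMBOLS = {
    "<": "less-than",
    "<=": "less-than-or-equal",
    ">=": "greater-than-or-equal",
    ">": "greater-than",
}

# Simplified pattern: index name, one symbol, then the term.
QUERY = re.compile(r"^(?P<index>[\w.]+)(?P<symbol><=|>=|<|>|=)(?P<term>.+)$")


def to_attributes(query):
    """Return (term, attribute list) for a query like 'dc.title>smith'."""
    match = QUERY.match(query)
    if match is None:
        raise ValueError("unsupported query: %r" % query)
    term = match.group("term")
    operator_names = []
    if match.group("symbol") != "=":
        operator_names.append(SYMBOLS[match.group("symbol")])
    if term.endswith("?"):
        term = term[:-1]
        operator_names.append("right-truncation")
    # A dict keyed on (set, type) keeps insertion order, so assigning to
    # an existing key replaces in place and a new key is appended.
    merged = {(s, t): v for s, t, v in INDEXES[match.group("index")]}
    for name in operator_names:
        for s, t, v in OPERATORS[name]:
            merged[(s, t)] = v
    return term, [(s, t, v) for (s, t), v in merged.items()]


print(to_attributes("dc.title>smith"))
print(to_attributes("bath.authorWord=smith?"))
```

Note that the function itself contains no Bib-1 or Bath knowledge; all
of it lives in the two definition tables.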
So in my previous mail, you have to look at both the operator definitions
*and* the index definitions for a database (not just the index definitions).
Is this scheme perfect? No. I can come up with weird semantics easily.
But I don't think there is a perfect scheme because Z39.50 itself is
not perfect. But it seems like a sensible sort of compromise. It avoids
Bib-1 knowledge (because all such knowledge is built into the index and
operator definitions - I could replace Bib-1 with GEO in all the above
tables and it would just work). It avoids Bath rules (CQL does not have
to extend index attribute lists to make sure all type/values are included
as required by Bath). And it's extensible to support other operators that
may be introduced (for example GEO operators such as region-overlaps).
(I am not trying to propose a syntax for other operators (such as overlaps)
here - I want to defer that orthogonal discussion till later - but I think
other operators are important to support somehow.)
I hope this clears up any confusion.
Alan