ZNG  May 2002

Subject: Re: revised Bath/CQL searches
Reply-To: Z39.50 Next-Generation Initiative
Date: Wed, 22 May 2002 11:20:56 +1000

On Tue, May 21, 2002 at 07:51:53AM -0400, LeVan,Ralph wrote:
> > 1. bath.authorWord (can be used with relation(?) and
> > truncation operators)
> >
> >     Attribute Type    Attribute Values    Attribute Names
> >     --------------    ----------------    ---------------
> >     Use (1)           1003                author
> >     Relation (2)      3                   equals
> >     Position (3)      3                   any position in field
> >     Structure (4)     2                   word
> >     Truncation (5)    100                 do not truncate
> >     Completeness (6)  1                   incomplete subfield
>
> I understand that this is much more in line with the explicit intent of the
> Bath Profile folks, but I don't like it.  Specifically, I don't want the
> truncation rules to change from index to index.
>
> Besides, I don't believe the intent of the Bath Profile was to prohibit
> truncation, just to define a type of search that did not require it.
>
> Ralph

I agree with you 100%. I want to support right truncation on bath.authorWord.
My mail must not have been clear enough. I will expand with motivation etc.

I want CQL to support the Bath profile.
I want CQL to support the Bath profile without any special rules
built into the definition of CQL that are just for Bath.
I don't want CQL to have Bib-1 knowledge (I would like it to be generic).

To achieve this, a bath.authorWord definition as follows is not sufficient:

    Attribute Type    Attribute Values    Attribute Names
    --------------    ----------------    ---------------
    Use (1)           1003                author
    Position (3)      3                   any position in field
    Structure (4)     2                   word
    Completeness (6)  1                   incomplete subfield

(Note: "Bib-1" should really have been included as a column in the table
above because the type/value pairs should really be attrset/type/value
triples. I have added a new column in all the following tables.)

With this definition, CQL needs to know Bib-1 and/or Bath in order to
insert the truncation and relation type/values. Otherwise the query

    bath.authorWord=smith

will not be a valid Bath query because it is missing the relation and
truncation attribute values. Since I am trying to avoid special Bib-1
and Bath knowledge, I dislike CQL having to have special knowledge
that it must insert missing attribute values. So instead, I proposed
that the full definition be used for index names:

    Attribute Set   Attribute Type    Attribute Values    Attribute Names
    -------------   --------------    ----------------    ---------------
    Bib-1           Use (1)           1003                author
    Bib-1           Relation (2)      3                   equals
    Bib-1           Position (3)      3                   any position in field
    Bib-1           Structure (4)     2                   word
    Bib-1           Truncation (5)    100                 do not truncate
    Bib-1           Completeness (6)  1                   incomplete subfield

If you specify a query such as

    bath.authorWord=smith

then to me the '=' symbol means nothing (it's just a separator between
the index name and the term to search on). So just grab the full attribute
list above and search on it. This is Bath conformant, and no special
knowledge is required in CQL.
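
A minimal sketch of this lookup in Python, assuming the index definition is
stored as a list of attrset/type/value triples. The names INDEX_DEFINITIONS
and attributes_for_simple_query are hypothetical and are not part of any CQL
proposal; the point is only that '=' contributes no attributes of its own.

```python
# Hypothetical table of index definitions: each index name maps to a full
# list of (attribute set, attribute type, attribute value) triples.
INDEX_DEFINITIONS = {
    "bath.authorWord": [
        ("Bib-1", 1, 1003),  # Use: author
        ("Bib-1", 2, 3),     # Relation: equals
        ("Bib-1", 3, 3),     # Position: any position in field
        ("Bib-1", 4, 2),     # Structure: word
        ("Bib-1", 5, 100),   # Truncation: do not truncate
        ("Bib-1", 6, 1),     # Completeness: incomplete subfield
    ],
}


def attributes_for_simple_query(query):
    """Split 'index=term' and return (attribute list, term).

    The '=' is treated purely as a separator; it adds no attributes.
    """
    index_name, term = query.split("=", 1)
    return list(INDEX_DEFINITIONS[index_name.strip()]), term.strip()


attrs, term = attributes_for_simple_query("bath.authorWord=smith")
```

Here attrs is simply a copy of the six triples from the index definition, so
nothing Bib-1 or Bath specific lives in the query handling itself.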

So how to introduce truncation? To avoid Bib-1 knowledge (because there
are truncation attributes defined in other attribute sets such as GEO),
I proposed CQL have the concept of index names and operator names.
Index names are names such as bath.authorWord. Operator names are
symbolic names that CQL uses to map concepts CQL implements (such as
'?' meaning truncation, '>' meaning greater-than) onto attribute lists.

So I proposed operator definitions to have attribute lists (just like
index names) such as

    Operator: Right Truncation

    Attribute Set   Attribute Type    Attribute Values    Attribute Names
    -------------   --------------    ----------------    ---------------
    Bib-1           Truncation (5)    1                   right-truncation

So a CQL query such as

    bath.authorWord=smith?

is turned into an attribute list for the word "smith" by first taking
the attribute list for "bath.authorWord", then adding/overlaying the
attribute list for "operator: right truncation". The add/overlay rule
is: if the same attribute-set/type already has a value, replace it
with the new value from the operator; otherwise append a new triple
to the end of the attribute list. For the above definitions, you end
up with

    Attribute Set   Attribute Type    Attribute Values    Attribute Names
    -------------   --------------    ----------------    ---------------
    Bib-1           Use (1)           1003                author
    Bib-1           Relation (2)      3                   equals
    Bib-1           Position (3)      3                   any position in field
    Bib-1           Structure (4)     2                   word
    Bib-1           Truncation (5)    1                   right-truncation
    Bib-1           Completeness (6)  1                   incomplete subfield
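
The add/overlay step can be sketched in Python roughly as follows. This is an
illustration under the assumption that attribute lists are ordered lists of
attrset/type/value triples; the function name overlay and the constant names
are hypothetical.

```python
def overlay(index_attrs, operator_attrs):
    """Apply operator triples on top of index triples.

    A triple is (attribute set, attribute type, attribute value). If the
    index list already has an entry with the same set and type, its value
    is replaced in place; otherwise the operator triple is appended.
    """
    result = list(index_attrs)  # leave the index definition untouched
    for op_set, op_type, op_value in operator_attrs:
        for i, (attr_set, attr_type, _) in enumerate(result):
            if (attr_set, attr_type) == (op_set, op_type):
                result[i] = (op_set, op_type, op_value)
                break
        else:
            # No existing entry for this set/type: append a new triple.
            result.append((op_set, op_type, op_value))
    return result


BATH_AUTHOR_WORD = [
    ("Bib-1", 1, 1003),  # Use: author
    ("Bib-1", 2, 3),     # Relation: equals
    ("Bib-1", 3, 3),     # Position: any position in field
    ("Bib-1", 4, 2),     # Structure: word
    ("Bib-1", 5, 100),   # Truncation: do not truncate
    ("Bib-1", 6, 1),     # Completeness: incomplete subfield
]
RIGHT_TRUNCATION = [("Bib-1", 5, 1)]  # Truncation: right-truncation

combined = overlay(BATH_AUTHOR_WORD, RIGHT_TRUNCATION)
```

Because the match is on the (attribute set, attribute type) pair, a truncation
attribute from a different attribute set would be appended rather than
replacing the Bib-1 one.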

As a comparison, *if* we define "dc.title" to be just a USE attribute
(that is, the other types are not defined - we are not doing the Bath
thing of mandating all type/values are specified):

    Attribute Set   Attribute Type    Attribute Values    Attribute Names
    -------------   --------------    ----------------    ---------------
    Bib-1           Use (1)           4                   title

Then the query

    dc.title=smith?

would map to the attribute list

    Attribute Set   Attribute Type    Attribute Values    Attribute Names
    -------------   --------------    ----------------    ---------------
    Bib-1           Use (1)           4                   title
    Bib-1           Truncation (5)    1                   right-truncation

(Note: I am not proposing what dc.title should be, just using it as
an example of how operators either add or replace attribute values to
form the full attribute list - in this example a new type/value is
added because it was not there before).

The same thing is done for other operators such as '>' etc. In my previous
mail, I proposed that '=' NOT be mapped onto an operator. Instead, it
is just syntactic sugar to separate the index name from the term.
In practice this is not a problem. Most systems default to 'equals'
if that attribute value is not specified. So I consider '=' to mean
nothing special at all. However, other symbols '>', '<', '>=' etc
DO have special meaning. They map onto operator names (greater-than etc).
The 'greater-than' operator would be defined as:

    Attribute Set   Attribute Type    Attribute Values    Attribute Names
    -------------   --------------    ----------------    ---------------
    Bib-1           Relation (2)      5                   greater-than

So a query such as

    dc.title>smith

would map on to

    Attribute Set   Attribute Type    Attribute Values    Attribute Names
    -------------   --------------    ----------------    ---------------
    Bib-1           Use (1)           4                   title
    Bib-1           Relation (2)      5                   greater-than

So to read my previous mail, you have to look at both the operator definitions
*and* the index definitions for a database (not just the index definitions).


Is this scheme perfect? No. I can come up with weird semantics easily.
But I don't think there is a perfect scheme because Z39.50 itself is
not perfect. But it seems like a sensible sort of compromise. It avoids
Bib-1 knowledge (because all such knowledge is built into the index and
operator definitions - I could replace Bib-1 with GEO in all the above
tables and it would just work). It avoids Bath rules (CQL does not have
to extend index attribute lists to make sure all type/values are included
as required by Bath). And it's extensible to support other operators that
may be introduced (for example GEO region-overlaps etc operators).
(I am not trying to propose a syntax for other operators (such as overlaps)
here - I want to defer that orthogonal discussion till later - but I think
other operators are important to support somehow.)

I hope this clears up any confusion.

Alan
