ZNG  May 2002

Subject: Re: Betr.: Ralph's Premises
From: Alan Kent <[log in to unmask]>
Reply-To: Z39.50 Next-Generation Initiative
Date: Wed, 15 May 2002 12:12:02 +1000
Content-Type: text/plain


On Wed, May 15, 2002 at 02:21:38AM +0200, Theo van Veen wrote:
> First I have to say that I appreciate the work being done on CQL and
> Explain. Nevertheless I think that we should make use of some new
> opportunities now we are defining a new query language.

I certainly agree it's worth arguing through issues - it is the best
way to solve problems early on rather than later.

> First as a reaction on Ralph:
>
> >So, if I support both dc.title and bath.title and you send me
> >unqualified title, what do you expect me to do?  It just so happens
> >that I specified in my explain record that the default index set was
> >bath, but what if you were expecting it to be dc?
>
> What should the client do in this case? Explain to the user that
> there are different sorts of titles? Or just make an arbitrary choice
> for the user?

As I understand it, you (Theo) want a single concept of "title" in CQL.
(Please correct me if I am wrong!) If there is a single concept of
"title", then you don't need qualifiers.

The problem that others (including me) have expressed is that there is
not a single definition of "title". In Dublin Core there are defined
semantics for "title" (the name of a book etc). However, in another
application "title" might mean "Mr/Mrs/Ms/Dr/Sir" etc. Using prefixes
is therefore proposed to qualify "title" (such as "dc.title") to
disambiguate the meaning of "title".

There *are* multiple solutions to the problem (and using a prefix is
only one of them).

(1) Come up with a global namespace for *all* concepts (without prefixes)
    and the first person to come up with a meaning of "title" gets to use
    that name, and any later meaning that comes along needs to use a
    different name (e.g. "formal_title").

(2) Use explain on a server to work out what a particular server means
    by "title". That is, don't use the name "title" to work out the meaning.
    Instead have some other way of identifying the concept (such as a URI)
    then use explain to work out which index name a particular server
    uses for that concept (so I would look for "http://dc.org/title..."
    and find it was mapped to "title", but "http://human-name.org/title"
    was mapped to "formal_title").

(3) Introduce prefixes, where each prefix is allocated to a semantic area
    within which names must be unique, such as "dc" for Dublin Core.
    This could be viewed as a variation of (1) above - the names have just
    got longer "dc.title" instead of just "title". But there is a formalism
    to it - you must be allocated a unique prefix, then your group can
    define names under that group. (This is in effect what XML namespaces
    do by the way - but we are using a short prefix instead of a long
    URI to identify the namespace).

There are lots more variations I am sure.
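
As a rough illustration of (3), here is a minimal Python sketch of how a
prefixed index name could be split and tied to a semantic area. The
registry contents, the example.org URIs and the function names are all
made up for illustration; nothing here is an agreed CQL mechanism.

```python
# Hypothetical prefix registry: short prefix -> identifier of the semantic area.
# The URIs are placeholders for illustration, not registered values.
PREFIXES = {
    "dc": "http://example.org/dublin-core",
    "bath": "http://example.org/bath-profile",
}


def split_index(name):
    """Split 'dc.title' into ('dc', 'title'); an unqualified name has prefix None."""
    prefix, sep, local = name.partition(".")
    if not sep:
        return None, name
    return prefix, local


def expand(name):
    """Return (semantic area, local name) for a qualified name with a known prefix."""
    prefix, local = split_index(name)
    if prefix is None:
        raise ValueError(f"{name!r} is unqualified; its meaning is server-specific")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix {prefix!r}")
    return PREFIXES[prefix], local
```

The short prefix plays the role an XML namespace URI would play, which
is why "dc.title" and a bare "title" can coexist without clashing.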

Now there are also different usages of CQL.

(U1) A person has a single server they talk to all the time, and want to
     express queries using the full capabilities of that server.

(U2) A person wants to write a single query and send it to multiple servers.

I want to support both nicely.

Theo, question 1: do you think I have captured the different alternatives
correctly? (No comment yet on which is best - I just want to make sure I
am understanding the conceptual model that you want, and the models that
you do not want.)

Of the above, I am against proposal (2) because I want to write a
single query that has a chance to work against multiple servers. If I
have to use explain, then I have to rewrite the CQL query per server.
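
To make the per-server rewriting cost of (2) concrete, here is a small
Python sketch. The server names, the "urn:example:..." concept
identifiers and the shape of the explain data are all assumptions made
for illustration; no real explain format is implied.

```python
# Hypothetical explain data: for each server, concept identifier -> local index name.
EXPLAIN = {
    "server_a": {
        "urn:example:book-title": "title",
        "urn:example:honorific": "formal_title",
    },
    "server_b": {
        "urn:example:book-title": "booktitle",
        "urn:example:honorific": "title",
    },
}


def rewrite(concept_terms, server):
    """Turn [(concept, search term), ...] into (index name, term) pairs for one server."""
    mapping = EXPLAIN[server]
    return [(mapping[concept], term) for concept, term in concept_terms]


# One logical query becomes a different concrete query on each server.
query = [("urn:example:book-title", "hamlet")]
for server in EXPLAIN:
    print(server, rewrite(query, server))
```

Note that "title" means a book title on one of these made-up servers
and a form of address on the other, so the same query text cannot be
sent to both.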

I dislike (1) (a global namespace) because that is against the trend
of what Dublin Core etc are doing. I think it's important to be able
to segregate the namespace of indexes.

However, to support usage U1, using qualifiers all the time *is* a pain.
I like being able to define local index names and not have to define
a public table and register a prefix etc. These names are frequently
not intended for cross collection searching. So I like the mix of
being able to define indexes with standard prefix names (with standard
semantics) and local unqualified names for which I can define my own
semantics as best suits the database I am building.

> dc is defined for description and not for searching.

I am sorry, I don't follow your point here. I would have thought that
describing/categorizing data is directly relevant to searching.

> But if it is supposed that a user will have a general
> understanding that dc.author means author, because he has an
> understanding of author, the prefix is not relevant and even
> misleading.

I think what people (including me) are saying is that "author" is
ambiguous unless you come up with a single definition of what "author"
means. Going back to the "title" example above, I think it's clear there
is not an intuitive single definition of what "title" means to all
people. It would be a matter of specifying for CQL what "title" or "author"
means. So I disagree with the assertion that a simple index name
such as "title" or "author" is a clear definition of what the semantics
of the index are. I think the Dublin Core activities have demonstrated
this well. They started with 15 core elements, but soon realised that
life is not that simple, and the simple names they first came up with
were not enough. So they introduced "qualified Dublin Core" with more
names.

> > > In my point of view not supporting Ralph's premises means
> > > not supporting prefixes. Or did I misunderstand previous
> > > discussions and
> > > is everyone already on this track?
> > Yes, I think you misunderstood.  I believe the consensus is
> > this:
> > 1.There will be some well-known prefixes, e.g., bath and dc, and
> >    you  won't have to use Explain to discover  a server-specific
> >   definition  for these.
>
> In this case a client has to know the prefix exactly. Searching for
> "dc.title:abc or bath.title:abc"  will return an error message if one of
> both is not supported.

Exactly. I think it's better to report an error if a query has specified
something that a server does not know than return an incorrect result
because the server has misinterpreted the query due to different semantics.

> > 2.A server is free to define server-specific prefixes (as
> >    long as they don't clash with the well-known prefixes) and you
> >    might have to use explain to discover those.
>
> In distributed searching I do not think any client will search for
> prefixes or indexes that it doesn't know.

Of course. If it does not know the prefix, it can't use it by definition!
But Explain gives a mechanism of learning about prefixes and indexes
the client did not know before. The simplest illustration is a client
that does an Explain query on a server then displays all returned values
to the user in a drop down list. Each index name has a human readable
description along with it. The client application does not "understand"
the different index names in this situation - the human does though.
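
A minimal Python sketch of that drop-down client follows. The index
names, descriptions and the dictionary shape of the explain response
are invented for illustration; a real explain record would look
different.

```python
# Hypothetical explain response: index name -> human-readable description.
explain_response = {
    "dc.title": "Title of the resource (Dublin Core)",
    "bath.title": "Title as used by the Bath profile",
    "formal_title": "Form of address, e.g. Mr/Mrs/Ms/Dr/Sir (local index)",
}


def dropdown_entries(indexes):
    """Return display strings for a drop-down list, sorted by index name."""
    return [f"{name} - {desc}" for name, desc in sorted(indexes.items())]


# The client only copies names and descriptions; the human does the understanding.
for entry in dropdown_entries(explain_response):
    print(entry)
```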

> > 3.You can send an index name
> >    without a prefix, but in that case the server applies the default
> >    prefix, and you'll need to use explain to find out what that is for
> >    a given server (there won't be any global-default).
>
> This is all I want: reasonable defaults. But I am not able to write
> clients that are intelligent enough to find out whether the server's
> default corresponds to the user's expectations.

Ahhhhh! Does this mean then that you are not opposed to prefixes, but
rather all you want to ensure is that a database can be defined without
them? That is, not all index names *must* be qualified? I certainly
agree with this. I think a database should be able to support a set
of qualified index names (with standard prefixes) AND a set of unqualified
names.
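
For example, a database supporting both kinds of names might keep an
index table along these lines. This is only a sketch in Python; the
table contents, internal index names and function names are
hypothetical.

```python
# Hypothetical index table for one database. Qualified names carry agreed
# semantics; unqualified names are defined locally by whoever built the database.
INDEXES = {
    "dc.title": "book_title_idx",
    "dc.creator": "author_idx",
    "title": "honorific_idx",      # local meaning here: Mr/Mrs/Ms/Dr/Sir
    "shelfmark": "shelfmark_idx",  # local-only index, no registered prefix needed
}


def is_qualified(name):
    """A name counts as qualified if it has the form prefix.name."""
    return "." in name


def lookup(name):
    """Return (kind, internal index) for an index name this database supports."""
    if name not in INDEXES:
        raise KeyError(f"unsupported index name: {name}")
    kind = "standard" if is_qualified(name) else "local"
    return kind, INDEXES[name]
```

Here lookup("dc.title") and lookup("title") land on different internal
indexes, which is exactly the situation the prefix is meant to make
visible.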

Is the challenge therefore in your eyes working out what these unqualified
names mean? (E.g. does "title" mean title of a book versus Mr/Mrs/Dr etc.)
Is a human readable description enough? Or a URI? Or the Z39.50 attribute
list it binds on to?  Or put another way, what unambiguous way can you
think of that defines what a user expectation is?  This is an important
question to answer.

> > 4.Distributed  searching is theoretically possible, but all indexes
> >    should have well-known prefixes. (Or, you could send non-
> >    prefixed indexes to  different servers but you cannot assume
> >    that they mean the same thing to different servers.)
> > --Ray
>
> What (default) prefixes should be used in distributed searching?

I think Ray's point above is that if you want to write a query, have
it sent to multiple servers, and guarantee those servers use the same
meaning as you intend, then there is no default prefix that can be
used.

If a database can support both qualified (formal, standard definitions)
and unqualified (locally defined) index names, a distributed query using
only unqualified names can still work *if* the query is being sent off
to multiple servers that are known to support the same locally defined
names. I think the argument is that in the case where you want to send
a query off to lots of servers that do not share the same locally
defined names (because they are locally defined), then using prefixes
avoids a server misinterpreting a query.
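
A client-side check for this could look roughly like the Python sketch
below. It assumes the client already knows which index names each
server supports (say, from explain) and which unqualified names the
servers have agreed to share; the function and parameter names are
hypothetical.

```python
def unsafe_names(index_names, servers, shared_local_names=frozenset()):
    """Return (name, reason) pairs for names that cannot safely go to every server.

    servers maps a server name to the set of index names it supports.
    shared_local_names lists unqualified names all the servers are known
    to define identically.
    """
    problems = []
    for name in index_names:
        if "." not in name and name not in shared_local_names:
            problems.append((name, "unqualified; meaning may differ per server"))
            continue
        missing = sorted(s for s, supported in servers.items() if name not in supported)
        if missing:
            problems.append((name, "not supported by " + ", ".join(missing)))
    return problems


servers = {
    "a": {"dc.title", "title"},
    "b": {"dc.title", "title", "bath.title"},
}
print(unsafe_names(["dc.title", "title", "bath.title"], servers))
```

With these made-up servers, "dc.title" passes, "title" is flagged
unless it is listed in shared_local_names, and "bath.title" is flagged
because server "a" does not support it.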

> Ralph will return an error message if I try "dc.title:abc or
> bath.title:abc".

Me too. I would never write a query using both though. I would write
a query using only one of them.

But an alternative here is to add a flag when a CQL query is submitted
saying "report error on unknown index names" versus "ignore unknown
index names". (By ignore, I mean return zero matches for that term - sort
of like NULL in relational databases.) I can see the merit in this.
Or even introduce a new symbol or something in CQL indicating for
an index name the behaviour to take (zero matches or error) so the
person writing the query has control - but I think a boolean flag
being sent along with the query is better.
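
Roughly what I have in mind, as a toy Python sketch. The
error_on_unknown parameter, the exception class and the in-memory
"database" are all invented for illustration, and the clauses are
simply ORed together; none of this is proposed syntax.

```python
class UnknownIndexError(Exception):
    """Raised when a query names an index the server does not know."""


def search(records, clauses, known_indexes, error_on_unknown=True):
    """OR together (index, term) clauses over a list of dict records.

    Returns the sorted positions of matching records.
    """
    hits = set()
    for index, term in clauses:
        if index not in known_indexes:
            if error_on_unknown:
                raise UnknownIndexError(index)
            continue  # "ignore": this clause contributes zero matches
        for position, record in enumerate(records):
            if term in record.get(index, ""):
                hits.add(position)
    return sorted(hits)


records = [{"dc.title": "abc of indexing"}, {"dc.title": "other"}]
known = {"dc.title"}
clauses = [("dc.title", "abc"), ("bath.title", "abc")]

# With the flag off, the unknown bath.title clause behaves like NULL.
print(search(records, clauses, known, error_on_unknown=False))  # [0]
```

With the flag left at its default, the same call raises
UnknownIndexError for bath.title instead of returning a partial result.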

> I have the strong feeling that we are currently on the wrong track.
> We are mixing up Z39.50 attribute sets with dc name spaces, while
> the solution is quite simple: use user understandable names for
> search indexes. It is possible in Dublin Core for description, why is it
> not possible in CQL for searching?

Dublin Core only gives one semantics of "title". I think if you asked
them Dublin Core would agree that their semantics *is not* the only
definition, or even the best. It's just a definition they have agreed
on. This is why they use XML namespaces to qualify their elements
in XML encodings. They do not, for example, claim their interpretation
of "title" is the best one so their one does not need qualification.

So I think we should support qualifiers in part *because* Dublin Core
do it too.

> The abstract Z39.50 attributes were useful in case of MARC
> descriptions, but in line with Dublin Core  I think we should map the
> Z39.50 search attributes to user understandable names instead of
> sticking to the attributes.

I think we are all in agreement here. We want textual names in queries.
The question is how to reach unambiguous agreement on what a textual name means.

> Theo

I think there are some very interesting issues to have come out of this.
In summary:

* I think a database should support both prefix qualified index names
  (with globally defined and agreed to semantics) and unqualified
  index names (locally defined semantics).

* For a locally defined index name, how to unambiguously define its
  semantics? Human description? URI? Z39.50 attribute list?
  ZeeRex records I think would allow a human description and an
  attribute list.

* Should SRW have a flag to be sent in a query to define the behaviour
  for unknown index names? (Ignore versus report error versus server
  can do whatever it feels like etc.) I can see the logic in this for
  distributed queries. If SRW picks a single semantic however, I think
  it should be to report an error.

Alan

--
Alan Kent (mailto:[log in to unmask], http://www.mds.rmit.edu.au/~ajk/)
Project: TeraText Technical Director, InQuirion Pty Ltd (www.inquirion.com)
Postal: Multimedia Database Systems, RMIT, GPO Box 2476V, Melbourne 3001.
Where: RMIT MDS, Bld 91, Level 3, 110 Victoria St, Carlton 3053, VIC Australia.
Phone: +61 3 9925 4114  Reception: +61 3 9925 4099  Fax: +61 3 9925 4098
