A question was brought up to me at DLF by Corey Keith (LC) and Tom Habing
(UIUC) that the sort spec (and RecordXPath) doesn't have any means for
mapping namespaces.  That got me to thinking about sort in general, and I
want to bring up the following issues for discussion.

1.

There is no facility for mapping namespaces within an XPath, either for
sort or retrieval.  In a simple schema, this isn't a problem (where all
elements/attributes come from the same namespace) but we already have the
DC schema which doesn't fit this model, let alone more complex schemas
like Collection Description schema and so forth.

The namespaces could be ignored. Instead of /srw_dc:record/dc:title, it's
obvious what /record/title means when applied to the srw dc schema.
However this may not always be the case.  It is perfectly legitimate to
define a schema that has two identically named elements at the same level,
differentiated only by namespace.  For example:
  /a:foo/b:bar  and  /a:foo/c:bar

In SRW, the namespace declarations on the request can be used for the
mapping.  SRU, on the other hand, has no means to carry this information.
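
To make the ambiguity concrete, here is a minimal sketch in Python using the
standard library's ElementTree.  The record, the namespace URIs and the
prefix mapping are all made up for illustration; nothing here is part of
SRW or SRU.  It only shows that /a:foo/b:bar resolves to one element when a
prefix mapping is supplied, and that matching on the local name alone
cannot tell the two 'bar' elements apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: two 'bar' children distinguished only by namespace.
RECORD = """
<a:foo xmlns:a="http://example.org/a"
       xmlns:b="http://example.org/b"
       xmlns:c="http://example.org/c">
  <b:bar>from b</b:bar>
  <c:bar>from c</c:bar>
</a:foo>
"""

# The prefix-to-URI mapping that the XPath needs but that the request
# currently has no way to carry (URIs are invented for this example).
NAMESPACES = {
    "a": "http://example.org/a",
    "b": "http://example.org/b",
    "c": "http://example.org/c",
}


def local_name(tag):
    """Drop the '{uri}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(RECORD.strip())

# With a mapping, the path b:bar (relative to a:foo) picks out one element.
with_mapping = [el.text for el in root.findall("b:bar", NAMESPACES)]

# Ignoring namespaces, 'bar' matches both children.
without_mapping = [el.text for el in root if local_name(el.tag) == "bar"]

print(with_mapping)
print(without_mapping)
```

A sort on "bar" with namespaces ignored would have two candidate keys per
record and no rule for choosing between them.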

Thanks to Tom and Corey for bringing this up.

2.

Although CQL has the concept of multi-dimensional terms, sort does not.
For example, a geographic point has two components, latitude and
longitude.  If a sort were requested on this, the semantics of 'ascending'
are not defined.  Ascending north/south or east/west?  Or ascending just
by number? Or alphabetically?  How do you create an 'ascending' list of
items in any 2+ dimensional space without arbitrarily choosing one axis
as primary?
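
A small Python sketch of the problem, with made-up points (name, latitude,
longitude).  Both orderings below are legitimately 'ascending'; they just
pick a different primary axis, and the sort parameter has no way of saying
which one is wanted.

```python
# Hypothetical points: (name, latitude, longitude)
points = [
    ("A", 10.0, 30.0),
    ("B", 20.0, 10.0),
    ("C", 15.0, 20.0),
]

# 'Ascending' with latitude as the primary axis, longitude as tie-breaker.
by_latitude = [name for name, lat, lon
               in sorted(points, key=lambda p: (p[1], p[2]))]

# 'Ascending' with longitude as the primary axis, latitude as tie-breaker.
by_longitude = [name for name, lat, lon
                in sorted(points, key=lambda p: (p[2], p[1]))]

print(by_latitude)
print(by_longitude)
```

The two results are the reverse of each other for this data, so a client
and a server that assume different axes would each consider the other's
result set wrongly ordered.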

3.

CaseSensitive is another case where an issue has been recognised, but the
solution is incomplete.  Case normalisation is common, but it's not the
only normalisation that can occur on strings.  I might want to sort after
a stemming algorithm has been applied.  I might want to sort a date field
as a date, rather than alphanumerically. Or vice versa.  Even if a system
can do the transformation into (for example) simple Dublin Core, there's
no way for it to know that /record/date should be treated differently to
/record/title without prior knowledge of the schema's semantics.  Other
normalisation examples might be to sort with/without leading articles.
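
As a rough illustration in Python (the sample values, the day/month/year
date format and the list of articles are assumptions made up for this
example), the same field values come out in a different order depending on
which normalisation is applied before sorting:

```python
from datetime import datetime

# Hypothetical date values, written day/month/year.
dates = ["9/1/2004", "10/1/2003", "1/12/2004"]

# Sorted alphanumerically, as plain strings.
as_strings = sorted(dates)

# Sorted as dates, which requires knowing that the field holds dates
# and in which format.
as_dates = sorted(dates, key=lambda d: datetime.strptime(d, "%d/%m/%Y"))

# Hypothetical titles, and an assumed list of English leading articles.
titles = ["The Hobbit", "A Tale of Two Cities", "Emma"]
ARTICLES = ("a ", "an ", "the ")


def strip_leading_article(title):
    """Lower-case the title and drop one leading article, if present."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


with_articles = sorted(titles)
without_articles = sorted(titles, key=strip_leading_article)

print(as_strings)
print(as_dates)
print(with_articles)
print(without_articles)
```

None of these choices can be expressed by a single CaseSensitive flag, and
none can be inferred from the XPath alone.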

With the extensibility of CQL in 1.1 as opposed to 1.0 when sort was
designed, I think that for the next version we could do a better job with
the parameter by using CQL as the means of defining the field(s) to sort
by rather than an XPath.

Rob

--
      ,'/:.          Dr Robert Sanderson ([log in to unmask])
    ,'-/::::.        http://www.o-r-g.org/~azaroth/
  ,'--/::(@)::.      Special Collections and Archives, extension 3142
,'---/::::::::::.    Nebmedes:  http://nebmedes.o-r-g.org:8000/
____/:::::::::::::.
I L L U M I N A T I