Hello all,
attached is the preliminary version of the Adlib Base Profile. It is
meant as a starting point for profiles that
Adlib customers create; it defines the software side of things.
Customers can configure the SRU server with the schemas and context sets
they want to use, and must therefore extend this base profile.
The intention is that customers write their own profile but also provide
the Adlib Base Profile (or at least refer to it). So, people writing or
using SRU clients should be able to read and understand the Adlib Base
Profile without having any Adlib knowledge.
On some points it differs from what the CQL context set defines; these
differences are (clearly?) pointed out.
I'd appreciate it if you could take some time to read it and check that I
didn't do or write anything stupid...
Also, I'm not a native speaker, nor is our technical writer, so if I made
some errors in my English, feel free to correct me.
Many thanks in advance for your time!
Best regards,
Hedzer Westra, Systems Developer
Adlib | Information Systems
Reactorweg 291
3542 AD Utrecht
Postbus 1436
3600 BK Maarssen
tel: +31-30-241 1885
www: http://www.adlibsoft.com
The Adlib Base Profile version 1.0
and
The Adlib Context Set version 1.0
Monday December 13th, 2004
Hedzer Westra, [log in to unmask]
Adlib Information Systems
http://www.adlibsoft.com
Contents
Introduction
URIs
SRW, SRU and CQL
Requirements
Optional and server-dependent features
Configuration
Implemented SRW/CQL features
Unimplemented optional features
Adlib server-specific implementations
Indexes
Relations
Terms
Modifiers
Sorting
The Adlib Context set
-------------
Introduction
This document describes the Adlib Base Profile 1.0 of the SRU server that is implemented by Adlib Information
Systems in its Internet Server software. It is only a partial profile; it only describes the software
capabilities. For each installation and configuration of this software a full profile document should be
defined.
URIs
The URI for the Adlib Base Profile is "http://www.adlibsoft.com/srw/1.0/".
The Adlib context set is identified by the URI "http://www.adlibsoft.com/cql/1.0/". The preferred prefix
for this context set is 'adlib'.
These are not URLs; do not expect an HTTP response at these locations. They serve only to
identify Adlib's implementation of the SRU protocol.
SRW, SRU and CQL
The SRW and SRU protocols, the CQL language, the CQL context set and the SRW Base Profile are all described at
http://www.loc.gov/z3950/agency/zing/index.html. Please refer there for more information about these
protocols.
Requirements
The Adlib Internet Server implements all features required by the SRW Base Profile;
see http://www.loc.gov/z3950/agency/zing/srw/base-profile.html.
Optional and server-dependent features
SRW/U and CQL are quite broad protocols which allow for many optional and server-dependent features.
This document defines which optional and server-dependent features are implemented, and how these function.
Configuration
Adlib supplies a number of applications to its customers, who can make changes and additions to their
applications. Therefore there is no single 'Adlib' context set or profile. Each customer can configure their
Internet Server to produce the SRU context set and profile that is required. For any such configuration a
context set and profile document should be created. This document describes the Adlib Base Profile and the
Adlib Context Set, which form the basis of such a document. Therefore, only meta-indexes are defined here.
Metadata formats (like Dublin Core or MarcXML) are not defined here.
Implemented SRW/CQL features
- protocol version 1.1
- the SRU protocol, i.e., HTTP GET/POST CGI requests
- explain operation
- searchRetrieve operation
- CQL 1.1 parsing
- CQL 1.1 handling as far as the Base Profile requires it
- CQL context set as far as the Base Profile requires it, plus some extras:
+ and, or, not booleans
+ =, >, <, <=, >=, <> relations
+ exact, all, any, scr relations
+ encloses, within relations
+ cql.anywhere meta-index
+ cql.serverChoice surrogate index
- sorting
- surrogate & non-surrogate diagnostics generation
- recordSchemas
- request echoing, xSortKeys and XCQL
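As an illustration of the SRU side of this list, the sketch below builds searchRetrieve and explain requests as HTTP GET URLs using the standard SRU 1.1 parameter names. The base URL is a hypothetical placeholder (each installation has its own endpoint), and the function names and example query are assumptions made for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: every installation has its own base URL.
BASE_URL = "http://example.org/adlib/sru"


def search_retrieve_url(query, start_record=1, maximum_records=10,
                        record_schema=None, sort_keys=None):
    """Build an SRU 1.1 searchRetrieve request as an HTTP GET URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "startRecord": str(start_record),
        "maximumRecords": str(maximum_records),
    }
    # Optional parameters are only sent when the caller supplies them.
    if record_schema is not None:
        params["recordSchema"] = record_schema
    if sort_keys is not None:
        params["sortKeys"] = sort_keys
    # urlencode percent-encodes the CQL query (spaces become '+').
    return BASE_URL + "?" + urlencode(params)


def explain_url():
    """Build an SRU 1.1 explain request as an HTTP GET URL."""
    return BASE_URL + "?" + urlencode({"operation": "explain", "version": "1.1"})
```

The same parameters can also be sent as an HTTP POST body; GET is used here only because it is easiest to show.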
Unimplemented optional features
- recordXPath handling
- result sets
- proximity searches, i.e. the 'prox' boolean operator
- handling of relation, proximity & boolean modifiers
- word anchoring (^)
- matching on a single character using '?'
- scan operation
- SRW (SOAP) as communication layer
Adlib server-specific implementations
Indexes
- The meta-index cql.anywhere searches all indexes defined in the Adlib database at once. It does
not search all indexes in all context sets, as the CQL context set suggests. This search may be slow
if the database defines many indexes.
- The adlib.record meta-index searches the whole record; the relation used is ignored.
This search is slow because no index can be used.
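To make the scope of cql.anywhere concrete, the following sketch spells it out as the same clause applied to every index of the Adlib database, combined with OR. This is conceptual only: it does not claim that the server rewrites queries this way. The index list and the function name are hypothetical; the point is that the amount of work grows with the number of indexes.

```python
# Hypothetical list of the indexes defined in one Adlib database.
DATABASE_INDEXES = ["title", "creator", "object_number"]


def anywhere_as_disjunction(relation, term, indexes=DATABASE_INDEXES):
    """Spell out cql.anywhere as one clause per database index, OR-ed together.

    Conceptual only; not necessarily how the server evaluates the search.
    The term is expected to be valid CQL already (quoted if it has spaces).
    """
    if not indexes:
        raise ValueError("the database defines no indexes")
    return " or ".join(f"({index} {relation} {term})" for index in indexes)
```

For example, `anywhere_as_disjunction("=", '"fish"')` yields one parenthesised clause per index in the hypothetical list.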
Relations
- cql.scr is always handled as '='
- The Adlib thesaurus operators 'adlib.generic', 'adlib.broader', 'adlib.narrower', 'adlib.related',
'adlib.topterm' and 'adlib.parents' perform thesaurus searches. These only work correctly on indexes
that have thesaurus links defined; on other indexes they fall back to an '=' search.
- The 'encloses' and 'within' operators are implemented using the Adlib WHEN operator. For example,
'term encloses "2000 2004"' translates to 'term >= 2000 WHEN term <= 2004', and
'term within "2001 2005"' translates to 'term > 2001 WHEN term < 2005'.
Exactly two terms must be supplied, separated by a single space.
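The relation rules above can be summarised in a small sketch: cql.scr becomes '=', the thesaurus operators fall back to '=' on indexes without thesaurus links, and 'encloses'/'within' are rewritten into WHEN expressions. The function names and the `index_has_thesaurus` flag are hypothetical; the output strings follow the examples given above, and this is not Adlib's actual code.

```python
# Hypothetical sketch of the relation rules above; not Adlib's implementation.
THESAURUS_RELATIONS = {
    "adlib.generic", "adlib.broader", "adlib.narrower",
    "adlib.related", "adlib.topterm", "adlib.parents",
}


def resolve_relation(relation, index_has_thesaurus):
    """Return the relation that is effectively applied to a clause."""
    if relation in ("scr", "cql.scr"):
        return "="                      # cql.scr is always handled as '='
    if relation in THESAURUS_RELATIONS and not index_has_thesaurus:
        return "="                      # no thesaurus links: fall back to '='
    return relation


def range_to_when(index, relation, term):
    """Rewrite an 'encloses' or 'within' clause as an Adlib WHEN expression."""
    parts = term.split(" ")
    # Exactly two non-empty terms separated by a single space.
    if len(parts) != 2 or "" in parts:
        raise ValueError("exactly two terms separated by a single space are required")
    low, high = parts
    if relation in ("encloses", "cql.encloses"):
        return f"{index} >= {low} WHEN {index} <= {high}"
    if relation in ("within", "cql.within"):
        return f"{index} > {low} WHEN {index} < {high}"
    raise ValueError(f"not a range relation: {relation!r}")
```

Splitting on a single space (rather than any whitespace) mirrors the "single space" requirement, so "2000  2004" with two spaces is rejected.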
Terms
- A search on *...* (a term with * at both ends) uses the Adlib 'contains' operator, which is slow because no index can be used.
- empty terms are not supported
- * can only be used for pattern matching at the beginning and/or end of a search term
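A sketch of the term rules above, assuming a masked term in which \ escapes the next character: it rejects empty terms and interior wildcards and reports where the * sits. The function and its return labels are hypothetical; this profile only states that the *...* case maps to the Adlib 'contains' operator, and does not say which operator is used for a single leading or trailing *.

```python
def classify_term(term):
    """Classify a masked search term by where its unescaped '*' characters are.

    Hypothetical helper; the returned labels are illustrative only.
    """
    if term == "":
        raise ValueError("empty term searches are not supported")
    stars = []                      # positions of unescaped '*'
    i = 0
    while i < len(term):
        if term[i] == "\\":         # under cql.masked, '\' escapes the next character
            i += 2
            continue
        if term[i] == "*":
            stars.append(i)
        i += 1
    last = len(term) - 1
    if any(0 < pos < last for pos in stars):
        raise ValueError("'*' may only appear at the beginning and/or end of a term")
    leading = 0 in stars
    trailing = last in stars and last > 0
    if leading and trailing:
        return "contains"           # *...*: the Adlib 'contains' operator, no index
    if trailing:
        return "starts-with"
    if leading:
        return "ends-with"
    return "plain"
```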
Modifiers
- boolean and relation modifiers are parsed but not handled; any search that uses modifiers returns
an error message.
- there are two types of modifiers: data type modifiers and pattern modifiers.
The data type modifiers are:
cql.string
cql.word
cql.isoDate (not used)
cql.number (not used)
cql.uri (not used)
The pattern modifiers are:
cql.masked
cql.unmasked (not defined in CQL context set)
At most one modifier of each type can be active per search clause. Some combinations are illegal, e.g.,
cql.masked with cql.number.
Note that none of these modifiers can be used in search queries. They are only used here to describe
default behaviour, and for future extension.
- the implied cql.word/cql.string/cql.masked behaviour differs from what the CQL context set suggests.
Note that the CQL context set is not required by the SRW Base Profile! It is only a suggestion of how
CQL searches might be interpreted.
The modifiers cql.word and cql.string cannot map directly onto Adlib term or word matching, because
this is defined per index by the user: in Adlib, each index is either term indexed or word indexed. If
required, a field can be indexed by term as well as by word, and these two indexes can be exposed as
two separate CQL indexes. It is not possible to use modifiers to switch from one to the other.
The cql.word and cql.string semantics described in the CQL context set are not used. For completeness,
the CQL context set suggests the following: for the '=' operator, the number of words in the term is
counted. If there is one word, term matching should be used; if there is more than one, word adjacency
matching should be used.
Adlib interprets terms in the following manner:
+ operator 'exact': implied modifiers are cql.unmasked and either cql.word or cql.string, as for '='.
+ operator '=': implied modifiers are cql.masked and either cql.word or cql.string, depending on
the index type. This cannot be seen in the explain information but must be described in a profile.
+ operators 'any' and 'all': implied modifiers are cql.word and cql.masked.
The words are combined using OR (for 'any') or AND (for 'all').
+ adlib.record meta-index: implied modifiers are cql.string and cql.unmasked.
+ operators 'encloses' and 'within': implied modifier is cql.masked. There is no implied data type
modifier; there must be two terms separated by a single space.
Implied modifier cql.unmasked means:
The CQL pattern-matching character * has no special meaning, so pattern matching is not possible
with this modifier. The characters ^, ? and * do not have to be escaped.
Implied modifier cql.masked means:
Pattern matching on * is possible. The characters ^, ? and * must be escaped with \.
Implied modifier cql.word means:
Words are split and then re-combined using the Adlib separators and concatenators rule.
Separator characters are: [];,…()|{}<>? plus carriage return, newline, space and tab
Concatenator characters are: `-=\./~#$%^&_+:"'*
Please note that the CQL context set says nothing about how words are to be split.
Implied modifier cql.string means:
Terms are not inspected for separators or concatenators.
This implied behaviour will remain intact in future versions, even if modifiers become supported.
The differences from the CQL context set are the following:
- word adjacency is a feature that Adlib currently does not implement.
- the implied modifiers of operator '=' do not depend on the number of words, but on the index that is used
- operator 'exact' does not imply cql.string, since the choice between cql.string and cql.word is index dependent in Adlib.
- the CQL context set is not clear about the implied masking of operator 'exact'. It seems to hint
at using cql.masked. However, cql.unmasked is implied in Adlib's implementation. This way, there
is a clear and usable distinction between '=' and 'exact'.
- in the CQL context set, operator 'encloses' appears to take only one term. Adlib requires two.
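The implied-modifier rules above can be condensed into a lookup, together with the escaping needed under cql.masked and, for contrast, the CQL context set's word-count rule for '='. All names are hypothetical and this is not Adlib's code; `index_type` stands for how the index is configured in Adlib ('word' or 'term'), which the explain record does not show. Splitting on whitespace in the last function is an assumption, since the CQL context set does not say how words are split.

```python
# Hypothetical sketch of the implied modifiers described above.

def _base(relation):
    """Strip an optional 'cql.' prefix from a relation name."""
    return relation[4:] if relation.startswith("cql.") else relation


def implied_modifiers(relation, index_type=None, record_index=False):
    """Return (data type modifier, pattern modifier) implied for a clause.

    index_type is 'word' or 'term' (how the Adlib index is configured) and is
    only consulted for '=' and 'exact'.
    """
    if record_index:                      # adlib.record meta-index
        return ("cql.string", "cql.unmasked")
    rel = _base(relation)
    if rel in ("any", "all"):
        return ("cql.word", "cql.masked")
    if rel in ("encloses", "within"):
        return (None, "cql.masked")       # no implied data type modifier
    if rel in ("=", "exact"):
        if index_type not in ("word", "term"):
            raise ValueError("index_type must be 'word' or 'term'")
        data_type = "cql.word" if index_type == "word" else "cql.string"
        pattern = "cql.masked" if rel == "=" else "cql.unmasked"
        return (data_type, pattern)
    raise ValueError(f"no implied modifiers described for {relation!r}")


def escape_masked(text):
    """Escape ^, ? and * with a backslash so they are literal under cql.masked.

    Only these three characters are escaped, as described above.
    """
    return "".join("\\" + ch if ch in "^?*" else ch for ch in text)


def cql_context_set_equals_rule(term):
    """For contrast: the CQL context set's rule for '=' (not used by Adlib).

    Splitting on whitespace is an assumption; the context set does not define it.
    """
    return "term matching" if len(term.split()) == 1 else "word adjacency"
```

The contrast is the point of the last function: for the same term, the CQL context set's choice depends on the word count, while Adlib's depends only on `index_type`.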
Sorting
- sorting is only supported for hard-wired (case-sensitive) paths, not for full XPaths. The customer can
define a path for each CQL index.
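A sketch of how a server could restrict sortKeys to configured paths. It assumes the SRU 1.1 sortKeys syntax (space-separated keys, each a comma-separated list starting with path, schema and an ascending flag of 1 or 0) and a hypothetical per-installation mapping from CQL index to sort path. Paths must match exactly, including case; anything else, such as a full XPath, is rejected.

```python
# Hypothetical per-installation configuration: CQL index -> hard-wired sort path.
SORT_PATHS = {
    "title": "title",
    "date": "production.date.start",
}


def resolve_sort_keys(sort_keys):
    """Check an SRU 1.1 sortKeys value against the configured sort paths.

    Returns a list of (path, ascending) tuples. Paths must match a configured
    path exactly (case-sensitive); anything else, such as a full XPath, fails.
    """
    allowed = set(SORT_PATHS.values())
    resolved = []
    for key in sort_keys.split():
        fields = key.split(",")          # path,schema,ascending,...
        path = fields[0]
        if path not in allowed:
            raise ValueError(f"unsupported sort path: {path!r}")
        # Treated as ascending unless the flag is explicitly '0'.
        ascending = len(fields) < 3 or fields[2] != "0"
        resolved.append((path, ascending))
    return resolved
```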
The Adlib Context set
The Adlib context set version 1.0 only defines the six Adlib thesaurus operators.