The Adlib Base Profile version 1.0
and
The Adlib Context Set version 1.0
Beta-2

Initiated: Monday December 13th, 2004
Last updated: Friday December 17th, 2004

Hedzer Westra
Adlib Information Systems
http://www.adlibsoft.com

Contents

Introduction
URIs
SRW, SRU and CQL
Requirements
Optional and server-dependent features
Configuration
Implemented SRW/CQL features
Unimplemented optional features
Adlib server-specific implementations
Indexes
Relations
Terms
Modifiers
Sorting
The Adlib Context set
CQL to Adlib bridging

-------------

Introduction

This document describes the Adlib Base Profile 1.0 of the SRU server that is implemented by Adlib Information Systems in its Internet Server software. It is only a partial profile; it only describes the software capabilities. For each installation and configuration of this software a full profile document should be defined.

URIs

The URI for the Adlib Base Profile is "info:srw/profile/6/1.0".
The Adlib context set is identified by the URI "info:srw/cql-context-set/6/1.0". The preferred identifier for this URI is 'adlib'.

SRW, SRU and CQL

The SRW and SRU protocols, and the CQL language, CQL context set and SRW Base Profile are all described on http://www.loc.gov/z3950/agency/zing/index.html. Please refer there for information about this search & retrieval protocol.

Requirements

The Adlib Internet Server implements all features required by the SRW Base Profile, refer to http://www.loc.gov/z3950/agency/zing/srw/base-profile.html.

Optional and server-dependent features

SRW/U and CQL are quite broad protocols which allow for many optional and server-dependent features. This document defines which optional and server-dependent features are implemented, and how these function.

Configuration

Adlib supplies a number of applications to its customers, who can make changes and additions to their applications. Therefore there is no single 'Adlib' context set or profile.
Each customer can configure their Internet Server to produce the SRU context sets and profiles that are required. For any such configuration a context set and profile document should be created - or existing ones should be configured. This document describes the Adlib Base Profile and the Adlib Context Set, which form the basis of such a customer-defined document. Therefore, only meta-indexes are defined here. Metadata formats (like Dublin Core or MarcXML) are not defined here.

Implemented SRW/CQL features

- protocol version 1.1
- the SRU protocol, i.e., HTTP GET/POST CGI requests
  This is a slight extension of SRU, since the current documentation does not mention SRU POST.
- explain operation
- searchRetrieve operation
- CQL 1.1 parsing
- CQL 1.1 handling
- CQL context set as far as the Base Profile requires it, plus some extras:
  + and, or, not booleans
  + prox boolean with an Adlib-specific implementation, see below
  + =, >, <, <=, >=, <> relations
  + exact, all, any, scr relations
  + within relation
  + cql.anywhere, adlib.allIndexes and adlib.record meta-indexes
  + cql.serverChoice surrogate index
- sorting
- surrogate & non-surrogate diagnostics generation
- recordSchemas
- request echoing, xSortKeys and XCQL

Unimplemented optional features

- recordXPath handling
- result sets
- full proximity searches, i.e. the general 'prox' boolean; only the restricted Adlib-specific form described under Modifiers is accepted
- word anchoring (^)
- matching on a single character using '?'
- scan operation
- SRW (SOAP) as communication layer

Adlib server-specific implementations

Indexes

- The adlib.allIndexes meta-index searches all indexes defined in the Adlib database at once. This is different from cql.anywhere, which searches all indexes in all context sets.
- The adlib.record meta-index searches the whole record, that is, all of the fields in each record. This includes data that is not indexed (and possibly not even displayed in any record schema) and therefore not searchable using CQL indexes. The relation must be '='.
  Note: In future versions, CQL might support a cql.record meta-index with the same semantics.

Relations

- cql.scr is always handled as '='
- The 'within' relation is implemented using range searching. Exactly two words must be supplied, separated by a single space. The range search type can be selected by using an adlib.range modifier, with the following values:
    leftexclusive
    rightexclusive
    exclusive
    inclusive (default)
  Example: 'date within/adlib.range=leftexclusive "2000 2004"'
  Note: the CQL context set always uses inclusive range searching; there is no range modifier.

Terms

- empty term searches are not supported
- * for pattern matching is only usable at the beginning and/or end of a search term

Modifiers

- the prox boolean is accepted only with modifier /<=/0/adlib.record/unordered or /<=/-1/adlib.record/unordered. These are all default values, except for the unit (adlib.record) and, in the second form, the distance (-1). Other unit and distance values are invalid. 'prox///adlib.record' and 'prox//-1/adlib.record' are the shortest ways to select this option. The left and right arguments of the prox boolean must be simple searches. See the section "CQL to Adlib bridging considerations" for more explanation.
- Thesaurus-enabled searches can be executed by issuing an adlib.thesaurus modifier with one of the following values:
    generic
    broader
    narrower
    related
    topterm
    parents
  These only work correctly on indexes with thesaurus links defined. Otherwise, they fall back on normal searching. The modifier is supported only for the 'exact' and '=' relations.
- there are two types of CQL context set modifiers: data type modifiers and pattern modifiers.
  The accepted data type modifiers are:
    cql.string
    cql.word
    cql.isoDate
    cql.number
  Note: cql.uri is invalid
  The accepted pattern modifiers are:
    cql.masked
    cql.unmasked (not yet defined in CQL context set)
  Only one modifier of each type can be active in a search clause. The combinations /cql.masked/cql.number and /cql.masked/cql.isoDate are invalid.
- Adlib interprets terms in the following manner:
  Modifier cql.masked is always assumed (unless cql.number or cql.isoDate is supplied).
  + 'exact': implied modifier is cql.string
  + '=', 'any' and 'all': implied modifier is cql.word
    For 'any' and 'all' the words are combined using OR (for 'any') or AND (for 'all').
  + 'within': there is no implied data type modifier; there must be two words separated by a single space.

  Modifier cql.unmasked means:
    The CQL pattern match character * has no special meaning; pattern matching is not possible using this modifier. The characters ^, ? and * do not have to be escaped.

  Modifier cql.masked means:
    Pattern matching on * is possible. The characters ^, ?, * and \ must be escaped with \.

  Modifier cql.word means:
    Words are split and then re-combined using the Adlib separators and concatenators rule. Word adjacency is not used when searching. An error will be returned when searching with cql.word on a string index.
    Separator characters are: [];,!@()|{}<>? carriagereturn newline space tab
    Concatenator characters are: `-=\./~#$%^&_+:"'*
    Note: the CQL context set does not specify how words are to be split; it leaves this to implementations.

  Modifier cql.string means:
    Terms are not inspected for separators or concatenators. An error will be returned when searching with cql.string on a word index.

  Differences from the CQL context set:
  - word adjacency is a feature that Adlib currently does not implement.

Sorting

- sorting is only supported for hard-wired (case sensitive) paths, not for full XPaths. The customer can define a path for each CQL index.

The Adlib Context set

The Adlib context set version 1.0 defines:

(meta-)indexes:
- adlib.record (whether this will be added to the CQL context set is still unclear, so the prefix is adlib, not cql)
- adlib.allIndexes

modifiers:
- adlib.thesaurus and its six accepted values (generic, broader, etc.)
- adlib.range for the within relation
- adlib.record for the prox boolean
- cql.unmasked (it is assumed this modifier will eventually be added to the CQL context set, hence the cql prefix)

CQL to Adlib bridging considerations

prox:
  The special-case prox boolean is implemented using the WHEN and WHEN NOT booleans in Adlib. Distance 0 selects WHEN; distance -1 selects WHEN NOT.
  In Adlib, the WHEN (NOT) booleans on two distinct indexes first search records using the left operand index. For each record found, the relation and value of the right operand are then checked in the same occurrence.
  The Adlib-specific property of occurrences comes down to the following: each field can have 0 or more values (unless explicitly stated as non-repeatable), whereas conventional RDBMSs can only hold 0 or 1 values in a field. Indexes can be specified to index either the first or all occurrences. The WHEN operator explicitly checks matching occurrences. See section 6.3.10 in the Adlib User Guide (available from http://www.adlibsoft.com/) for more information.

within:
  This relation is implemented using range searching, i.e. the Adlib WHEN boolean used on two identical indexes.

cql.word and cql.string:
  Customers creating their own profiles based on this one should clearly state which indexes support word, string or both. Searches which map to non-existing word or string indexes will return an error, so clients should be careful to use 'exact' or '=' only when supported. Unfortunately this information cannot be sent using the explain ZeeRex record and must therefore be documented in a profile.

Performance:
- The adlib.allIndexes and cql.anywhere meta-indexes might have slow search responses if there are a lot of indexes.
- The adlib.record meta-index cannot use any index and will always be slow.
- Searching on *...* is done using the Adlib 'contains' operator, which is slow since no index can be used.
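
Client-side example (illustrative only)

The following Python sketch shows how a client might compose two of the query forms described in this profile: escaping a term for cql.masked, and building a 'within' clause with the adlib.range modifier. It then wraps the CQL in an SRU 1.1 searchRetrieve GET request. The helper names, the endpoint http://example.org/sru and the choice of maximumRecords are assumptions made for illustration; they are not part of the Adlib software or of this profile. The index name 'date' is taken from the example in the Relations section.

```python
from urllib.parse import urlencode

# Values listed in this profile for the adlib.range modifier.
# "inclusive" is the default, so no modifier is emitted for it.
RANGE_TYPES = ("leftexclusive", "rightexclusive", "exclusive", "inclusive")


def escape_masked(term: str) -> str:
    """Escape ^, ?, * and \\ with a backslash, as required under cql.masked."""
    return "".join("\\" + ch if ch in "^?*\\" else ch for ch in term)


def within_clause(index: str, low: str, high: str,
                  range_type: str = "inclusive") -> str:
    """Build a 'within' clause: exactly two words separated by one space."""
    if range_type not in RANGE_TYPES:
        raise ValueError(f"unknown adlib.range value: {range_type}")
    for bound in (low, high):
        # Each bound must be one non-empty word; quotes would break the term.
        if not bound or '"' in bound or any(ch.isspace() for ch in bound):
            raise ValueError("each range bound must be a single word")
    modifier = "" if range_type == "inclusive" else f"/adlib.range={range_type}"
    return f'{index} within{modifier} "{low} {high}"'


def sru_url(base_url: str, cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve GET URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only for illustration.
query = within_clause("date", "2000", "2004", "leftexclusive")
print(query)
print(sru_url("http://example.org/sru", query))
```

The sketch omits the modifier for inclusive ranges because that is the documented default. Whether a given index accepts 'within' at all depends on the customer-defined profile.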