I'll accept the window into the dictionary of query terms as the basis of
scan.  Accept also that the index terms are simple, e.g. just author or name
or subject etc.  Once this window is opened, what data elements are needed
to comprise the response?  Rob originally proposed in his summary:

totalTerms                      (I don't understand this)
CQL for direct access (index and term)

This window will allow access to database records via the terms retrieved
either using the termValue or using the CQL.  There may be more than one
path (or index) (e.g. authority or bibliographic).  In this case if the CQL
is repeatable then that satisfies it.  The problem is with
termFrequency.  It currently allows only the equivalent of Z39.50
globalOccurrences, which gives a single count for the term.  It needs to have
a structure such that it encompasses the occurrence counts for each path (or
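
To make the per-path count idea concrete, here is a rough sketch only.
The names ScanTerm and frequencies, and the path labels "authority" and
"bibliographic" used as keys, are my own illustrations and not part of
any proposal.  It also assumes that a single global count can be had by
simply summing the per-path counts, which may not hold for every server.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScanTerm:
    """One entry in a scan response (hypothetical shape, for illustration)."""

    value: str
    # One occurrence count per path (or index), keyed by an illustrative
    # label such as "authority" or "bibliographic".
    frequencies: Dict[str, int] = field(default_factory=dict)

    def global_occurrences(self) -> int:
        # Assumption: the single global count is the sum of the per-path
        # counts. A real server might compute it differently.
        return sum(self.frequencies.values())


term = ScanTerm("Shakespeare", {"authority": 1, "bibliographic": 42})
```

A client that only wants the single count can still call
global_occurrences(), while one that knows about both paths can read the
counts separately.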

Does the response need to include alternativeTerms?  Isn't this making it a
poor man's thesaurus when a proper authority or thesaurus record can be
retrieved by following the CQL?  Also what purpose does totalTerms serve?

<By the way>, authority records are of interest to more than just
cataloguers.  They can provide background information about authors and link
to photographs and biographies.  There is interesting work being done in
France in association with FRANAR that is looking at the requirements for
authority records in libraries, museums and other cultural institutions.
There is also the Australian Literature Gateway, which has this two-path
model.  Authority records are also important in rights management (author's
rights, publisher's rights, etc.).  </By the way>


-----Original Message-----
From: Robert Sanderson [mailto:[log in to unmask]]
Sent: Wednesday, 22 January 2003 19:16
To: [log in to unmask]
Subject: Re: Scan

On Wed, 22 Jan 2003, LeVan,Ralph wrote:
> The purpose of Scan is to provide a window into the dictionary of query
> terms for a database.  It was NEVER intended to be used for thesaurus
> though I admit that some tried to use it for such.


When used with authority data, there are a lot of interesting things that
could be done with an extended scan service, but the basic functionality
is to expose the index in cursorable chunks to the client in such a way as
to enable discovery.
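
As a minimal sketch of that basic functionality, assuming the index is
just a sorted list of terms held in memory: the function scan and the
parameters maximum_terms and response_position are names made up for
this illustration, not anything defined by the protocol.

```python
from bisect import bisect_left
from typing import List


def scan(index_terms: List[str], start_term: str,
         maximum_terms: int = 5, response_position: int = 1) -> List[str]:
    """Return a window of terms from a sorted index around start_term.

    response_position is the 1-based slot in the window where the start
    term (or, if it is absent, the next term after it) should appear.
    """
    # Locate the start term, or the point where it would be inserted.
    start = bisect_left(index_terms, start_term)
    # Shift the window back so the start term lands in the requested slot.
    first = max(0, start - (response_position - 1))
    return index_terms[first:first + maximum_terms]


terms = ["apple", "banana", "cherry", "date", "fig", "grape"]
window = scan(terms, "c", maximum_terms=3)
```

To cursor forward, the client would scan again using the last term of
the window it just received as the new start term.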

As the client [assuming that we agree that we do need to send all of a
searchClause] has already sent a query structure, generating the
equivalent search is trivial by replacing the term in the query with the
term from the returned list.
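
A sketch of that substitution, assuming the client keeps the clause as
index, relation and term.  SearchClause, to_cql and search_for are
hypothetical names for this example, and the index name dc.title is
only a placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchClause:
    """A single clause as sent in the scan request (illustrative only)."""

    index: str
    relation: str
    term: str

    def to_cql(self) -> str:
        # Quote the term, escaping backslashes and embedded double quotes.
        escaped = self.term.replace("\\", "\\\\").replace('"', '\\"')
        return f'{self.index} {self.relation} "{escaped}"'


def search_for(scan_clause: SearchClause, returned_term: str) -> SearchClause:
    # Same index and relation as the scan; only the term is swapped.
    return replace(scan_clause, term=returned_term)


scan_clause = SearchClause("dc.title", "=", "fish")
query = search_for(scan_clause, "fishing").to_cql()
```

The point is only that the client needs nothing beyond the term value
to build the search, since it already holds the rest of the clause.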

I see the generic need for a 'displayTerm' to present to end users if the
term is not readable (due to stemming, normalisation, entity substitution
or whatever)

But how to enable the interesting bits?  Currently there's no way to know
if a given index is controlled vocabulary or not.  There's nowhere to put
this information in the protocol, so what about having an entry in explain
which points at the related database?

For example, an lcsh subject index might have something like:
  <related type="authority"></related>

Then we don't burden the PDUs^H^H^H^H SOAP messages with extraneous
information but still alert the client that this data is structured and
there may be interesting things to be done with it.

That said, I still like Jannifer's idea that instead of stepsize there
should be a more structured way to limit the terms returned.  Perhaps a
request parameter that carries structureLevel, which would permit limiting
by the hierarchy, but otherwise still respects the idea that scan is just
a window on the index?

I think that between these two, everything suggested so far can be


> > or direct searching of the used form; if there are related terms,
> > these should be available; if term is a node in a hierarchical
> > structure, it should be possible to navigate that structure.  AND it
> > should provide an indication of how many documents (or, preferably
> > works) a search on the term is likely to retrieve.

> > On step size, I've never used it but I think it is designed primarily
> > for subject browse.  There are better ways to do an expanded and
> > collapsed scan, e.g. by browsing headings that have no subdivisions
> > then allowing them to be "opened" (Windows explorer metaphor).  Step

      ,'/:.          Rob Sanderson ([log in to unmask])
  ,'--/::(@)::.      Special Collections and Archives, extension 3142
,'---/::::::::::.    Twin Cathedrals:  telnet: 7777
____/:::::::::::::.              WWW: