Hi Patrick,

At 08:10 AM 7/15/2003 -0400, you wrote:
>We've been experimenting with a hybrid XML/SQL approach for our digital
>objects at Brown.

Very interesting approach.  A few questions below:

>We are constructing our metadata as pure XML structures (METS) and then
>populating a repository database (MySQL) using the MODS components from
>these original records.   Our MODS database contains over 20 tables -- most
>of which correspond to higher level MODS nodes such as title and
>origininfo.  Additional tables record the object level data and associate
>an object with a "catalog view" stylesheet, owners, and component files
>(for management purposes).  Once we have loaded data into the tables (part
>of our SIP procedure) we retain the METS record along-side the
>database.  The METS records are needed later in this model.  The only
>direction of interaction is XML to SQL -- we have no need (or desire!) to
>recreate a METS record from these tables.

C'mon, admit it.  You live to write deeply recursive code to convert StructMaps
into RDBMS table structures. :)

First question: once you've exported to the MODS data to the DB, do you
keep it around in the METS record, or do you just retain a descMD/mdRef
element which points to the MODS metadata living in the database?  I'm assuming
the former for performance reasons, but I thought I'd check.

>We next create two inverted indexes from our MODS tables -- one for phrases
>and one for terms.  Both index tables have associated mapping tables to
>link terms/phrases to records and fields within records.  The mappings will
>tell us, for example, if a term or phrase appears in a main title,
>alternative title, uniform title, or a title within a constituent item, so
>we are confident that our searching can be made relatively precise.
>We are using (for now) PHP to manage database interactions and to handle
>the arrays needed to accommodate multi-term searching in a manner more
>efficient that having MySQL perform a series of joins and
>sub-selects.  Once an appropriate record is found in this database
>architecture, the METS record takes over and drives the remainder of the
>user's interaction with the digital object.  The objects table provides the
>linking mechanism between an object and its top-level XSLT viewer.

Can you say a little bit more about how you're managing the multi-term search?
Does your PHP code do a set of single term searches and then perform the
necessary boolean operation on the single term result sets, or is something
else going on?

>As I've alluded to several times, this is an experiment, and we're not sure
>if we'll ultimately go with this model or seek out a purely XML based
>solution.  Since the availability of lost-cost effective XML search tools
>is still somewhat scant, the hybrid approach does have some appeal.

Well, as Karen mentioned, there's Cheshire II; the price is right. :)
It'd definitely be overkill for a personal bibliographic system, but for
a full-blown digital library it's got the performance and flexibility you need.
As one of the people responsible for part of its development, I'll admit
I should've spent more than 15 seconds thinking about the user interface
for its administration, and I should've spent those 15 seconds thinking
something other than "EMACS is the only user interface a real
programmer needs to set up and administer a system."  So, not the
easiest system to configure, but definitely powerful.

Jerome McDonough
Digital Library Development Team Leader
Elmer Bobst Library, New York University
70 Washington Square South, 8th Floor
New York, NY 10012
[log in to unmask]
(212) 998-2425