At 08:10 AM 7/15/2003 -0400, you wrote:
>We've been experimenting with a hybrid XML/SQL approach for our digital
>objects at Brown.
Very interesting approach. A few questions below:
>We are constructing our metadata as pure XML structures (METS) and then
>populating a repository database (MySQL) using the MODS components from
>these original records. Our MODS database contains over 20 tables -- most
>of which correspond to higher level MODS nodes such as title and
>origininfo. Additional tables record the object level data and associate
>an object with a "catalog view" stylesheet, owners, and component files
>(for management purposes). Once we have loaded data into the tables (part
>of our SIP procedure) we retain the METS record along-side the
>database. The METS records are needed later in this model. The only
>direction of interaction is XML to SQL -- we have no need (or desire!) to
>recreate a METS record from these tables.
C'mon, admit it. You live to write deeply recursive code to convert StructMaps
into RDBMS table structures. :)
First question: once you've exported to the MODS data to the DB, do you
keep it around in the METS record, or do you just retain a descMD/mdRef
element which points to the MODS metadata living in the database? I'm assuming
the former for performance reasons, but I thought I'd check.
>We next create two inverted indexes from our MODS tables -- one for phrases
>and one for terms. Both index tables have associated mapping tables to
>link terms/phrases to records and fields within records. The mappings will
>tell us, for example, if a term or phrase appears in a main title,
>alternative title, uniform title, or a title within a constituent item, so
>we are confident that our searching can be made relatively precise.
>We are using (for now) PHP to manage database interactions and to handle
>the arrays needed to accommodate multi-term searching in a manner more
>efficient that having MySQL perform a series of joins and
>sub-selects. Once an appropriate record is found in this database
>architecture, the METS record takes over and drives the remainder of the
>user's interaction with the digital object. The objects table provides the
>linking mechanism between an object and its top-level XSLT viewer.
Can you say a little bit more about how you're managing the multi-term search?
Does your PHP code do a set of single term searches and then perform the
necessary boolean operation on the single term result sets, or is something
else going on?
>As I've alluded to several times, this is an experiment, and we're not sure
>if we'll ultimately go with this model or seek out a purely XML based
>solution. Since the availability of lost-cost effective XML search tools
>is still somewhat scant, the hybrid approach does have some appeal.
Well, as Karen mentioned, there's Cheshire II; the price is right. :)
It'd definitely be overkill for a personal bibliographic system, but for
a full-blown digital library it's got the performance and flexibility you need.
As one of the people responsible for part of its development, I'll admit
I should've spent more than 15 seconds thinking about the user interface
for its administration, and I should've spent those 15 seconds thinking
something other than "EMACS is the only user interface a real
programmer needs to set up and administer a system." So, not the
easiest system to configure, but definitely powerful.
Digital Library Development Team Leader
Elmer Bobst Library, New York University
70 Washington Square South, 8th Floor
New York, NY 10012
[log in to unmask]