I think I have a slightly different take on the bibliographic data/RDBMS
relationship than Karen does, so a few additional points:
At 04:40 PM 7/11/2003 -0800, Bruce D'Arcus wrote:
>Second, would it be reasonable to interpret your comments as meaning:
>
>- it is impossible to map MODS to a RDBMS without any loss
You can map MODS to an RDBMS structure, and while I wouldn't
disagree with Karen that putting MODS into an RDBMS probably isn't
your best choice, my reasons are slightly different.
There are actually some good reasons for putting bibliographic
data into an RDBMS. The reason that libraries have authority control files
for author names is to take advantage of normalization; you have
the information regarding each author in one place,
which both saves storage space *and* means you don't have to
hunt through millions of records to locate every occurrence of an
author when their biographical data changes and you have to
update their bib record. In terms of simplifying control over the
bibliographic data and making updates efficient, an RDBMS is
the way to go.
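To make the normalization point concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The table names (author, record), the
columns, and the sample data are all assumptions made up for illustration;
they are not taken from any real catalog schema.

```python
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        dates TEXT
    );
    CREATE TABLE record (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
""")

# One authority row, referenced by several bibliographic rows.
conn.execute("INSERT INTO author (id, name, dates) VALUES (1, 'Doe, Jane', '1950-')")
conn.executemany(
    "INSERT INTO record (title, author_id) VALUES (?, ?)",
    [("First Book", 1), ("Second Book", 1), ("Third Book", 1)],
)

# The author's dates change: a single-row update, no matter how many
# bibliographic records point at that author.
cur = conn.execute("UPDATE author SET dates = '1950-2003' WHERE id = 1")
updated_rows = cur.rowcount

# Every record picks up the corrected heading through the join.
headings = conn.execute("""
    SELECT r.title, a.name || ', ' || a.dates
    FROM record AS r
    JOIN author AS a ON a.id = r.author_id
    ORDER BY r.id
""").fetchall()
```

The update touches one row in the author table, and all three records
show the new heading the next time they are read.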
However, 1. bibliographic records are rarely updated; 2a. bibliographic
data tends to be organized hierarchically (MARC and MODS both demonstrate
this to some degree, and METS and EAD are more extreme examples);
2b. RDBMS's tend to be *really* slow in handling the nasty join operations
and recursive searches necessary to successfully search bibliographic
data that's been put in a properly normalized relational structure, at
least in comparison with any half-decent text retrieval engine. So,
while an RDBMS makes data management easier for bibliographic
data, it tends to make searching crawl. And people spend way more
time searching bibliographic data than updating it. This is why, as Karen
pointed out, most library catalog systems which use an RDBMS aren't
*really* breaking out the data into 4th normal form, and employ a variety
of tricks to get decent performance (most of the tricks having the net
effect of making the database act much more like a text retrieval
engine).
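As a rough illustration of the join problem, the sketch below splits a
record into separate tables for titles, names, and subjects, then runs a
simple keyword search across all of them. The schema and the
keyword_search helper are hypothetical, and the example only shows the
shape of the query; it makes no attempt to measure speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized layout: one table per kind of entity,
    -- plus link tables for the many-to-many relationships.
    CREATE TABLE record  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE name    (id INTEGER PRIMARY KEY, name_text TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, topic TEXT);
    CREATE TABLE record_name    (record_id INTEGER, name_id INTEGER);
    CREATE TABLE record_subject (record_id INTEGER, subject_id INTEGER);

    INSERT INTO record  VALUES (1, 'Metadata in practice'), (2, 'Cataloging maps');
    INSERT INTO name    VALUES (1, 'Doe, Jane'), (2, 'Roe, Richard');
    INSERT INTO subject VALUES (1, 'Metadata'), (2, 'Cartography');
    INSERT INTO record_name    VALUES (1, 1), (2, 2);
    INSERT INTO record_subject VALUES (1, 1), (2, 2);
""")

def keyword_search(term):
    """Return ids of records matching term in title, name, or subject."""
    pattern = f"%{term}%"
    rows = conn.execute("""
        SELECT DISTINCT r.id
        FROM record AS r
        LEFT JOIN record_name    AS rn ON rn.record_id = r.id
        LEFT JOIN name           AS n  ON n.id = rn.name_id
        LEFT JOIN record_subject AS rs ON rs.record_id = r.id
        LEFT JOIN subject        AS s  ON s.id = rs.subject_id
        WHERE r.title LIKE ? OR n.name_text LIKE ? OR s.topic LIKE ?
        ORDER BY r.id
    """, (pattern, pattern, pattern)).fetchall()
    return [row[0] for row in rows]
```

Even this toy "search anywhere" query needs four joins, and every
additional repeatable or nested element in the record adds more. A text
retrieval engine would instead index each record as a single document.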
So, you can map MODS losslessly into an RDBMS, but if the
main point of the exercise is to make it easy to search, you shouldn't
bother; use an XML-based text retrieval engine. If that's *not* the main
point, and you're more concerned about data management, then an RDBMS
may be the way to go.
Cautionary Note 1: Those problems which make RDBMS's painfully slow
in searching also make them painfully slow to export data back into
XML format, at least for any metadata which is very hierarchical.
If you're thinking of needing to turn MODS data into BibTeXML or some
other markup language, you're better off keeping the MODS data in
XML and using XSLT.
Cautionary Note 2: While RDBMS's give you some advantage in
simplifying data management, those advantages tend not to manifest
themselves for small collections of bibliographic data. If you're talking
about a small, personal bibliographic database (like something you'd
keep in EndNote), an RDBMS is probably a waste of time.
Given all of the above, and your description of your project, I'd seriously
consider just keeping the bibliographic data you're storing as a series
of XML records in a single file.
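For a small collection, the single-file approach can be sketched with
nothing more than Python's standard xml.etree.ElementTree module. The
sample records and the search_records helper are assumptions for
illustration; only the MODS namespace URI and the element names come
from MODS itself.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Stand-in for the contents of a single file holding all the records.
SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods>
    <titleInfo><title>Metadata in practice</title></titleInfo>
    <name><namePart>Doe, Jane</namePart></name>
  </mods>
  <mods>
    <titleInfo><title>Cataloging maps</title></titleInfo>
    <name><namePart>Roe, Richard</namePart></name>
  </mods>
</modsCollection>"""

def search_records(root, term):
    """Return titles of records whose text contains term (case-insensitive)."""
    term = term.lower()
    hits = []
    for rec in root.findall("mods:mods", NS):
        # Flatten all text in the record, whatever its nesting.
        text = " ".join(t.strip() for t in rec.itertext() if t.strip())
        if term in text.lower():
            hits.append(rec.findtext("mods:titleInfo/mods:title", namespaces=NS))
    return hits

root = ET.fromstring(SAMPLE)
matches = search_records(root, "roe")
```

With a real file you would load it with ET.parse(path).getroot() instead
of ET.fromstring. A linear scan like this is only reasonable for a
personal-sized collection; it is not a substitute for a real text
retrieval engine.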
Jerome McDonough
Digital Library Development Team Leader
Elmer Bobst Library, New York University
70 Washington Square South, 8th Floor
New York, NY 10012
(212) 998-2425