All,

Here's a thought (apologies in advance if this message is naive from either
a technical or a practical standpoint; it offers little detail and no
specific recommendations, but here goes nothing anyway).

Why not make BIBFRAME about the future of bibliographic data rather than
its past? Libraries have invested tremendous resources over the years to
produce and share their catalog records; it is only natural that they (and
the catalogers who have worked so hard to encode those records--and to
master the arcane set of rules behind their creation) would want to
preserve that investment. Ergo the desire to devise a lossless (or nearly
lossless) crosswalk from MARC to an RDF vocabulary (i.e., BIBFRAME). For
years, libraries have been driven by just-in-case approaches to their
services (certainly in the acquisition of new materials). But when we're
dealing with data, do we really need to follow the same costly pattern?
Rather than spending additional time and resources to attempt the quixotic
task of converting all of MARC into actionable linked data (just in case we
might need access to the contents of some obscure and dubiously useful MARC
field), why not embrace a just-in-time approach to data conversion?

As Karen has pointed out here, MARC records are structured as documents:
much of our access to their contents comes through full-text keyword
searching. Now, we already have a standardized way to encode data-rich
documents: namely, XML. The MARCXML[1] format already gives us a lossless
way to convert our legacy data into an interoperable format. And the W3C
has spent the last 15 years developing standards around XML: XQuery 3.1[2]
and XSLT 3.0[3] are now robust functional programming languages that even
support working with JSON-encoded data. Needless to say, the same kind of
ecosystem is not available for working with binary MARC. Next-generation
Web application platforms like Graphity[4] and Callimachus[5] utilize the
XML stack for conversion routines or as a data integration pipeline into
RDF linked data. The NoSQL (XML) database MarkLogic (which I believe the
Library of Congress itself uses) now includes an integrated triplestore.
Archives-centric tools like Ethan Gruber's xEAC[6] likewise demonstrate a
hybrid model that leverages XML to produce linked data (as an aside: using
XML for data integration could promote interoperability between libraries
and archives, which continue to rely heavily on XML document
structures, such as EAD3[7], to encode their data).
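
To make this concrete, here is a minimal sketch (in Python, using only the
standard library) of just-in-time access to a MARCXML record; the file name,
field, and subfield choices are hypothetical, and in practice the same
lookup could just as easily be written directly in XQuery or XSLT:

    import xml.etree.ElementTree as ET

    # MARCXML namespace published by the Library of Congress
    MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

    def get_subfields(record_path, tag, codes):
        """Pull selected subfields of one MARC field from a MARCXML file on demand."""
        root = ET.parse(record_path).getroot()
        values = []
        for field in root.iter("{http://www.loc.gov/MARC21/slim}datafield"):
            if field.get("tag") != tag:
                continue
            for sub in field.findall("marc:subfield", MARC_NS):
                if sub.get("code") in codes:
                    values.append(sub.text or "")
        return values

    # e.g., fetch the 245 $a/$b title statement only when an interface needs it
    print(" ".join(get_subfields("record.xml", "245", {"a", "b"})))

The point is simply that the legacy data stays fully addressable as XML
without our first having to force every field through an RDF model.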

So, why not excise everything from BIBFRAME that is mostly a reflection of
MARC and work to remodel the vocabulary according to best practices for
linked data? We can store our legacy MARC data as MARCXML (a lossless
conversion), index it, link it to its BIBFRAME representation, and then
access it on a just-in-time basis, whenever we find we need something that
we didn't think was worth modeling as RDF. This would let BIBFRAME be the
"glue" that it is supposed to be and would allow us to draw on the full
power of XQuery/XSLT/XProc and SPARQL, together, to fit the needs of our
user interfaces. This is still a two-tiered approach, but it avoids the
overhead of trying to pour old wine into new wineskins (a terrible mixed
metaphor, but I couldn't resist the biblical allusion).
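
As a rough illustration of that two-tiered idea (a sketch only: the bf: and
ex: namespaces, the example URIs, and the property linking a description
back to its MARCXML source are assumptions on my part, not part of any
published vocabulary), the RDF layer could carry just the remodeled
essentials plus a pointer to the archived record, with a library like
rdflib standing in for a triplestore:

    from rdflib import Graph, Namespace, URIRef

    BF = Namespace("http://bibframe.org/vocab/")   # assumed BIBFRAME namespace
    EX = Namespace("http://example.org/terms/")    # hypothetical local terms

    ttl = """
    @prefix bf: <http://bibframe.org/vocab/> .
    @prefix ex: <http://example.org/terms/> .

    <http://example.org/instance/1>
        bf:title "Dom Casmurro" ;
        ex:marcxmlSource <file:///data/marcxml/record1.xml> .
    """

    g = Graph()
    g.parse(data=ttl, format="turtle")

    instance = URIRef("http://example.org/instance/1")
    # SPARQL (or here, rdflib) answers the linked-data questions;
    # the MARCXML layer, as sketched above, answers everything else.
    title = g.value(subject=instance, predicate=BF.title)
    source = g.value(subject=instance, predicate=EX.marcxmlSource)
    print(title, source)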

This kind of iterative approach seems more scalable and locally
customizable than trying to develop an exhaustive algorithm that accounts
for every possible permutation present in the sprawling MARC formats.

Similar suggestions may already have been made on this list, but I think
the possibility is at least worth reviving in the context of the current
thread.
current thread. In short: we could extract the essence from our legacy
bibliographic records, remodel it, and then, from here on out, start
encoding things in new ways, without being beholden to an outmoded standard
and approach. All the old data would still be there, and would be
computationally tractable as XML, but our new data wouldn't need to be
haunted by its ghost.

Tim

[1] http://www.loc.gov/standards/marcxml/
[2] http://www.w3.org/TR/xquery-31/
[3] http://www.w3.org/TR/xslt-30/
[4] http://graphityhq.com/
[5] http://callimachusproject.org/
[6] https://github.com/ewg118/xEAC
[7] http://www2.archivists.org/groups/technical-subcommittee-on-encoded-archival-description-ead/ead3-gamma-release

--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library