As I watch the traffic on the listserv regarding XSL and the EAD
cookbook, I am increasingly concerned that we are losing sight of what
EAD could provide to the broader archival community as well as
individual repositories.
Before I start ranting about anything I would like to say that Michael
provided a marvelous starting point with the EAD cookbook, giving
assistance to get over a technological hurdle. I doubt he would disagree
if I say that his work is a beginning and not the end.
Ideally, EAD should be a means to provide structural and semantic
markup of archival description. It has always concerned me that it is
a loosely structured set of markup, trying to accommodate everyone's
idea of the way description should be *presented*. And even as there
are descriptions of what belongs within which tags, the interpretation
across archival repositories varies. As a technologist, whose role it
has been to manipulate markup, EAD's highly lax structure has made it
more difficult to mine what could be a very rich descriptive
structure. In fact, I would argue that its laxness has actually
confounded people's ability to modify it to their own descriptive needs
by inhibiting the very commonalities that it was developed to promote.
Whatever you care to say about MARC/AACR2, you know what
you are getting when you retrieve the 245 field.
I don't believe that the archival community will ever be able to fully
capitalize on the power of SGML/XML unless it can come to some more common
and broadly held understandings of the nature of archival description. No
matter how much markup is inserted into a descriptive document, the
potential to fully exploit the markup will be limited without that common
understanding. In addition, in some of the discussions of description to
which I have been privy, I have heard a lack of distinction between what
is commonly held to be the important elements of description and their
final *presentation*. This has led to unfruitful arguments about
description. It often seems from the perspective of this programmer that
some of the arguments that led to a lax DTD have really been about what
the presentation product (i.e., formatting) looks like rather than
fundamental descriptive practice. You can make anything look like a
"table" using XSLT - it may be more useful to capture the "meaning" of the
information rather than its format. Then it can be shared across
repositories.
One of the most difficult hurdles in understanding the power of using
XML as a document markup tool is that we can largely separate content
from presentation/formatting. It was certainly a hurdle for
me. Absorbing the idea that I could take information that was ordered
one way in a document and rearrange it for a variety of displays, that
I needn't worry about what was bold or italicized (that I should
instead worry what the information was "about"), took a while.
With the advent of numerous tools such as XSL(T) to manipulate XML,
some of the laxness of the DTD that was built in to accommodate widely
varying *formatting* practices is now irrelevant. From a single source
document one can generate multiple versions of a finding aid. Indeed,
one can rearrange the information contained within an EAD document in
any order including putting the eadheader information at the very
bottom of the document if one so desires. Allowing a loose structure
actually confounds our ability to share documents across
repositories. And without certain structural and markup commonalities
it is more difficult to build commonly shared processing tools,
including things such as stylesheets because of the infinite
variations of the original documents. Were the descriptive and markup
practices more constrained, building these tools with good user
interfaces would be greatly simplified - thereby obviating the need for
every archivist to learn the ins and outs of XSLT.
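
To illustrate the single-source point, here is a minimal sketch in Python
(standard library only) rather than XSLT. The EAD fragment is a simplified,
made-up example, and the function names (components, as_outline,
as_html_table) are hypothetical, not part of any existing EAD tool. It
renders the same container list two ways without touching the source.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# A deliberately tiny, simplified EAD-like fragment (invented content).
EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Jane Doe Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <unittitle>Letters to family</unittitle>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# Matches the unnumbered <c> and the numbered <c01>..<c12> component tags.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def components(parent, depth=0):
    """Yield (depth, element) for every nested component, in document order."""
    for child in parent:
        if COMPONENT.match(child.tag):
            yield depth, child
            yield from components(child, depth + 1)


def title_of(component):
    """Return the component's unittitle text, or an empty string."""
    return (component.findtext("did/unittitle") or "").strip()


def as_outline(root):
    """View 1: an indented plain-text outline of the hierarchy."""
    dsc = root.find("archdesc/dsc")
    return "\n".join("  " * depth + title_of(c) for depth, c in components(dsc))


def as_html_table(root):
    """View 2: a flat HTML table of box number and title."""
    dsc = root.find("archdesc/dsc")
    rows = []
    for _, c in components(dsc):
        box = (c.findtext("did/container") or "").strip()
        rows.append(f"<tr><td>{escape(box)}</td><td>{escape(title_of(c))}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


root = ET.fromstring(EAD.strip())
print(as_outline(root))
print(as_html_table(root))
```

Both views depend only on the element names and nesting, which is why a
shared, predictable encoding matters more than any one display.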
With the development of manipulative tools, we could accommodate vastly
different presentation styles (if we desire that) while sharing a
common, consistent descriptive and encoding practice. Common encoding
and description would also allow us to build search tools that can
take full advantage of the rich information contained within finding
aids across collections.
On the other hand, this leads me to another observation. With increasing
concern I have seen people writing their finding aids to accommodate
Michael's stylesheets because they don't have the ability to modify
them. I doubt that was his intent. And in fact, in at least one query
that I have seen, it has led to what is called, in other SGML/XML
communities, "tag abuse". This is the inappropriate use of
tags (elements) in order to meet formatting or stylistic needs rather
than encoding the meaning/semantics/structure of the document. If
people start encoding their container lists so that they will look
nice when using the cookbook's stylesheets, they have missed one of
the most important opportunities of encoding the finding aids in EAD
in the first place - that is to reflect the intellectual structure and
hierarchy of the collection. If one's only purpose is to make a "good
looking" finding aid for the web, one might as well skip the arduous
process of encoding it in EAD and encode it in HTML.
But clearly this misses the opportunity of EAD. XML can allow us to
share description across collections. But it can also allow us, in
individual repositories, to create single source documents, which,
through manipulations such as an XSLT transformation to HTML (and
XSL/FO to PDF), can provide multiple views of the same
information.
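
The cost of "tag abuse" can be made concrete with a small sketch. Both
fragments below are invented for illustration, and files_by_series is a
hypothetical helper, not an existing tool. The first fragment encodes the
series/file hierarchy; the second flattens everything to c01 and uses bold
emphasis to make the series title stand out on screen. A program can
recover the intellectual structure only from the first.

```python
import xml.etree.ElementTree as ET

# Structural encoding: nesting and level attributes carry the hierarchy.
STRUCTURAL = """
<dsc>
  <c01 level="series">
    <did><unittitle>Correspondence</unittitle></did>
    <c02 level="file"><did><unittitle>Letters to family</unittitle></did></c02>
    <c02 level="file"><did><unittitle>Business letters</unittitle></did></c02>
  </c01>
</dsc>
"""

# Presentational encoding: looks similar once styled, but the hierarchy is gone.
PRESENTATIONAL = """
<dsc>
  <c01><did><unittitle><emph render="bold">Correspondence</emph></unittitle></did></c01>
  <c01><did><unittitle>Letters to family</unittitle></did></c01>
  <c01><did><unittitle>Business letters</unittitle></did></c01>
</dsc>
"""


def text_of(element):
    """Flatten an element's text, including text inside child tags like <emph>."""
    return "".join(element.itertext()).strip()


def files_by_series(dsc_xml):
    """Map each series title to the titles of the files nested within it."""
    dsc = ET.fromstring(dsc_xml.strip())
    result = {}
    for series in dsc.findall("c01[@level='series']"):
        title = text_of(series.find("did/unittitle"))
        result[title] = [
            text_of(f.find("did/unittitle"))
            for f in series.findall("c02[@level='file']")
        ]
    return result


print(files_by_series(STRUCTURAL))
print(files_by_series(PRESENTATIONAL))
```

The second call returns an empty mapping: nothing in the markup says which
entry is a series or which files belong to it, so no stylesheet or search
tool can reconstruct that reliably.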
Indeed, were we to agree on some common descriptive/encoding practices we
could build EAD specific tools, shared across repositories that would
enable us to automatically generate MARC records, reading room
versions of finding aids and a variety of other versions. These tools
would simplify the management of description rather than make it more
onerous. I currently see archives reproducing their descriptive
information in a variety of forms.
Indeed, I would argue that what the archival community should focus on
is developing a common markup practice based on a common rich
descriptive practice. If repositories hold a common understanding of
the content of the elements and could agree on a common markup
practice the machine manipulation of the documents would be greatly
simplified - indeed almost trivial. Tools that can be adapted, rather
than blindly implemented, would be easier to build on a common set of
markup practices. Each repository could display that information in
its own unique way but rely on the common tools for things such as
MARC transformations, searching across collections of finding aids,
and to provide adaptable templates for display.
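
As a rough sketch of why agreed markup makes the MARC transformation
nearly mechanical, the Python below maps a few collection-level EAD
elements to MARC tags and subfields. The mapping table is an illustrative
assumption, not an authoritative crosswalk, the sample record is invented,
and ead_to_marc_fields is a hypothetical name. It produces a simple
tag-to-subfields structure, not MARC communications format.

```python
import xml.etree.ElementTree as ET

# Illustrative (assumed) mapping: EAD path -> (MARC tag, subfield code).
CROSSWALK = [
    ("did/origination/persname", "100", "a"),
    ("did/unittitle", "245", "a"),
    ("did/unitdate", "245", "f"),
    ("did/physdesc/extent", "300", "a"),
]

# Invented sample record, encoded the way the mapping above expects.
ARCHDESC = """
<archdesc level="collection">
  <did>
    <origination><persname>Doe, Jane</persname></origination>
    <unittitle>Jane Doe Papers</unittitle>
    <unitdate>1900-1950</unitdate>
    <physdesc><extent>12 linear feet</extent></physdesc>
  </did>
</archdesc>
"""


def ead_to_marc_fields(archdesc):
    """Return {MARC tag: [(subfield code, value), ...]} for the mapped elements."""
    fields = {}
    for path, tag, code in CROSSWALK:
        text = archdesc.findtext(path)
        if text and text.strip():
            # Collapse internal whitespace left over from line-wrapped XML.
            fields.setdefault(tag, []).append((code, " ".join(text.split())))
    return fields


marc = ead_to_marc_fields(ET.fromstring(ARCHDESC.strip()))
for tag in sorted(marc):
    print(tag, marc[tag])
```

The whole transformation is a lookup table plus a loop, but only because
the sample puts each piece of information where the table expects it. Every
local variation in where the title or extent is recorded would need its own
special case.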
I take to heart Bill's concern that we really don't understand what
information is useful to our users. However, I would argue the
opposite - that XSLT and other XML manipulation tools provide an
incredible opportunity to discover precisely what we do not know about
users. A good user study might take a richly encoded description of
collections and display the same information in a variety of ways. An
analysis of what patrons find most useful would lead to a better
understanding of descriptive practice and presentation of
information. So, in fact, XSL provides a wonderful opportunity in this
arena.
Finally, as someone who has worked with SGML/XML for several years on
the programming end of things and someone who has trained many folks
to encode finding aids, I have long been interested in building a
suite of tools that would be EAD specific. They would make things such
as creating and editing EAD instances and modifying XSLT stylesheets
and XSL/FO more transparent and simpler for archivists who need to
focus on describing collections rather than encoding their
descriptions. I am not convinced that every archivist needs to
understand all the complexities of encoding documents in the longer
term. Dynamic web forms and GUI interfaces could be created that would
simplify the process. Any effort to do this at
this point will be repository specific because consistent encoding
practices are needed simply to build such tools. There is no
doubt that to effectively share tools across repositories would
require that some idiosyncratic descriptive practices be retired. But
that does not mean that we have to give up on idiosyncratic display
and presentation!
I, and others who have been thinking about these issues, have
hesitated. We can build tools that meet our institutions' practices
but they will be of little use to the larger community if our own
practices are idiosyncratic. And they require significant effort. The
payoff would be much greater to everyone if we were assured that our
tools would not be built on shifting sands. Building such tools would
be significantly easier if the infinite possibilities presented in EAD
were constrained. A series of easily adaptable tools would mean that
fewer would have to resort to the "tag abuse" to fit the cookbook
stylesheets. They would have their own "GUI" tools to easily modify
the display. I am not convinced that a stricter use of the DTD would
significantly reduce an individual repository's ability to use
EAD to represent the vast majority of its requirements.
I personally am excited about the ability to use things like XSL(T)
combined with other tools to:
- automatically generate MARC records in MARC communications
format for automated insertion into online catalogs
- create PDF versions of documents for reading rooms
- gain a greater understanding of our users' information needs by
providing alternate views of the information as a part of user
studies
- provide rich, targeted cross-collection searching for our end
users
- enhance the tag set to include collection management
information to enable implementation of a real single
source/multiple use document management system for archival
repositories.
XML can be an extremely powerful tool. If all we ever expect to do
with it is mount finding aids in HTML on the web, we are truly missing
some marvelous opportunities.
Finally, I would like to add that learning XSL may at first seem
complex but if you are interested in capitalizing on the potential of XML
then it is worth learning. In fact, I would argue that it can help all
archivists to truly understand the distinctions between content and format
about which I have been ranting. That can only help us to develop a
common understanding of the potentials and limitations of EAD in this
arena.
Liz Shaw
Lecturer
School of Information Sciences
University of Pittsburgh