Hi,

Let me reinforce what Clay has said about changing EAD to accommodate
another community of practice.

From a programming/data processing perspective, EAD already suffers from
a level of non-specificity and lack of rigor that confounds effective
processing for resource discovery.

While the DTD meets a political goal of flexibly accommodating a wide
variety of extant practice in the community (although now we see
arguments for even wider practice), it presents less than perfect data to
those who are either trying to build systems to effectively provide
access to finding aids or who desire to migrate the data to other
schemas for aggregated collections. (Look for example at Chris Prom's
effort to shoehorn EAD into OAI)

Let me suggest something for a moment. There are many communities of
practice or discipline in this world. In this instance we have been
talking about musicologists. They have very specific metadata needs
because the objects that they describe may well have characteristics
that are quite different from those of most objects. I would not describe a piece
of music in the same way that I describe a sculpture.

At the same time they may want to share their resources in aggregate
resource locators that share only a few common significant characteristics.

One approach to solving this dilemma is to provide a simple set of core
elements (the Dublin Core approach). Another has been to generate a DTD
that accommodates a lot of different practice. It allows a lot and
requires little (EAD approach). And a third approach has been to
generate a specific DTD for a particular set of needs/community of practice.

The first approach seems unsatisfactory to a specialized community of
practice. Describing a musical object with unqualified Dublin Core would
not assist in the research of that material for the community.

The second approach leads to inconsistent markup and inconsistent
practice within a community. It can be effective within a particular
institution or consortium that decides on a subset of possibilities, but
across institutions it becomes difficult to process the data.
As Chris Prom and I have both noted, it leads to nightmarish
transformation problems. Inconsistently encoded data also leads to
inconsistent resource recovery.

For example, one institution writes an extensive history, and the name
"Joe Smith" appears multiple times in the archives of John Doe because
the history describes the heady days of John and Joe's youth. But another
institution captures minimal data like "correspondence". At the second
institution much of the correspondence in that collection consists of
letters that were received from Joe Smith by Sally Jones. There may
actually be more contained in the second collection by and about good
old Joe - but a retrieval engine will never find it for you. This is a
case where the variant standards of description across institutions
confound resource discovery.

From the perspective of the machine processor, another aspect of the
flexibility is even more problematic. As I have said before, the lack of
consistent handles on which to rely (elements that are not even
captured) and inconsistent implementation (do I use <abstract>, <note>
or <p>?) both confound resource discovery and display. If I cannot rely
on finding information in a particular location but must seek it in
numerous places, it becomes very difficult to write the processing
instructions to either transform that data to an aggregate database or
even to consistently recover it within a group of finding aids.
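
To make that cost concrete, here is a minimal Python sketch of the kind
of fallback search a processor ends up writing. The helper name
find_summary, the list of candidate locations, and the three sample
finding aids are all hypothetical, chosen only for illustration; they are
not prescribed by EAD.

```python
import xml.etree.ElementTree as ET

# Hypothetical search order: every place an institution might have put a summary.
CANDIDATE_PATHS = [
    ".//did/abstract",
    ".//did/note/p",
    ".//scopecontent/p",
]


def find_summary(ead_xml):
    """Return (path, text) for the first candidate location holding text, else None."""
    root = ET.fromstring(ead_xml)
    for path in CANDIDATE_PATHS:
        node = root.find(path)
        if node is not None and node.text and node.text.strip():
            return path, node.text.strip()
    return None


# Illustrative finding aids that encode the same kind of information differently.
AID_A = "<ead><archdesc><did><abstract>Letters of John Doe.</abstract></did></archdesc></ead>"
AID_B = "<ead><archdesc><did><note><p>Letters received by Sally Jones.</p></note></did></archdesc></ead>"
AID_C = "<ead><archdesc><did><unittitle>Correspondence</unittitle></did></archdesc></ead>"

for aid in (AID_A, AID_B, AID_C):
    print(find_summary(aid))
```

Every additional local practice means another entry in the candidate
list, and a finding aid like the third one, where the element was never
captured at all, yields nothing no matter how long the list grows.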

The third option - a DTD/Schema targeted to the needs of a particular
community of practice - provides the best retrieval for that community.
At first glance, one assumes that it is then the least likely
to "play nice with others". But, in fact, I would argue that a rich and
more rigorous descriptive encoding is more easily migrated to a common
form than a flexible, loose encoding.

I would rather spend an afternoon writing an XSLT transformation routine
to take a constrained set of data succinctly and rigorously defined by a
community of practice and map it to a more general schema such as an OAI
Dublin Core implementation than try to guess how a bunch of
institutions have implemented, each in their own way, a flexible DTD.
The potential for resource discovery is so much greater in the first
instance.
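
As a rough illustration of why the rigorous case is the easy one, here is
the same kind of mapping sketched in Python rather than XSLT, so that it
runs with the standard library alone. The <musicwork> vocabulary and its
element names are hypothetical, invented for this example; only the
Dublin Core element namespace is real.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical community element -> Dublin Core element.
# Because the source schema is rigorous, one fixed table covers every record.
FIELD_MAP = {
    "composer": "creator",
    "worktitle": "title",
    "catalognumber": "identifier",
    "instrumentation": "description",
}


def to_dublin_core(record_xml):
    """Map one hypothetical <musicwork> record to a flat Dublin Core record."""
    source = ET.fromstring(record_xml)
    target = ET.Element("record")
    for child in source:
        dc_name = FIELD_MAP.get(child.tag)
        if dc_name is None:
            continue  # detail with no Dublin Core equivalent is dropped
        element = ET.SubElement(target, f"{{{DC_NS}}}{dc_name}")
        element.text = (child.text or "").strip()
    return target


SAMPLE = (
    "<musicwork>"
    "<composer>Bach, Johann Sebastian</composer>"
    "<worktitle>Mass in B minor</worktitle>"
    "<catalognumber>BWV 232</catalognumber>"
    "<key>B minor</key>"
    "</musicwork>"
)

dc_record = to_dublin_core(SAMPLE)
print(ET.tostring(dc_record, encoding="unicode"))
```

The whole transformation is one lookup table because every record is
known to put the same information in the same place. The <key> element
has no entry in the table, so that detail is simply lost in the
generalized form.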

Given rules and definitions of encoding and an expert in the community,
it is truly a simple matter to generate the data to a more generalized
form. But starting with flexibility one may never be able to move to
greater rigor.

So, for what it is worth as one who processes data and builds systems, I
would rather see musicologists define their community needs in a rich
and rigorous way. If they find that they want to play with archivists,
mapping their highly specific encoding to EAD will be a day's work.
Moving in the other direction will be nigh impossible.

Dr. David Birnbaum, a member of the TEI council and chair of the U of
Pitt Slavic department, and I have often talked about this in the context
of TEI. TEI is a behemoth - and has the same sorts of inclusive goals as
EAD. David is a linguist and as such he often has very specific encoding
needs that almost no one else in the world would care about (except
other Slavic linguists). He has often talked about having an "authoring
DTD" - a DTD in which he encodes his data for his own needs. When he
wants to play nice with others, he writes a transformation to TEI.
Inevitably he loses some of the richness of the encoding that enables
him to study the text effectively. But it is a both/and situation rather
than an either/or.

XSLT transformations were but a gleam in the eye of the XML community
when the beta version (findaid) of EAD was being developed.
Transformations from one DTD to another were laborious and time
consuming. Now we have a tool that obviates the need to be all things to
all people.

While I appreciate the desire to include everyone in the EAD community,
it might be better to include them by encouraging them to address the
particular needs of their community of practice first and then find ways
to map to EAD than to encourage a less than perfect solution to their
particular needs. Not only will their resource discovery and analysis be
richer within their own community but they may well have the opportunity
to play nice with other broadly adopted standards.

Finally, I question the notion of whether any collection is truly
hierarchical in nature or whether despite the overlay of "intellectual
hierarchy" on collection description, the way we think about collections
is really driven by the fact that collections were physical and
therefore had to be ordered in a single way. And our tools for
description were linear - and therefore forced us to think in linear
hierarchies. My relationship to my archives is really quite non-linear
- even though my files are. Vannevar Bush and his Memex machine are still
but a gleam in the eye of people thinking about information
organization. But that is another email altogether.

Liz Shaw

Clay Redding wrote:
> Andrew,
>
> Regarding the paragraph below, this is possible with XML Schema using
> namespacing.  However, this is not possible using DTD.  At this point
> you could get a well-formed XML document using namespacing (e.g.,
> <ead:bioghist><marc:datafield tag="245">), but it would not validate.
> Those schemas have to allow for mixed content, which most currently do
> not.
>
> So, for instance, if I wanted to put Dublin Core or MARC XML into the
> <bioghist>, the EAD Schema would specifically have to allow it.  Or, the
> EAD Schema would have to take the approach that METS does by using XLink
> (which EAD does support in name only in the DTD) or the
> <mdWrap>/<xmlData> metadata wrapper elements for nesting in embedded XML
> data.
>
> I converted the v1.0 DTD to Schema and embedded a <mdWrap><xmlData>
> element into it to experiment with such wrapping capabilities.  After
> playing around with MARC, MODS, DC, etc., in EAD, it made up for certain
> lack of features that attributes such as encodinganalogs now face in the
> DTD.  It became much easier to deal with the common <subject
> encodinganalog="650$a$x"> kinds of problems by nesting in MARC or MODS
> tags inside the EAD as such:
>
> <ead:subject><ead:mdWrap><ead:xmlData><marc:datafield tag="650" ind1=" "
> ind2="0">
> <marc:subfield code="a">Indians</marc:subfield>
> <marc:subfield code="x">Antiquities.</marc:subfield>
> </marc:datafield></ead:xmlData></ead:mdWrap></ead:subject>
>
> So, theoretically, you could use any music-centric XML inside EAD.
> However, as if standardization of markup were not hard enough to achieve
> across repositories, if arbitrary extension schemas were added into EAD,
> one could make a strong argument that interoperability problems would
> compound.
>
> In general I disagree with making EAD more generalized (or specific) for
> non-archival purposes.  Another similar markup standard exists out there
> for marking up "collection"s of materials regardless of format: the
> Research Support Libraries Programme Collection Description (also in
> stages of becoming a Dublin Core standard).  You could use something
> like DC Qualified to add subordinate items to the collection-level
> description.  Plus that, if you think about it, METS delivers much the
> same thing that EAD could with its ability to link/declare hierarchical
> relationships.   EAD doesn't have a monopoly on hierarchical structuring
> of description.  That's why the music community needs to create
> something of its own rather than settle for a 75% solution by using an
> existing standard.  Reference Liz Shaw's "high heel and mountain
> climbing" analogy.
>
> The music library and music information retrieval communities are in
> prime position to make the most of their own standards to serve as a
> model for other disciplines.  Think of your ISMIR brethren -- they can
> add wonderful touches to music-based digital libraries, but not through
> latching onto EAD.   I think they could mirror the archival description
> movement with their own content standards, structure standards, and
> presentation standards.  Here at Princeton we're looking at ways of
> adding things like BWV (or insert other composer catalogs here) numbers,
> name authority discrepancies, transliteration, etc., into Virtual
> International Authority Files and XML Web Services to create a possible
> union catalog/bibliographic utility type of tool for the larger music
> bibliography community.  Not to mention FRBRization to solve the
> problems that MARC currently has with displaying music resources in
> online catalogs to end users.  We look forward to seeing if anyone else
> out there is interested.
>
> Clay
>
> Andrew Hankinson wrote:
>
>> Its strengths are
>> that it is a defined structure which maintains hierarchical
>> relationships, and allows physically separate items to "appear" as a
>> single collection. (in theory, at least.)  If only there were some way
>> of actually describing the stuff within the collection....On a more
>> technical note, I'm not sure if you can mix standards within
>> one another.  For instance, in a <c> tag within EAD, could you, for
>> instance, place a <performance> tag taken from the TEI?  A thought that
>> just occurred to me: Nested Schemas?  Could I, for instance, define a
>> section as TEI, and then define the next section with, say, MusicXML?
>> Or define a TEI section WITHIN a MusicXML section WITHIN an EAD
>> document.  All with validation and/or corresponding DTD's.
>> Like I said, I'm new at this, so if my understanding of these things
>> is a little off, please correct me.
>>