I'd like to second much of what Liz and Clay have written, but also to go
off on a slight tangent about EAD's "non-specificity and lack of rigor
that confounds effective processing for resource discovery". I once
mentioned to Liz, speaking of TEI (though it applies to EAD as well), that
our community sorely needs to understand that a large, inclusive, and
flexible DTD *enables* interchange; it does not *guarantee* it. If
interchange or "effective processing for resource discovery" is a serious
goal, then serious effort must be undertaken to apply EAD properly. Mere
EAD validity, indeed, does not buy you much from a machine-processing
standpoint. Yet it does narrow the range of possibilities and establish a
space for further conversation and action. No mean feat.

That said, I confess my uncertainty as to how inadequate EAD really is as
a descriptive metadata standard. unitid, unittitle, unitdate, physdesc,
extent, physfacet, controlaccess, subject, persname, corpname,
scopecontent, bioghist...their accompanying attributes, the allowance for
expression of hierarchical relations... Pretty rich set of elements.
Certainly it lacks the specificity needed in certain domains and
disciplines, but to the extent that what you are describing shares the
characteristics of a competently arranged and described archives or
manuscript collection, you'll probably be pretty well served. There's
likely *some* way of getting what you want. Take Clay's example of
wrapping MARC in EAD to express subfaceted subject headings:

<ead:subject><ead:mdWrap><ead:xmlData><marc:datafield tag="650" ind1=" "
ind2="0">
<marc:subfield code="a">Indians</marc:subfield>
<marc:subfield code="x">Antiquities.</marc:subfield>
</marc:datafield></ead:xmlData></ead:mdWrap></ead:subject>

It is not evident to me why this is more proper or useful than the
entirely EAD encoded:


<controlaccess encodinganalog="650">
        <subject encodinganalog="650a">Indians</subject>
        <subject encodinganalog="650x">Antiquities.</subject>
</controlaccess>

The difference, of course, is mostly semantic. The first views the terms
as a subject consisting of a MARC datafield with two subfield components.
The second views them as a controlaccess heading made up of two subject
terms.  What impedes interoperability here is not so much that EAD lacks
the ability to express a subfaceted subject heading, but that, in some
circumstances, another person or machine process might not expect or
understand a given valid EAD usage and therefore fail to process it as
intended or wanted.
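
To make the point concrete, here is a minimal Python sketch of a consumer
that has to anticipate both encodings above before it can pull out the same
heading. The namespace URIs and the function name subject_parts are
assumptions added for illustration (the fragments above declare no
namespaces), and the mdWrap/xmlData wrapper is Clay's experiment, not part
of the EAD DTD.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for illustration only; the fragments
# quoted in this message do not declare any.
NS = {
    "ead": "urn:isbn:1-931666-22-9",
    "marc": "http://www.loc.gov/MARC21/slim",
}

WRAPPED = """
<ead:subject xmlns:ead="urn:isbn:1-931666-22-9"
             xmlns:marc="http://www.loc.gov/MARC21/slim">
  <ead:mdWrap><ead:xmlData>
    <marc:datafield tag="650" ind1=" " ind2="0">
      <marc:subfield code="a">Indians</marc:subfield>
      <marc:subfield code="x">Antiquities.</marc:subfield>
    </marc:datafield>
  </ead:xmlData></ead:mdWrap>
</ead:subject>
"""

NATIVE = """
<controlaccess encodinganalog="650">
  <subject encodinganalog="650a">Indians</subject>
  <subject encodinganalog="650x">Antiquities.</subject>
</controlaccess>
"""


def subject_parts(xml_text):
    """Return (subfield code, term) pairs from either encoding."""
    root = ET.fromstring(xml_text)

    # Convention 1: MARC subfields wrapped inside an EAD subject.
    wrapped = root.findall(".//marc:datafield[@tag='650']/marc:subfield", NS)
    if wrapped:
        return [(sf.get("code"), sf.text.strip()) for sf in wrapped]

    # Convention 2: plain EAD subjects tagged encodinganalog="650a" etc.
    parts = []
    for subj in root.iter("subject"):
        analog = subj.get("encodinganalog", "")
        if analog.startswith("650") and len(analog) == 4:
            parts.append((analog[3], subj.text.strip()))
    return parts


print(subject_parts(WRAPPED))
print(subject_parts(NATIVE))
```

Both calls yield the same pairs, but only because the code knows about each
convention in advance. A third valid usage it had not been told about would
pass silently through as an empty list.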

Enhancement of EAD's capabilities via external schemata using W3C Schema or
inclusion of fragments using RelaxNG would be welcome.  However, what also
might be needed is collaborative development of ancillary techniques of
specifying EAD usage to enable well defined applications. These can range
from the highly formal (e.g., Schematron), to moderately formal (something
like METS Application Profiles), to informal (e.g. RLG's Application
Guidelines).
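
As a rough illustration of what the formal end of that range buys you, here
is a small Python sketch of a usage check layered on top of DTD validity. It
is a stand-in for the kind of assertion one might write in Schematron, not
Schematron itself, and the two rules and the name check_usage are
hypothetical, invented for this example rather than taken from any
published guideline.

```python
import xml.etree.ElementTree as ET


def check_usage(xml_text):
    """Return a list of rule violations; an empty list means compliant.

    Hypothetical local rules (not from any published EAD guideline):
      1. every <subject> inside <controlaccess> has non-empty text
      2. every such <subject> carries an encodinganalog attribute
    """
    root = ET.fromstring(xml_text)
    problems = []
    for ca in root.iter("controlaccess"):
        for subj in ca.findall("subject"):
            text = (subj.text or "").strip()
            if not text:
                problems.append("subject is empty")
            if not subj.get("encodinganalog"):
                problems.append(f"subject {text!r} lacks encodinganalog")
    return problems


GOOD = """
<controlaccess encodinganalog="650">
  <subject encodinganalog="650a">Indians</subject>
  <subject encodinganalog="650x">Antiquities.</subject>
</controlaccess>
"""

# Valid against the DTD, but not against the local rules above.
BAD = "<controlaccess><subject>Indians</subject></controlaccess>"

print(check_usage(GOOD))
print(check_usage(BAD))
```

The point is only that such rules can be written down and tested
mechanically, so that "valid EAD" and "EAD as we have agreed to use it"
become two separate, checkable things.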

Enough for now,

Terry


On Fri, 7 Nov 2003, Elizabeth Shaw wrote:

> Hi,
>
> Let me reinforce what Clay has said about changing EAD to accommodate
> another community of practice.
>
>  From a programming/data processing perspective EAD already suffers from
> a level of non-specificity and lack of rigor that confounds effective
> processing for resource discovery.
>
> While the DTD meets a political goal of flexibly accommodating a wide
> variety of extant practice in the community (although now we see
> arguments for even wider practice) it presents less than perfect data to
> those who are either trying to build systems to effectively provide
> access to finding aids or who desire to migrate the data to other
> schemas for aggregated collections. (Look for example at Chris Prom's
> effort to shoehorn EAD into OAI)
>
> Let me suggest something for a moment. There are many communities of
> practice or discipline in this world. In this instance we have been
> talking about musicologists. They have very specific metadata needs
> because the objects that they describe may well have characteristics
> that are quite different from those of most objects. I would not describe
> a piece of music in the same way that I describe a sculpture.
>
> At the same time they may want to share their resources in aggregate
> resource locators that share only a few common significant characteristics.
>
> One approach to solving this dilemma is to provide a simple set of core
> elements (the Dublin Core approach). Another has been to generate a DTD
> that accommodates a lot of different practice. It allows a lot and
> requires little (EAD approach). And a third approach has been to
> generate a specific DTD for a particular set of needs/community of practice.
>
> The first approach seems unsatisfactory to a specialized community of
> practice. Describing a musical object with unqualified Dublin Core would
> not assist in the research of that material for the community.
>
> The second approach leads to inconsistent markup and inconsistent
> practice within a community. Although within a particular institution or
> consortium that decides on a subset of possibilities it can be
> effective. Across institutions it becomes difficult to process the data.
> As Chris Prom and I have both noted, it leads to nightmarish
> transformation problems. Inconsistently encoded data also leads to
> inconsistent resource recovery.
>
> For example one institution does extensive history and the name "Joe
> Smith" appears multiple times in the archives of John Doe because the
> history  describes the heady days of John and Joe's youth. But another
> institution captures minimal data like "correspondence ". At the second
> institution much of the correspondence in that collection are the
> letters that were received from Joe Smith by Sally Jones. There may
> actually be more contained in the second collection about and of good
> old Joe - but a retrieval engine will never find it for you. This is a
> case where the variant standards of description across institutions
> confound resource discovery.
>
>  From the perspective of the machine processor, another aspect of the
> flexibility is even more problematic. As I have said before the lack of
> consistent handles on which to rely (elements that are not even
> captured) and inconsistent implementation (do I use <abstract>, <note>
> or <p>) all confound resource discovery and display. If I cannot rely
> on finding information in a particular location but must seek it in
> numerous places, it becomes very difficult to write the processing
> instructions to either transform that data to an aggregate database or
> even to consistently recover it within a group of finding aids.
>
> The third option - a DTD/Schema targeted to the needs of a particular
> community of practice - provides the best retrieval for the community of
> practice. But at first glance, one assumes that it then is least likely
> to "play nice with others". But, in fact, I would argue that a rich, and
> more rigorous descriptive encoding is more easily migrated to a common
> form than a flexible loose encoding.
>
> I would rather spend an afternoon writing an XSLT transformation routine
> to take a constrained set of data succinctly and rigorously defined by a
> community of practice and map it to a more general schema such as an OAI
> Dublin Core implementation than try to guess how a bunch of
> institutions have implemented, each in their own way, a flexible DTD.
> The potential for resource discovery is so much greater in the first
> instance.
>
> Given rules and definitions of encoding and an expert in the community,
> it is truly a simple matter to generate the data to a more generalized
> form. But starting with flexibility one may never be able to move to
> greater rigor.
>
> So, for what it is worth as one who processes data and builds systems, I
> would rather see musicologists define their community needs in a rich
> and rigorous way. If they find that they want to play with archivists,
> mapping their highly specific encoding to EAD will be a day's work.
> Moving in the other way will be nigh impossible.
>
> Dr. David Birnbaum, a member of the TEI council and chair of the U of
> Pitt Slavic department, and I have often talked about this in the context
> of TEI. TEI is a behemoth - and has the same sorts of inclusive goals as
> EAD. David is a linguist and as such he often has very specific encoding
> needs that almost no one else in the world would care about (except
> other Slavic linguists). He has often talked about having an "authoring
> DTD" - a DTD in which he encodes his data for his own needs. When he
> wants to play nice with others he writes a  transformation to TEI.
> Inevitably he loses some of the richness of the encoding that enables
> him to study the text effectively. But it is a both/and situation rather
> than an either/or.
>
> XSLT transformations were but a gleam in the eye of the XML community
> when the beta version (findaid) of EAD was being developed.
> Transformations from one DTD to another were laborious and time
> consuming. Now we have a tool that obviates the need to be all things to
> all people.
>
> While I appreciate the desire to include everyone in the EAD community,
> it might be better to include them by encouraging them to address the
> particular needs of their community of practice first and then find ways
> to map to EAD than to encourage a less than perfect solution to their
> particular needs. Not only will their resource discovery and analysis be
> richer within their own community but they may well have the opportunity
> to play nice with other broadly adopted standards.
>
> Finally, I question the notion of whether any collection is truly
> hierarchical in nature or whether despite the overlay of "intellectual
> hierarchy" on collection description, the way we think about collections
> is really driven by the fact that collections were physical and
> therefore had to be ordered in a single way. And our tools for
> description were linear - and therefore forced us to think in linear
> hierarchies. My relationship to my archives is really quite non-linear
> - even though my files are. Vannevar Bush's Memex machine is still
> but a gleam in the eye of people thinking about information
> organization. But that is another email altogether.
>
> Liz Shaw
>
> Clay Redding wrote:
> > Andrew,
> >
> > Regarding the paragraph below, this is possible with XML Schema using
> > namespacing.  However, this is not possible using DTD.  At this point
> > you could get a well-formed XML document using namespacing (e.g.,
> > <ead:bioghist><marc:datafield tag="245">), but it would not validate.
> > Those schemas have to allow for mixed content, which most currently do
> > not.
> >
> > So, for instance, if I wanted to put Dublin Core or MARC XML into the
> > <bioghist>, the EAD Schema would specifically have to allow it.  Or, the
> > EAD Schema would have to take the approach that METS does by using XLink
> > (which EAD does support in name only in the DTD) or the
> > <mdWrap>/<xmlData> metadata wrapper elements for nesting in embedded XML
> > data.
> >
> > I converted the v1.0 DTD to Schema and embedded a <mdWrap><xmlData>
> > entity into it to experiment with such wrapping capabilities.  After
> > playing around with MARC, MODS, DC, etc., in EAD, it made up for certain
> > lack of features that attributes such as encodinganalogs now face in the
> > DTD.  It became much easier to deal with the common <subject
> > encodinganalog="650$a$x"> kinds of problems by nesting in MARC or MODS
> > tags inside the EAD as such:
> >
> > <ead:subject><ead:mdWrap><ead:xmlData><marc:datafield tag="650" ind1=" "
> > ind2="0">
> > <marc:subfield code="a">Indians</marc:subfield>
> > <marc:subfield code="x">Antiquities.</marc:subfield>
> > </marc:datafield></ead:xmlData></ead:mdWrap></ead:subject>
> >
> > So, theoretically, you could use any music-centric XML inside EAD.
> > However, as if standardization of markup were not hard enough to achieve
> > across repositories, if arbitrary extension schemas were added into EAD,
> > one could make a strong argument that interoperability problems would
> > compound.
> >
> > In general I disagree with making EAD more generalized (or specific) for
> > non-archival purposes.  Another similar markup standard exists out there
> > for marking up "collection"s of materials regardless of format: the
> > Research Support Libraries Programme Collection Description (also in
> > stages of becoming a Dublin Core standard).  You could use something
> > like DC Qualified to add subordinate items to the collection-level
> > description.  Plus that, if you think about it, METS delivers much the
> > same thing that EAD could with its ability to link/declare hierarchical
> > relationships.   EAD doesn't have a monopoly on hierarchical structuring
> > of description.  That's why the music community needs to create
> > something of its own rather than settle for a 75% solution by using an
> > existing standard.  Reference Liz Shaw's "high heel and mountain
> > climbing" analogy.
> >
> > The music library and music information retrieval communities are in
> > prime position to make the most of their own standards to serve as a
> > model for other disciplines.  Think of your ISMIR brethren -- they can
> > add wonderful touches to music-based digital libraries, but not through
> > latching onto EAD.   I think they could mirror the archival description
> > movement with their own content standards, structure standards, and
> > presentation standards.  Here at Princeton we're looking at ways of
> > adding things like BWV (or insert other composer catalogs here) numbers,
> > name authority discrepancies, transliteration, etc., into Virtual
> > International Authority Files and XML Web Services to create a possible
> > union catalog/bibliographic utility type of tool for the larger music
> > bibliography community.  Not to mention FRBRization to solve the
> > problems that MARC currently has with displaying music resources in
> > online catalogs to end users.  We look forward to seeing if anyone else
> > out there is interested.
> >
> > Clay
> >
> > Andrew Hankinson wrote:
> >
> >> Its strengths are
> >> that it is a defined structure which maintains hierarchical
> >> relationships, and allows physically separate items to "appear" as a
> >> single collection. (in theory, at least.)  If only there were some way
> >> of actually describing the stuff within the collection....On a more
> >> technical note, I'm not sure if you can mix standards within
> >> one another.  For instance, in a <c> tag within EAD, could you, for
> >> instance, place a <performance> tag taken from the TEI?  A thought that
> >> just occurred to me: Nested Schemas?  Could I, for instance, define a
> >> section as TEI, and then define the next section with, say, MusicXML?
> >> Or define a TEI section WITHIN a MusicXML section WITHIN an EAD
> >> document.  All with validation and/or corresponding DTD's.
> >>  Like I said, I'm new at this, so if my understanding of these things
> >> is a little off, please correct me.
> >>
>

Terry Catapano
Special Collections Analyst/Librarian
Columbia University Libraries Digital Program
212-854-9942

The opinions expressed do not reflect those of my institution, nor perhaps
of myself at some future time.