This is a delayed response to Michael Fox's post of March 10
(appended below). After a lot of thought and discussion about the
<physloc> and <container> issue with the other members of Harvard's
Digital Finding Aids Project (DFAP), I offer the following
feedback, and a new proposal.
The first point is about equating box and folder numbers with
information about the contents of a file such as date or
form/genre. In the examples of content that Michael describes, he
compares:
Correspondence, 1900-1910
with:
Box 1 1 Adams
We would argue that while, in some ways, these are similar types
of information, in a fundamental way they are not. Namely, the two
parts of "correspondence, 1900-1910" will always be linked.
"Correspondence" will never become "flyers," for example. On the
other hand, "Box 1" is merely a description of where certain
materials happen to be housed at a specific moment in time. If we
separate "Box 1" from "Adams" by reboxing the collection, "Adams"
will be separated from "Box 1" and become associated with "Box 2"
instead.
Secondly, DFAP participants discussed both of the options Michael
proposes for markup. The second option, that of adding the box
number to every single file description in the finding aid, is
exactly the type of extra keystroking that we have decided not to
employ. Especially in light of the firestorm about how "EAD is
forcing us to do a lot of extra keying in creating our finding
aids," we would like to avoid that option.
The first option (including <container> within a file's <did>)
doesn't seem quite right, either. From the tagging Michael posted
we would surmise that Box 1 contains only one file:
<c LEVEL="file"><did><container>Box 1</container>
<unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
<c LEVEL="file"><did><unitid>2.</unitid><unittitle>Albany Literary
Gazette [1934]</unittitle></did></c>
The tag library for the Beta version of EAD says that <did>
"bundles eight elements identifying fundamental descriptive
information needed to identify the ...<c> being described...."
Since the note about the contents of a box is not a *component of*
the unit being described (it is a container that houses it), it
seems to me that the use of <did> here is not appropriate (although
admittedly it will validate). Isn't it misleading to place
container information within a <c> that describes one file, when
the container contains many files?
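To make the concern concrete, here is a minimal Python sketch (not part of any EAD tool) that reads the fragment above literally, taking each <c> to know only what its own <did> says. The wrapping <dsc> element and the function name literal_containers are assumptions added so the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the tagging quoted above, wrapped in a <dsc>
# (an assumption) so that it parses as a single XML document.
FRAGMENT = """
<dsc>
  <c LEVEL="file"><did><container>Box 1</container>
    <unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
  <c LEVEL="file"><did><unitid>2.</unitid>
    <unittitle>Albany Literary Gazette [1934]</unittitle></did></c>
</dsc>
"""

def literal_containers(xml_text):
    """Return (unitid, container or None) per <c>, using only its own <did>."""
    root = ET.fromstring(xml_text)
    rows = []
    for c in root.iter("c"):
        did = c.find("did")
        rows.append((did.findtext("unitid"), did.findtext("container")))
    return rows

for unitid, container in literal_containers(FRAGMENT):
    print(unitid, container)
```

Read this way, only file 1 reports a container; file 2 reports none, which is why the tagging suggests that Box 1 holds a single file.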
Finally, we do not think the discussion on this topic so far has
addressed all of DFAP's concerns about the issue. Michael says
that "In having to choose between the two, EAD has privileged
intellectual structure over the physical ...." While we agree with
the basic decision, we wonder whether there might be a way to
describe the physical that doesn't conflict with the intellectual,
rather than having to "choose" between the two.
We have looked into the possibility of using the <odd> tag for the
purpose of inserting information about containers wherever they
happen to fall in the finding aid. Currently the DTD requires
that <odd> *must* follow the <did> and precede any nested <c>s.
Perhaps altering the DTD slightly to allow the use of <odd>
interspersed with the various levels of <c>'s would solve the
problem. It would allow, for example, for the following:
<DSC><HEAD>INVENTORY</HEAD>
<C01 level="collection"><DID></DID>
<ODD><P>Box 1</P></ODD>
<C02><DID><UNITID>1.</UNITID><UNITTITLE>Things</UNITTITLE></DID></C02>
<C03><DID><UNITID>2-4:</UNITID><UNITTITLE> Contents of wooden
box</UNITTITLE></DID>
<C04><DID><UNITID>2.</UNITID><UNITTITLE>Stuff</UNITTITLE></DID></C04>
<ODD><P>Box 2</P></ODD>
<C04><DID><UNITID>3.</UNITID><UNITTITLE>Tiny
stuff</UNITTITLE></DID></C04>
<C04><DID><UNITID>4.</UNITID><UNITTITLE>Love
note</UNITTITLE></DID></C04></C03>
<ODD><P>Box 3</P></ODD>
<C02><DID><UNITID>5.</UNITID><UNITTITLE>Stuff</UNITTITLE></DID></C02>
</C01></DSC>
In this example, we have a variety of <c>s, some nested, but all
are in the <c01> which represents the entire group. The current
DTD will validate only Box 1 (because it immediately follows a
<did>), but not the others. [If you find the above example
difficult to follow, I suggest you take a look at one of our
finding aids (available in either HTML or SGML), such as Helen
Buttenwieser, available in the Beta version of the DTD from
http://findingaids.harvard.edu (look under Schlesinger Library).]
We propose the following change to the DTD to allow for the above
tagging. Change from:
<!ELEMENT c ((head?, did, (%m.desc.elems;)*, (thead?,
c+)*) | (drow+, c*))>
To:
<!ELEMENT c ((head?, did, (%m.desc.elems;)*, (thead?, c+,
odd?)*) | (drow+, c*)) >
This change would allow one to include the information that one
needs to include about physical location (that is not
intellectually wedded to descriptive data) into the EAD document
relatively easily, without compromising the integrity of the DTD
(in light of the above-mentioned decision about intellectual vs.
physical).
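As a rough check of what the change would permit, the two content models can be approximated with regular expressions over the sequence of child element names. This is only a sketch under stated assumptions: %m.desc.elems; is reduced to just odd, the (drow+, c*) branch is ignored, all numbered components are written as c, and the names CURRENT, PROPOSED and allowed are hypothetical. It is no substitute for running an SGML parser against the DTD.

```python
import re

# Child sequences are written as space-terminated tokens, e.g. "did odd c c ".
# Assumption: %m.desc.elems; is reduced to "odd"; the drow branch is ignored.
CURRENT = re.compile(r"(head )?did (odd )*((thead )?(c )+)*")
PROPOSED = re.compile(r"(head )?did (odd )*((thead )?(c )+(odd )?)*")

def allowed(model, children):
    """True if the child element names satisfy the simplified content model."""
    return model.fullmatch("".join(name + " " for name in children)) is not None

# Children of <C01> in the example: did, odd (Box 1), two components,
# odd (Box 3), one more component.
c01 = ["did", "odd", "c", "c", "odd", "c"]
# Children of <C03>: did, one component, odd (Box 2), two components.
c03 = ["did", "c", "odd", "c", "c"]

for label, children in (("C01", c01), ("C03", c03)):
    print(label, allowed(CURRENT, children), allowed(PROPOSED, children))
```

Under these simplifications both sequences are rejected by the current model and accepted by the proposed one, while a sequence with a single <odd> directly after the <did> is accepted by both.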
Hopefully there will be some discussion on this list about our
suggestion; we do expect to submit it formally and we value the
input of any and all EAD users out there.
Susan von Salis
Schlesinger Library
Radcliffe College
=============================
Original post:
Leslie Morris asks a very important question: how to encode the
following example.
Example A:
> File List
>Box 1
> 1. [Adams, 1934]
> 2. Albany Literary Gazette [1934]
> 3. Alden
> 4. American Council
>Box 2
> 5. American and Foreign Anti-Slavery Reporter [1934]
> 6. Amesbury Villager [1934-36]
This issue is extremely important because it goes directly to
a fundamental structural concept in EAD. There is an inherent
tension in container listings between hierarchies of intellectual
order (collection, series, file, item) and hierarchies of physical
organization (boxes and folders). This topic was extensively
analyzed during the development of EAD, has been the topic of
numerous communications on this list, is raised at every EAD
workshop, and, I hasten to convey to Leslie, was carefully
reconsidered by the EAD Working Group during its meeting last Fall
when changes to the DTD for version one were considered.
In having to choose between the two, EAD has privileged
intellectual structure over the physical for many good reasons that
need not be rehashed here. But that is not to suggest that there
is no relationship between the two. Box and folder numbers are,
after all, characteristics of a particular file just as the title
and date are.
Harvard's desire to be able to insert container numbers AT ANY POINT
WITHIN THE FINDING AID suggests that this data is just some sort of
free-floating, disembodied information that has no structural
relationship to the rest of the inventory description. This is
not correct. Container data relates precisely and significantly to
other descriptive data. In fact, such container information makes
no sense at all except in relation to other descriptive elements.
Consider this recasting of Leslie's sample.
Example B:
Container   Id   Contents
Box 1       1    Adams
Box 1       2    Albany Literary Gazette
Box 1       3    Alden
Box 1       4    American Council
Box 2       5    American and Foreign Anti-Slavery Reporter
Box 2       6    Amesbury Villager
There are two differences between examples A and B. One has to
do with presentation on the page. The other is more interesting
and significant. In example A, the researcher is asked to infer
that Adams and what follows is in Box 1 until one comes to another
implicit statement that what follows after American Council is in
Box 2. The structural relationship between the box number and
the ID and title data that follows is exactly the same in both
examples, except that in one it is implicit and in the other it is
spelled out. The only real difference is in presentation. This
is what EAD is about: content and structure, not presentation.
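The inference a reader makes in Example A can be sketched in a few lines of Python: carry the most recent box heading forward onto each item that follows, which produces the rows of Example B. This is an illustrative sketch only; the function name expand_listing and the line format it expects are assumptions based on the example.

```python
def expand_listing(lines):
    """Turn an Example A style listing into (box, id, title) rows by
    carrying the most recent box heading forward onto each item."""
    rows = []
    current_box = None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.lower().startswith("box "):
            current_box = text  # a new heading applies to what follows
            continue
        item_id, _, title = text.partition(". ")
        rows.append((current_box, item_id, title))
    return rows

EXAMPLE_A = [
    "Box 1",
    "  1. [Adams, 1934]",
    "  2. Albany Literary Gazette [1934]",
    "  3. Alden",
    "  4. American Council",
    "Box 2",
    "  5. American and Foreign Anti-Slavery Reporter [1934]",
    "  6. Amesbury Villager [1934-36]",
]

for row in expand_listing(EXAMPLE_A):
    print(row)
```

Every row comes out with its box spelled out, as in Example B; nothing was added that the page layout did not already imply.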
Inventories are full of examples of such implicit inheritance.
Example C:
Correspondence
1900-1910
1911-1915
1916-1920
Subject Files
1911-1912
1913-1917
1918-1920
This really means the same as
Example D:
Correspondence, 1900-1910
Correspondence, 1911-1915
Correspondence, 1916-1920
Subject Files, 1911-1912
Subject Files, 1913-1917
Subject Files, 1918-1920
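The step from Example C to Example D amounts to each indented entry inheriting its heading. A minimal Python sketch, with the outline written as (heading, entries) pairs and the name flatten_outline chosen purely for illustration:

```python
def flatten_outline(outline):
    """Combine each heading with its indented entries, as in Example D."""
    return [
        f"{heading}, {entry}"
        for heading, entries in outline
        for entry in entries
    ]

EXAMPLE_C = [
    ("Correspondence", ["1900-1910", "1911-1915", "1916-1920"]),
    ("Subject Files", ["1911-1912", "1913-1917", "1918-1920"]),
]

for line in flatten_outline(EXAMPLE_C):
    print(line)
```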
There is a fundamental, structural relationship between the
<container> element and other descriptive data such as <unittitle>.
Page presentation tends to mask that association, but it is there.
In our discussions about encoding here at the Minnesota Historical
Society, most of our problems have been in analyzing and
understanding legacy finding aids, in sorting out the kinds of
implicit understandings that we have tried to convey to the user
through what are to us very obvious but what must be to others
often very subtle distinctions about the relationships of different
materials based on physical evidence on the finding aid page.
Finally, let me respond by offering two examples of encoding of
Leslie's example. The first was written by Kris Kiesling.
Example E:
<dsc TYPE="in-depth">
<head>File List</head>
<c LEVEL="file"><did><container>Box 1</container>
<unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
<c LEVEL="file"><did><unitid>2.</unitid><unittitle>Albany Literary
Gazette [1934]</unittitle></did></c>
<c LEVEL="file"><did><unitid>3.</unitid><unittitle>Alden</unittitle></did></c>
<c LEVEL="file"><did><unitid>4.</unitid><unittitle>American
Council</unittitle></did></c>
<c LEVEL="file"><did><container>Box
2</container><unitid>5.</unitid>
<unittitle>American and Foreign Anti-Slavery Reporter
[1934]</unittitle></did></c>
<c LEVEL="file"><did><unitid>6.</unitid>
<unittitle>Amesbury Villager [1934-36]</unittitle></did></c>
<c LEVEL="file"><did><unitid>7.</unitid><unittitle>etc.
etc.</unittitle></did></c>
Here's another option that some people who have attended our
workshops seem to like.
Example F:
<dsc TYPE="in-depth">
<head>File List</head>
<c LEVEL="file"><did><container>Box 1</container>
<unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
<c LEVEL="file"><did><container>Box 1</container>
<unitid>2.</unitid><unittitle>Albany Literary Gazette
[1934]</unittitle></did></c>
<c LEVEL="file"><did><container>Box 1</container>
<unitid>3.</unitid><unittitle>Alden</unittitle></did></c>
<c LEVEL="file"><did><container>Box 1</container>
<unitid>4.</unitid><unittitle>American
Council</unittitle></did></c>
<c LEVEL="file"><did><container>Box
2</container><unitid>5.</unitid>
<unittitle>American and Foreign Anti-Slavery Reporter
[1934]</unittitle></did></c>
<c LEVEL="file"><did><container>Box
2</container><unitid>6.</unitid>
<unittitle>Amesbury Villager [1934-36]</unittitle></did></c>
<c LEVEL="file"><did><container>Box
2</container><unitid>7.</unitid><unittitle>etc.
etc.</unittitle></did></c>
The reason for the explicit markup of container numbers in Example F
has to do with an anticipation of issues that might arise with
retrieval and display of the inventory. If a search finds a match
in the item "Amesbury Villager," the system can retrieve the necessary
descriptive data from the <c> that wraps up that item's information
except for its location which it inherits implicitly in examples A
and E from a sibling. This is very different from examples C and D
where the dates inherit data from their explicitly encoded parents.
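The retrieval issue can be sketched in Python. With markup like Example E, a hit on "Amesbury Villager" does not carry its own box number, so the program has to walk back through earlier siblings until it finds one that does. The abbreviated data, the wrapping <dsc> close tag, and the function name find_with_container are assumptions made for illustration; this is not how any particular retrieval system works.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of Example E, closed with </dsc> so it parses as XML.
EXAMPLE_E = """
<dsc>
  <c><did><container>Box 1</container><unitid>1.</unitid>
    <unittitle>[Adams, 1934]</unittitle></did></c>
  <c><did><unitid>2.</unitid>
    <unittitle>Albany Literary Gazette [1934]</unittitle></did></c>
  <c><did><container>Box 2</container><unitid>5.</unitid>
    <unittitle>American and Foreign Anti-Slavery Reporter [1934]</unittitle></did></c>
  <c><did><unitid>6.</unitid>
    <unittitle>Amesbury Villager [1934-36]</unittitle></did></c>
</dsc>
"""

def find_with_container(xml_text, phrase):
    """Return (title, box) for the first <c> whose title contains phrase.

    The box comes from the item's own <did> if present; otherwise from the
    nearest preceding sibling that has a <container>.
    """
    siblings = list(ET.fromstring(xml_text))
    for index, c in enumerate(siblings):
        title = c.findtext("did/unittitle") or ""
        if phrase in title:
            for earlier in reversed(siblings[: index + 1]):
                box = earlier.findtext("did/container")
                if box is not None:
                    return title, box
            return title, None
    return None

print(find_with_container(EXAMPLE_E, "Amesbury Villager"))
```

With markup like Example F the inner loop would stop at the matching item itself, since every <did> carries its own <container>.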
Now some of the new linking aspects of version 1.0 of EAD will
make it possible to make the connections in Examples A and E with a
bit of encoding and programming, but it seems to many to be clearer
to explicitly code the information even if one uses the stylesheet
to suppress the actual display of all but the first instance. Of
course, making containers free-floating and unconnected to the item
descriptions, as Harvard's proposal would do, would make it
impossible to pull this information together at all.
Michael Fox