See comments inline below.
Michael Fox
Head of Processing
Minnesota Historical Society
345 Kellogg Blvd West
St. Paul MN 55102-1906
phone: 612-296-1014
fax:  612-296-9961

>Susan von Salis writes:
>>This is a delayed response to Michael Fox's post of March 10
>>(appended below).  After a lot of thought and discussion about the
>><physloc> and <container> issue with the other members of Harvard's
>>Digital Finding Aids Project (DFAP), I offer the following
>>feedback, and a new proposal.
>>The first point is about equating box and folder numbers with
>>information about the contents of a file such as date or
>>form/genre.  In the examples of content that Michael describes, he
>>Correspondence, 1900-1910
>>Box 1       1             Adams
>>We would argue that  while, in some ways, these are similar types
>>of information, in a fundamental way they are not.  Namely, the two
>>parts of "correspondence, 1900-1910" will always be linked.
>>"Correspondence" will never become "flyers," for example.  On the
>>other hand, "Box 1" is merely a description of where certain
>>materials happen to be housed at a specific moment in time.  If WE
>>separate "Box 1" from "Adams" by reboxing the collection, "Adams"
>>will be separated from "Box 1" and become associated with "Box 2"

    This assertion does not square with my experience with manuscripts.
 While we may rebox a collection and thereby change the location
associated with a given file, we are just as apt to refolder a file (as
the result of subsequent gifts) and end up with two files:
Correspondence, 1900-1905, and Correspondence, 1906-1910.   The point is
that all these pieces of information are associated together even if
they change over time.   (see next paragraph for more)
>>Secondly, DFAP participants discussed both of the options Michael
>>proposes for markup.  The second option, that of adding the box
>>number to every single file description in the finding aid, is
>>exactly the type of extra keystroking that we have decided not to
>>employ.  Especially in light of the firestorm about how "EAD is
>>forcing us to do a lot of extra keying in creating our finding
>>aids," we would like to avoid that option.

    This example was not given as an "option" but merely to illustrate
a point.

    Interestingly, however, many participants in EAD workshops have
actually expressed a desire to change current practice and begin to
explicitly enter such data for each component for reasons that I will
attempt to articulate later.

    I don't know about any firestorms over EAD forcing a lot of extra
keying.  The effort required is largely a matter of the tools one uses
to author EAD documents.  Anyone typing tags by hand (or using
repetitive mousing to insert tags as in Author/Editor) needs to
seriously reconsider their tool set.  There are too many time-saving
options for us to be misled by that red herring.   (I do remember the
same argument about MARC.)   The answer is the same: if you get benefit
from the work, you will do it; if you don't see any advantage, do
something else.

>>The first option (including <container>  within a file's <did>)
>>doesn't seem quite right, either. From the tagging Michael posted
>>WE would surmise that Box 1 contains only one file:

     This is, in fact, one of the reasons why some archivists have
decided to explicitly repeat the box number for each component.   They
realize that the implicit inheritance of location, so common in the
structure of many existing inventories, may confuse the reader.

     However, it is essentially the same as the following widely
employed presentation style.

>Box          Id                    Contents
>  1            1             Adams
>                2             Albany Literary Gazette
>                3             Alden
>                4             American Council
>  2            5             American and Foreign Anti-Slavery
>                6             Amesbury Villager

Something is clearly implied about box locations.  Are you confused?

><c LEVEL="file"><did><container>Box 1</container>
>><unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
>><c LEVEL="file"><did><unitid>2.</unitid><unittitle>Albany Literary

>>The tag library for the Beta version of EAD says that <did>
>>"bundles eight elements identifying fundamental descriptive
>>information needed to identify the ...<c> being described...."
>>Since the note about the contents of a box is not a *component of*
>>the unit being described (it is a container that houses it), it
>>seems to me that the use of <did> here is not valid (although
>>admittedly it will validate).  Isn't it misleading to place
>>container information within a <c> that describes one file, when
>>the container contains many files?

    Location is one of the fundamental descriptive elements of a
component.  Whether it is given explicitly for each component or implied
is a different matter.  It seems entirely consistent to treat location
as such even if it sometimes changes.

    The entire proposal given below has the exact opposite effect: it
completely divorces location from content.  The solution of putting
location into <odd> seems to me to be simply another way to make container
locations "free-floating" and divorced from the files they contain.
This association is neither casual nor insignificant.

>>Finally, we do not think the discussion on this topic so far has
>>addressed all of DFAP's concerns about the issue.   Michael says
>>that "In having to choose between the two, EAD has privileged
>>intellectual structure over the physical ...."  While we agree with
>>the basic decision, we wonder whether there might be a way to
>>describe the physical that doesn't conflict with the intellectual,
>>rather than having to "choose" between the two.
>>We have looked into the possibility of using the <odd> tag for the
>>purpose of inserting information about containers wherever they
>>happen to fall in the finding aid.  Currently the DTD requires
>>that <odd> *must* follow the <did> and precede any nested <c>s.
>>Perhaps altering the DTD slightly to allow the use of <odd>
>>interspersed with the various levels of <c>'s would solve the
>>problem.  It would allow, for example, for the following:
>><C01 level="collection"><DID></DID>
>><ODD><P>Box 1</P></ODD>
>><C03><DID><UNITID>2-4:</UNITID><UNITTITLE> Contents of wooden
>><ODD><P>Box 2</P></ODD>
>><ODD><P>Box 3</P></ODD>

     This example would create the same problem that Susan objects to in
the current model if Box 1 were to contain parts of two different
<c01>s.   This is not an uncommon occurrence.   Either the location of
the second <c01> would have to be inferred from its explicit declaration
in the first <c01> or it would have to be repeated in the second <c01>.
Consider what might actually happen, using Susan's example.  Since box 3
has only one file in it, I might put files from another series in the
same box.


><C01 level="collection"><DID></DID>
><ODD><P>Box 1</P></ODD>
><C03><DID><UNITID>2-4:</UNITID><UNITTITLE> Contents of wooden
><ODD><P>Box 2</P></ODD>
><ODD><P>Box 3</P></ODD>

<c01><did><unittitle>Correspondence about important things</unittitle></did>
<ODD><P>Box 3</P></ODD>
<c02><did><unittitle>more stuff</unittitle></did></c02>

    Moreover, we still have all the problems of confused inheritance.
The following line from Susan's example has the same problem that so
greatly concerns her when locations are not explicitly given in the
present EAD structure, namely: what is its location?

><C03><DID><UNITID>2-4:</UNITID><UNITTITLE> Contents of wooden

      An online search that returns only this information fails to
answer a fundamental question for the patron: where is this stuff
located?   The solution that Susan proposes would encode the data in
such a way that the computer would have a very difficult time answering
that question.

     The only way to resolve this problem unequivocally is to
explicitly state locations and link them unambiguously to the files to
which they relate.   We are often confused by the fact that the
presentation of information on the printed page makes relationships
such as the association between an object and its location more or
less clear to a reader, who can parse them out based on visual clues.
The computer cannot do that.   Harvard's solution addresses page
presentation only.   If we want to make some further reuse of this data,
we cannot be fixed on the way it will be presented in one medium only.
For example, the fact that book call numbers were printed on more than
one line on a catalog card should not mislead us into thinking that
somehow this data ought to be broken up into two parts when it is one
(or actually several) contiguous piece(s) of information that we have
chosen, for convenience, to print on two lines.    Explicit
linkages of related information do require additional data entry whether
we choose to explicitly repeat the container numbers or use some ID
linking mechanism to connect them.   Whether this is worth the
slight extra overhead remains to be seen.   EAD certainly does
not require one to do so.
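To make the "ID linking mechanism" idea concrete, here is a minimal sketch in Python. It is a hypothetical data model, not EAD 1.0 syntax: the names Container, FileComponent, container_ref and where_is are invented for illustration, and the file list is taken from Leslie's Example A. Each box is declared once, and every file carries an explicit reference to it, which is exactly the extra keying mentioned above.

```python
from dataclasses import dataclass

# Hypothetical model (not EAD markup): containers are declared once with an
# identifier, and each file component explicitly references its container.

@dataclass
class Container:
    id: str     # hypothetical identifier, e.g. "b1"
    label: str  # what the reader sees, e.g. "Box 1"

@dataclass
class FileComponent:
    unitid: str
    title: str
    container_ref: str  # explicit link to Container.id (the extra keying)

CONTAINERS = {c.id: c for c in (Container("b1", "Box 1"), Container("b2", "Box 2"))}

# File list from Example A, each file linked explicitly to its box.
FILES = [
    FileComponent("1.", "[Adams, 1934]", "b1"),
    FileComponent("2.", "Albany Literary Gazette [1934]", "b1"),
    FileComponent("3.", "Alden", "b1"),
    FileComponent("4.", "American Council", "b1"),
    FileComponent("5.", "American and Foreign Anti-Slavery Reporter [1934]", "b2"),
    FileComponent("6.", "Amesbury Villager [1934-36]", "b2"),
]

def where_is(title_prefix):
    """Return the box label for the first file whose title starts with
    title_prefix, using only that file's own link (no neighbouring entries)."""
    for f in FILES:
        if f.title.startswith(title_prefix):
            return CONTAINERS[f.container_ref].label
    raise KeyError(title_prefix)

print(where_is("Amesbury Villager"))  # Box 2
```

A single search hit answers the patron's "where is this stuff?" question directly. Reboxing then means updating container_ref values, which is the same kind of overhead as repeating the box number in each component.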

>>In this example, we have a variety of <c>s, some nested, but all
>>are in the <c01> which represents the entire group.  The current
>>DTD will validate only Box 1 (because it immediately follows a
>><did>), but not the others.   [If you find the above example
>>difficult to follow, I suggest you take a look at one of our
>>finding aids (available in either HTML or SGML), such as Helen
>>Buttenwieser, available in the Beta version of the DTD from
>> (look under Schlesinger Library).]
>>We propose the following change to the DTD to allow for the above
>>tagging.  Change from:
>><!ELEMENT c          ((head?, did, (%m.desc.elems;)*, (thead?, c+)*) | (drow+, c*))>
>>to:
>><!ELEMENT c          ((head?, did, (%m.desc.elems;)*, (thead?, c+, odd?)*) | (drow+, c*))  >
>>This change would allow one to include the information that one
>>needs to include about physical location (that is not
>>intellectually wedded to descriptive data) into the EAD document
>>relatively easily, without compromising the integrity of the DTD
>>(in light of the above-mentioned decision about intellectual vs.
>>physical).
>>Hopefully there will be some discussion on this list about our
>>suggestion; we do expect to submit it formally and we value the
>>input of any and all EAD users out there.
>>Susan von Salis
>>Schlesinger Library
>>Radcliffe College
>>Original post:
>>Leslie Morris asks a very important question.  How to encode the
>>following example.
>>Example A:
>>>                File List
>>>Box 1
>>>        1.  [Adams, 1934]
>>>        2.  Albany Literary Gazette [1934]
>>>        3.  Alden
>>>        4.  American Council
>>>Box 2
>>>        5.  American and Foreign Anti-Slavery Reporter [1934]
>>>        6.  Amesbury Villager [1934-36]
>>     This issue is extremely important because it goes directly to
>>a fundamental structural concept in EAD.   There is an inherent
>>tension in container listings between hierarchies of intellectual
>>order (collection, series, file, item) and hierarchies of physical
>>organization (boxes and folders).   This topic was extensively
>>analyzed during the development of EAD, has been the topic of
>>numerous communications on this list, is raised at every EAD
>>workshop, and, I hasten to convey to Leslie, was carefully
>>reconsidered by the EAD Working Group during its meeting last Fall
>>when changes to the DTD for version one were considered.
>>     In having to choose between the two, EAD has privileged
>>intellectual structure over the physical for many good reasons that
>>need not be rehashed here.  But that is not to suggest that there
>>is no relationship between the two.   Box and folder numbers are,
>>after all, characteristics of a particular file just as the title
>>and date are.
>>Harvard's desire to be able to insert container numbers AT ANY POINT IN
>>THE FINDING AID suggests that this data is just some sort of
>>free-floating, disembodied information that has no structural
>>relationship to the rest of the inventory description.   This is
>>not correct.  Container data relates precisely and significantly to
>>other descriptive data.  In fact, such container information makes
>>no sense at all except in relation to other descriptive elements.
>>Consider this recasting of Leslie's sample.
>>Example B:
>>Container   Id    Contents
>>Box 1       1     Adams
>>Box 1       2     Albany Literary Gazette
>>Box 1       3     Alden
>>Box 1       4     American Council
>>Box 2       5     American and Foreign Anti-Slavery
>>Box 2       6     Amesbury Villager
>>    There are two differences between examples A and B.  One has to
>>do with presentation on the page.  The other is more interesting
>>and significant.  In example A, the researcher is asked to infer
>>that Adams and what follows are in Box 1 until one comes to another
>>implicit statement that what follows after American Council is in
>>Box 2.     The structural relationship between the box number and
>>the ID and title data that follows is exactly the same in both
>>examples, except that in one it is implicit and in the other it is
>>spelled out.   The only real difference is in presentation.  This
>>is what EAD is about: content and structure, not presentation.
>>    Inventories are full of examples of such implicit inheritance.
>>Example C:
>>     Correspondence
>>           1900-1910
>>           1911-1915
>>           1916-1920
>>     Subject Files
>>           1911-1912
>>           1913-1917
>>           1918-1920
>>This really means the same as
>>Example D:
>>Correspondence, 1900-1910
>>Correspondence, 1911-1915
>>Correspondence, 1916-1920
>>Subject Files, 1911-1912
>>Subject Files, 1913-1917
>>Subject Files, 1918-1920
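The parent-to-child inheritance in Examples C and D can be sketched in a few lines of Python. This is an illustration under an assumption: the hierarchy is modelled as plain nested data rather than EAD markup. Each series title is stated once, and flattening spells out the title every date range inherits from its explicit parent.

```python
# Example C modelled as plain nested data (an assumption for illustration):
# each series title is stated once, with its date ranges beneath it.
EXAMPLE_C = {
    "Correspondence": ["1900-1910", "1911-1915", "1916-1920"],
    "Subject Files": ["1911-1912", "1913-1917", "1918-1920"],
}

def flatten(series):
    """Spell out the title each date range inherits from its parent,
    producing the lines of Example D (dicts keep insertion order)."""
    return [f"{title}, {dates}" for title, ranges in series.items() for dates in ranges]

for line in flatten(EXAMPLE_C):
    print(line)
```

Because the parent is explicitly encoded, the inheritance is unambiguous: every child has exactly one parent to take its title from.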
>>      There is a fundamental, structural relationship between the
>><container> element and other descriptive data such as <unittitle>.
>>Page presentation tends to mask that association, but it is there.
>> In our discussions about encoding here at the Minnesota Historical
>>Society, most of our problems have been in analyzing and
>>understanding legacy finding aids, in sorting out the kinds of
>>implicit understandings that we have tried to convey to the user
>>through what are to us very obvious but what must be to others
>>often very subtle distinctions about the relationships of different
>>materials based on physical evidence on the finding aid page.
>>      Finally, let me respond by offering two examples of encoding
>>Leslie's example.   The first was written by Kris Kiesling.
>>Example E:
>><dsc TYPE="in-depth">
>><head>File List</head>
>><c LEVEL="file"><did><container>Box 1</container>
>><unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
>><c LEVEL="file"><did><unitid>2.</unitid><unittitle>Albany Literary
>><c LEVEL="file"><did><unitid>4.</unitid><unittitle>American
>><c LEVEL="file"><did><container>Box
>><unittitle>American and Foreign Anti-Slavery Reporter
>><c LEVEL="file"><did><unitid>6.</unitid>
>><unittitle>Amesbury Villager [1934-36]</unittitle></did></c>
>><c LEVEL="file"><did><unitid>7.</unitid><unittitle>etc.
>>Here's another option that some people who have attended our
>>workshops seem to like.
>> Example F:
>><dsc TYPE="in-depth">
>><head>File List</head>
>><c LEVEL="file"><did><container>Box 1</container>
>><unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
>><c LEVEL="file"><did><container>Box 1</container>
>><unitid>2.</unitid><unittitle>Albany Literary Gazette
>><c LEVEL="file"><did><container>Box 1</container>
>><c LEVEL="file"><did><container>Box 1</container>
>><c LEVEL="file"><did><container>Box
>><unittitle>American and Foreign Anti-Slavery Reporter
>><c LEVEL="file"><did><container>Box
>><unittitle>Amesbury Villager [1934-36]</unittitle></did></c>
>><c LEVEL="file"><did><container>Box
>>The reason for the explicit markup of container numbers in Example F
>>has to do with an anticipation of issues that might arise with
>>retrieval and display of the inventory.   If a search finds a match
>>in the item "Amesbury Villager," the system can retrieve the necessary
>>descriptive data from the <c> that wraps up that item's information,
>>except for its location, which it inherits implicitly in examples A
>>and E from a sibling.  This is very different from examples C and D,
>>where the dates inherit data from their explicitly encoded parents.
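The retrieval problem just described can be sketched with Python's standard xml.etree.ElementTree. The XML below is an assumed, well-formed rendering of Example E's structure (lower-cased tags, titles taken from Example A because the quoted SGML is truncated), and own_container and locate are hypothetical helper names. The sketch shows that a hit on "Amesbury Villager" carries no location of its own, so a program must walk back through preceding siblings to find one.

```python
import xml.etree.ElementTree as ET

# Assumed XML rendering of Example E: only the first file in each box
# carries a <container>; the others inherit it implicitly from a sibling.
EXAMPLE_E = """
<dsc type="in-depth">
  <c level="file"><did><container>Box 1</container><unitid>1.</unitid><unittitle>[Adams, 1934]</unittitle></did></c>
  <c level="file"><did><unitid>2.</unitid><unittitle>Albany Literary Gazette [1934]</unittitle></did></c>
  <c level="file"><did><unitid>3.</unitid><unittitle>Alden</unittitle></did></c>
  <c level="file"><did><unitid>4.</unitid><unittitle>American Council</unittitle></did></c>
  <c level="file"><did><container>Box 2</container><unitid>5.</unitid><unittitle>American and Foreign Anti-Slavery Reporter [1934]</unittitle></did></c>
  <c level="file"><did><unitid>6.</unitid><unittitle>Amesbury Villager [1934-36]</unittitle></did></c>
</dsc>
"""

def own_container(c):
    """Return the <container> text in this component's own <did>, or None."""
    text = c.findtext("did/container")
    return text.strip() if text else None

def locate(dsc, title_prefix):
    """Return the box for the file whose <unittitle> starts with title_prefix.

    In Example F style the hit has its own <container>.  In Example E style
    we must walk back through preceding siblings to the nearest one that
    states a box, which is the inference a reader makes on the printed page.
    """
    files = dsc.findall("c")
    for i, c in enumerate(files):
        if c.findtext("did/unittitle", default="").startswith(title_prefix):
            for candidate in reversed(files[: i + 1]):
                box = own_container(candidate)
                if box is not None:
                    return box
            return None  # no box stated anywhere before the hit
    raise KeyError(title_prefix)

dsc = ET.fromstring(EXAMPLE_E.strip())
print(own_container(dsc.findall("c")[5]))  # None: the hit alone has no location
print(locate(dsc, "Amesbury Villager"))    # Box 2, found only via a sibling
```

With Example F markup, own_container alone would answer the question for every hit. The sibling walk is only needed because the location is implied by document order rather than encoded in the component.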
>>     Now some of the new linking aspects of version 1.0 of EAD will
>>make it possible to make the connections in Examples A and E with a
>>bit of encoding and programming, but it seems to many to be clearer
>>to explicitly code the information even if one uses the stylesheet
>>to suppress the actual display of all but the first instance.   Of
>>course, making containers free-floating and unconnected to the
>>item descriptions, as Harvard's proposal would do, would make it
>>impossible to pull this information together at
>>Michael Fox