I strongly support Stephen's reply with regard to redundancy. This
question is often raised in our EAD workshops: why do I need to include
things like <head> and <arrangement> when I can automatically generate them
during output? My first reply is that you can, but you cannot be sure that
someone else will. What is an obvious solution when the file is living on
your server and you can supply all the bells and whistles may create
problems in a shared environment such as a consortial database where there are
other assumptions about the availability of certain data elements that might
drive displays.
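The trade-off can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EAD fragment is a hypothetical, simplified sample (no DTD, no namespace), and the function names are made up for the example. It shows a display routine that depends on an encoded <head>, the fallback a local stylesheet might supply, and a series list pulled out of the <dsc>.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified EAD fragment for illustration only.
# Note that <arrangement> has no <head> child.
SAMPLE = """
<archdesc level="collection">
  <arrangement>
    <p>Organized into two series.</p>
  </arrangement>
  <dsc type="combined">
    <c01 level="series"><did><unittitle>Correspondence</unittitle></did></c01>
    <c01 level="series"><did><unittitle>Diaries</unittitle></did></c01>
  </dsc>
</archdesc>
"""

def arrangement_heading(archdesc, default=None):
    """Return the encoded <head> text, or `default` when none was encoded."""
    head = archdesc.find("./arrangement/head")
    if head is not None and head.text:
        return head.text.strip()
    return default

def series_titles(archdesc):
    """Collect the unittitle of every series-level <c01> in the <dsc>."""
    return [
        (c.findtext("./did/unittitle") or "").strip()
        for c in archdesc.findall("./dsc/c01")
        if c.get("level") == "series"
    ]

archdesc = ET.fromstring(SAMPLE)
print(arrangement_heading(archdesc))                         # a system with no fallback gets nothing
print(arrangement_heading(archdesc, default="Arrangement"))  # a local stylesheet can fill the gap
print(series_titles(archdesc))
```

A system that supplies its own default gets a heading either way; one that assumes the element was encoded gets nothing to display.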
This is another argument for greater consistency in encoding, probably driven by
some generally accepted and widely implemented content and encoding
protocol. But until that halcyon day-
Michael
-----Original Message-----
From: Michael Rush [mailto:[log in to unmask]]
Sent: Thursday, September 05, 2002 3:41 PM
To: [log in to unmask]
Subject: Encoding questions -- long
Hello all,
After a brief project hiatus, I've returned to the work of implementing EAD
here at the Mass. Historical Society. Looking with fresh eyes at the
encoding template I created earlier, I have some questions:
1. I'm hunting for ways to reduce redundancy. Encoding a series list in
<organization> seems superfluous when they are repeated in the <dsc> and can
be extracted for display in a list. Are there any compelling reasons to
continue to encode the series list in <organization>?
2. Also in an effort to reduce redundancy, I'm considering not using
<frontmatter>. The only piece of data which I had included in <frontmatter>
which is not encoded elsewhere is publication copyright info for guides we
actually published years ago, mostly to accompany microfilm editions. If
eadheader/publicationstmt refers to the electronic publication of a guide, I
see nowhere else to encode it. Does anyone else face the problem of
encoding two separate sets of publication data, one for a printed version of
a guide and one for the electronic version? If so, have you managed to do
it without using <frontmatter>?
3. I plan on using the normal attribute for date and unitdate tags. When
encoding a date that has a missing day, month, or year, is the best practice
to replace those digits with zeros? For example, would you encode September
2002 as normal="20020900" or September 5, no year, as normal="00000905"?
4. Does it make sense to encode collection dates as listed in the
eadheader/titlestmt/titleproper in <date> tags? In a moment of enthusiasm
for lots of <date> tags I thought it would be a good idea, but now I'm less
enthused and it seems pointless, especially since the same data is more
explicitly identified in the archdesc/did/unitdate elements.
5. How do you recommend using the <eadid>? I know lots of institutions use
the SGML Open Catalog specification, but I'm unclear on how it works, what
the benefits are, and if it works in XML at all. We don't anticipate
sharing our EAD files in any sort of consortium at the moment, so I was
planning on using a simple four-digit file numbering system.
<eadid systemid="MHS" type="file" encodinganalog="identifier">0001</eadid>
(in which case the file name would be 0001.xml), for example.
6. A final, non-encoding question. For institutions that have decided to
deliver raw xml for client-side transformation, what are the advantages of
this approach? I'm of the opinion that if a user is ultimately going to see
an HTML-based display, why not just send the HTML to begin with? Am I
missing something? (We will soon begin delivering XML as a part of a
digital text project, and I think the consensus here is that we will be
converting our xml data on the fly on our server.)
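To make question 3 concrete, here is a small Python sketch of the two ways a partial date could be written. The function names are hypothetical. The first produces the zero-filled values proposed in the question; the second produces ISO 8601 reduced-precision values, which drop a missing trailing part instead of padding it. Neither is offered as best practice, and whether a given system accepts either form is an open assumption.

```python
def zero_filled_normal(year=None, month=None, day=None):
    """Eight digits, with zeros standing in for any missing part
    (the scheme proposed in question 3)."""
    return f"{year or 0:04d}{month or 0:02d}{day or 0:02d}"

def reduced_precision(year, month=None, day=None):
    """ISO 8601 extended format, omitting a missing day (or month and day).
    A missing year cannot be expressed this way, so year is required."""
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
        if day is not None:
            parts.append(f"{day:02d}")
    return "-".join(parts)

print(zero_filled_normal(2002, 9))           # 20020900
print(zero_filled_normal(month=9, day=5))    # 00000905
print(reduced_precision(2002, 9))            # 2002-09
```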
Thanks in advance for taking the time to read this laundry list, and for any
advice that comes to mind.
Mike
____________________________________
Michael Rush - Manuscript Processor
Massachusetts Historical Society
[log in to unmask] - (617)646-0553