---------- Forwarded message ----------
Date: Wed, 24 Apr 1996 11:08:28 CDT
From: Peter Flynn <[log in to unmask]>
To: Multiple recipients of list TEI-L <[log in to unmask]>
Subject: Re: EAD

Stephen Davis <[log in to unmask]> writes:

   I'm afraid I'm increasingly concerned, not only in the EAD, but in the
   TEI and some other newly proposed DTDs, how far away from a "data
   dictionary" approach we're headed, and instead apparently about to
   implement for all new digital library applications an entirely
   positionally-based syntactic approach to data structuring.

Not being familiar with the EAD work, I'm not clear what our
"distance" from a DD approach is, but I would share your concerns
(especially as we're putting in a new library system here :-)

   Realistically, how often do we want to wrestle with why a particular
   element isn't defined within another element?  This seems to let the
   container drive the content, where it should really be the opposite.
   WHEREVER I need a <persname> I should be able to use it.  At this rate
   it looks as though it would be best to define every element as possibly
   appearing within any other element, in any order!  And, actually, why

I don't think we need go that far, but I do agree (quite strongly)
that there is a need for far wider availability of the descriptive
elements within the TEI structure - persname is just one example of an
incredibly useful concept that I am finding people need to use pretty
much anywhere that you can type #PCDATA (and I am happy to report that
I have finally got persname and placename to work, details later :-)

   (A version of this issue came up several times at the recent TEI
   workshop in DC.  When people finally get down and dirty with encoding a
   corpus, they appear to need the flexibility of using an element
   _wherever_ it's needed, rather than just where someone thought to define
   it. They also don't appear to want to manage dozens of genre-based DTDs
   for poetry, prose, drama, verse drama, etc.)

Managing the genre-based apps is not really the problem. My guess is
that each project will settle on a broadly-based DTD subset which can
describe its texts, and perhaps use one or two more specialist ones
for a few documents. I think there is an element of self-selection (eg
texts are all by one individual, or all from one period, or all from
one culture, etc) which helps in this.

But it is true that an element should be available where it is
_needed_ rather than where it seemed likely at the time the DTD was
compiled. Prosimetrum is the best (worst?) example: it ought to be
possible in the middle of a paragraph to start a poetic fragment of
anything from a word to several verses, and then resume the same
paragraph. The structure assumed by most DTDs I have seen forces
termination of the para, insertion of a poem, then the start of a new
para. In the TEI, the only way I have seen so far to make this work is
to use TEXT inside P, which means the versicle is separated from the
para text by three additional levels of tagging (TEXT, BODY, LG). If
I've missed a cleaner way of doing it, someone please shout!
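
A rough sketch of the nesting described above, for illustration only. It
follows the description in this message (TEXT, BODY and LG between the
paragraph and the verse line); the sample wording is invented and the
fragment has not been checked against the TEI DTD:

```sgml
<p>The paragraph opens in ordinary prose,
  <text>
    <body>
      <lg>
        <l>then a line of verse intervenes,</l>
      </lg>
    </body>
  </text>
and the same paragraph resumes afterwards.</p>
```

The verse line sits three levels of tagging below the paragraph that
contains it, which is the overhead complained of above.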

Fortunately, one of the benefits of SGML and the TEI is that they do
allow you to define where you want stuff in this manner.
Unfortunately, low-level modification (like changing content model
groups) does require a serious grasp of the standard as well as
in-depth knowledge of what is actually _inside_ the existing DTD.
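
One SGML mechanism for making an element available throughout a
container is the inclusion exception on an element declaration. The
sketch below is a hypothetical toy DTD, not the TEI declarations: the
content models and the choice of included elements are assumptions made
purely for illustration.

```sgml
<!-- Hypothetical toy DTD, not taken from the TEI.               -->
<!-- The +(...) group is an inclusion exception: persname and    -->
<!-- placename may occur anywhere inside body, at any depth.     -->
<!ELEMENT body      - - (p | lg)+  +(persname | placename)>
<!ELEMENT p         - - (#PCDATA)>
<!ELEMENT lg        - - (l+)>
<!ELEMENT l         - - (#PCDATA)>
<!ELEMENT persname  - - (#PCDATA)>
<!ELEMENT placename - - (#PCDATA)>
```

With these declarations a <persname> is accepted inside <p> or <l> even
though neither content model mentions it. The cost is that inclusions
apply at every depth, so they also admit the element in places where it
makes little sense (between two <l> elements, or inside another
<persname>), and validation becomes correspondingly looser.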

   Perhaps we will need to rethink a good part of the structure of SGML
   documents, e.g., to use broad hierarchies reflecting significant
   structural components of the text, and then simply defining an extended
   data dictionary that can be applied wherever needed under any of the
   hierarchical levels.  (Frankly, given that there are no "rules" for the
   content of a finding aid, how can all elements NOT be valid under all
   other elements, in any order desired??)

I'm not clear about this. ISO 8879 gives very explicit rules for
parsing, which is presumably the principal component of a finding aid.
It _is_ a problem, though, when using SGML descriptively rather than

   This kind of generalized strategy might have the added benefit of
   allowing the creation of a few, more generic blanket DTDs, reducing the
   DTD-proliferation we're starting to see.

I haven't actually seen this proliferation, and I'd be interested to
look at some of these DTDs. Do you have any pointers?

[list omitted]
                       <corpname> <famname> <genreform> <geogname>
                       <name> <occupation> <persname>

These are probably fine for some purposes, but would need much
greater refinement and subtagging for any kind of analytical work.
Despite a few drawbacks and unevennesses, this is where I think the
TEI scores.