
Until now, the EAD instances of the finding aids of the Dutch Nationaal
Archief have been validated against the general DTD for EAD 2002,
ead.dtd. We have tried to make clear what migration to Schema would
involve in our particular situation. As every institution has to decide
when to migrate to the EAD Schema, our considerations might be of
interest to others.

ARGUMENTS FOR MIGRATION (NOW):
-------------------------------------------------------
- 1 we're ready for it, we can do it and we can afford it;
- 2 rumor has it that the next version of EAD will have Schema
validation only, so we might as well migrate now and be done with it;
- 3 Schema checks not only the existence of elements and attributes but
also (some of) their values;
- 4 Schema verifies values in real time.

ARGUMENTS AGAINST MIGRATION:
----------------------------------------------------
- 5 let others discover the pitfalls and duly report. Like Linus Van
Pelt, we'll settle for being the 47th man on the moon;
- 6 we're not sure which direction the new version of EAD is taking and
whether it'll accommodate our future needs;
- 7 we have yet to find an appropriate way to handle our graphical
appendices (charts, diagrams, pictures, schemas);
- 8 namespace issues;
- 9 costs of updating:
--- A all EAD files;
--- B pdf and html stylesheets;
--- C other maintenance/conversion stylesheets and scripts;
--- D editors and other tools (plug-ins may have to be written);
- 10 costs of keeping our staff happy;
- 11 our own homemade checking tool already does all the checks that
Schema does, and many more;
- 12 Schema verifies values in real time;
- 13 paying our X-Hive administrator (X-Hive is our XML database) to
update his setup.

CONSIDERING THE PROS AND CONS
------------------------------------------------------
ad 3, 11: Schema can validate not only the existence of elements and
attributes but also their values. This is most evident for the normal
attribute of <date> and <unitdate>, where experience shows that many
errors are made. This added accuracy is fundamental for the use of
dates in search engines and data exchange. The Nationaal Archief,
however, already uses a stylesheet, built to enforce its traditional
finding aid design, that checks elements, attributes and values that
the DTD does not. This stylesheet also lets us check our EAD for
homegrown extensions, like level="otherlevel" otherlevel="filegrp", and
for restrictions, such as: always an <abstract> in the top-level did
(/ead/archdesc/did); always a <unittitle> in components (<c01> to
<c12>, and NO unnumbered <c> components); and so on. This way, our
version of EAD still conforms to the general DTD (ead.dtd).
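
Our checking stylesheet is XSLT and is not reproduced here. Purely as an
illustration of the kind of checks meant above, here is a minimal Python
sketch (standard library only) that applies the date value check and two
of the restrictions to a namespace-free, DTD-based instance. The date
pattern is a simplified assumption of ours (YYYY, YYYY-MM or YYYY-MM-DD,
optionally a range joined by "/"), not the pattern in the EAD Schema,
and the function name check_ead is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified assumption: a date is YYYY, YYYY-MM or YYYY-MM-DD, and a range
# is two such dates separated by "/". This is NOT the official EAD pattern.
DATE = r"\d{4}(?:-(?:0[1-9]|1[0-2])(?:-(?:0[1-9]|[12]\d|3[01]))?)?"
NORMAL_RE = re.compile(rf"{DATE}(?:/{DATE})?")

# Numbered components c01..c12; an unnumbered <c> is forbidden by house rule.
NUMBERED_C = {f"c{n:02d}" for n in range(1, 13)}


def check_ead(root: ET.Element) -> list[str]:
    """Return error messages for a namespace-free (DTD-based) EAD tree."""
    errors = []

    # Value check: normal on <date>/<unitdate> must match NORMAL_RE in full.
    for tag in ("date", "unitdate"):
        for el in root.iter(tag):
            normal = el.get("normal")
            if normal is not None and not NORMAL_RE.fullmatch(normal):
                errors.append(f"<{tag}> has malformed normal={normal!r}")

    # Restriction: the top-level did must contain an abstract.
    if root.find("archdesc/did/abstract") is None:
        errors.append("missing <abstract> in /ead/archdesc/did")

    # Restrictions: no unnumbered <c>; every numbered component needs a title.
    for el in root.iter():
        if el.tag == "c":
            errors.append("unnumbered <c> component is not allowed")
        elif el.tag in NUMBERED_C and el.find("did/unittitle") is None:
            errors.append(f"<{el.tag}> lacks <did>/<unittitle>")

    return errors


if __name__ == "__main__":
    sample = ET.fromstring(
        '<ead><archdesc level="fonds"><did><unittitle>Test</unittitle></did>'
        '<dsc><c01><did><unitdate normal="195003">March 1950</unitdate></did>'
        "</c01></dsc></archdesc></ead>"
    )
    for message in check_ead(sample):
        print(message)
```

Keeping rules like these in a separate check, rather than in the schema
itself, is what lets the instances stay valid against the unmodified
ead.dtd.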

ad 4, 10, 12: We are not yet sure whether real-time validation is
beneficial for internal and external staff. It might hamper the flow of
data input.

ad 6, 7, 8: As yet, we use very few of the 17 XLink elements, and those
sparingly. In the near future, we will enrich our EAD instances by
adding 'ref'-like links. More importantly, we still have to devise a way
to insert our graphical appendices, be they JPEG, SVG or other formats.
Recently, remarks have been made on the EAD list about one cost of
Schema: the 'loss of entity declarations for non-parsed external
entities'. Tying EAD to another schema, like METS or MODS, has
consequences we do not understand yet, but which probably involve
namespace issues, as has also been mentioned on the EAD list. Stephen
Yearl's posting "Costs of DTD to Schema migration" (19 September 2007)
outlines some far-reaching consequences for EAD in general.

ad 9 A: dtd2schema.xsl, provided by the EAD Schema Working Group, does
most of the work almost out of the box, but it strips trailing
whitespace from some attribute values where we need that extra space
(yes, we should fix that in our stylesheets). Earlier scripting has
left us with normal attributes whose values are wrong or need adjusting
(YYYYMM instead of YYYY-MM). A small script will have to take care of
that.
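
Such a script could look like the following minimal Python sketch, under
these assumptions: normal values are written with double quotes, no
attribute value contains a ">", and a six-digit value is always YYYYMM.
It edits the serialized text with regular expressions instead of
re-serializing through an XML parser, so everything else, including the
trailing whitespace mentioned above, stays as it was. The function names
are hypothetical.

```python
import re

# Start tags of <date> and <unitdate> only: other EAD elements (e.g. <persname>)
# also carry a normal attribute, but with non-date content.
# Assumes no ">" occurs inside attribute values.
DATE_TAG_RE = re.compile(r"<(?:date|unitdate)\b[^>]*>")
NORMAL_ATTR_RE = re.compile(r'(\bnormal=")([^"]*)(")')
# A six-digit YYYYMM value (month 01-12), as a whole date or one end of a range;
# the lookarounds leave YYYYMMDD and already-hyphenated values alone.
YYYYMM_RE = re.compile(r"(?<![\d-])(\d{4})(0[1-9]|1[0-2])(?![\d-])")


def fix_normal_value(value: str) -> str:
    """Rewrite each YYYYMM part of a normal value as YYYY-MM."""
    return YYYYMM_RE.sub(r"\1-\2", value)


def fix_date_tags(xml_text: str) -> str:
    """Apply fix_normal_value to normal attributes of <date>/<unitdate> tags."""

    def fix_tag(tag_match: re.Match) -> str:
        return NORMAL_ATTR_RE.sub(
            lambda m: m.group(1) + fix_normal_value(m.group(2)) + m.group(3),
            tag_match.group(0),
        )

    return DATE_TAG_RE.sub(fix_tag, xml_text)


if __name__ == "__main__":
    print(fix_date_tags('<unitdate normal="195003/195112">1950-1951</unitdate>'))
```

Restricting the rewrite to <date> and <unitdate> start tags matters,
because a name like normal="Laanen, Dirk van" on <persname> must never
be touched.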

ad 9 B: Setting the right attributes (such as namespace declarations)
on the root element of the XSL stylesheets and adjusting XPath
expressions for XLink elements and attributes are minor tasks. Because
of namespace issues, migration to XSLT 2.0 seems almost forced, in
order to be able to use xpath-default-namespace="urn:isbn:1-931666-22-9".
This migration, in turn, will entail some other changes and bug fixes
to prevent run-time errors. Extensive testing will have to be done.
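
The underlying problem is that, once instances declare the default
namespace urn:isbn:1-931666-22-9, every element name is
namespace-qualified and unprefixed paths silently stop matching.
xpath-default-namespace handles this inside XSLT 2.0 stylesheets; our
other scripts need the equivalent fix. A minimal Python illustration
with the standard library's ElementTree (the prefix "ead" and the toy
documents are our own choices):

```python
import xml.etree.ElementTree as ET

EAD_NS = "urn:isbn:1-931666-22-9"  # namespace of the EAD 2002 Schema

dtd_doc = ET.fromstring("<ead><eadheader/><archdesc level='fonds'/></ead>")
schema_doc = ET.fromstring(
    f"<ead xmlns='{EAD_NS}'><eadheader/><archdesc level='fonds'/></ead>"
)

# An unprefixed path works on the DTD-based instance ...
print(dtd_doc.find("archdesc") is not None)     # True
# ... but finds nothing once the default namespace is declared,
print(schema_doc.find("archdesc") is not None)  # False
# because element names are now qualified:
print(schema_doc.tag)                           # {urn:isbn:1-931666-22-9}ead

# Every path must be made namespace-aware, e.g. via a prefix mapping.
ns = {"ead": EAD_NS}
print(schema_doc.find("ead:archdesc", ns) is not None)  # True
```

The unprefixed lookup fails silently rather than raising an error, so a
stylesheet or script may keep running while quietly producing less
output; this is one reason the testing has to be thorough.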

ad 9 C: We found that we have more little scripts for occasional use
lying around than we had thought. These have to be kept available and
will have to be adjusted (probably at an inconvenient time).

ad 9 D: Not all editors support W3C XML Schema (or RELAX NG).

ad 10: Staff may have to change their favorite tools (e.g. for
interpreting validation messages).

CONCLUSION
--------------------
In general: it all seems very doable, but there's more to it than you
might expect after running dtd2schema.xsl on an EAD file for the first
time and finding that the result validates. Redevelopment and
(re)testing of stylesheets and scripts are added costs. 

If the intention to migrate is there, the question arises when to do so.
We are inclined to migrate, but not at any cost. All the extra checks on
values that the current Schema adds to the DTD are already incorporated
in our checking stylesheet. Crucial for us at the moment is the matter
of graphical appendices, which may well depend on the way EAD develops.
So the Dutch Nationaal Archief has decided to postpone the migration
from DTD to Schema and to wait for the implementation of the next
version of EAD.


2007-12-17

Dirk van Laanen
Henny van Schie

Nationaal Archief -- www.nationaalarchief.nl
P.O.Box 90520, 2509 LM Den Haag, NL
phone: ++31-70-3315548 - fax: ++31-70-3315499