
Both Word 2003 and OpenOffice <http://www.openoffice.org> will let you save your file as XML. So, if your Word document is well structured, you might not be too far from your goal. Getting to XML is the easy part; getting to your particular flavour of XML, i.e. EAD, may be a little harder, again depending on how the *.doc file is structured. The good news, though, is that you're 'just' an XSLT transform away.
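The post recommends an XSLT transform. As a swapped-in alternative that shows the same mapping, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It assumes the guide was saved from Word 2003 as an "XML Document" (WordprocessingML) and was typed with hypothetical conventions: series headings like "Series 1: Correspondence" and box entries like "Box 12: Letters, 1950-1955". Those conventions and the function names are illustrative, not from the original; a real guide will need its own patterns, and the sketch produces only the `<dsc>` part of an EAD file.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 "XML Document" (WordprocessingML) files.
W_NS = "{http://schemas.microsoft.com/office/word/2003/wordml}"

# Hypothetical typing conventions assumed for this sketch:
#   "Series 1: Correspondence"           -> a series heading
#   "Box 12: Letters, 1950-1955"         -> a box-level entry
SERIES_RE = re.compile(r"^Series\s+(\S+):\s*(.+)$")
BOX_RE = re.compile(r"^Box\s+(\S+):\s*(.+)$")


def paragraph_texts(wordml: str):
    """Yield the plain text of each w:p paragraph, in document order."""
    root = ET.fromstring(wordml)
    for p in root.iter(W_NS + "p"):
        # A paragraph's text may be split across several w:r/w:t runs.
        text = "".join(t.text or "" for t in p.iter(W_NS + "t")).strip()
        if text:
            yield text


def wordml_to_dsc(wordml: str) -> ET.Element:
    """Build an EAD <dsc> with <c01> series and <c02> box entries.

    Paragraphs matching neither pattern are skipped, as are box
    entries that appear before any series heading.
    """
    dsc = ET.Element("dsc")
    series = None
    for text in paragraph_texts(wordml):
        m = SERIES_RE.match(text)
        if m:
            series = ET.SubElement(dsc, "c01", level="series")
            did = ET.SubElement(series, "did")
            ET.SubElement(did, "unitid").text = m.group(1)
            ET.SubElement(did, "unittitle").text = m.group(2)
            continue
        m = BOX_RE.match(text)
        if m and series is not None:
            c02 = ET.SubElement(series, "c02", level="file")
            did = ET.SubElement(c02, "did")
            ET.SubElement(did, "container", type="box").text = m.group(1)
            ET.SubElement(did, "unittitle").text = m.group(2)
    return dsc


if __name__ == "__main__":
    sample = (
        '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
        "<w:body>"
        "<w:p><w:r><w:t>Series 1: Correspondence</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Box 1: Letters, </w:t></w:r><w:r><w:t>1950-1955</w:t></w:r></w:p>"
        "</w:body></w:wordDocument>"
    )
    print(ET.tostring(wordml_to_dsc(sample), encoding="unicode"))
```

With 800+ boxes, the value of this approach (or of an XSLT equivalent) depends almost entirely on how consistently the Word file was typed; entries that deviate from the patterns are silently dropped here, so counting `c02` elements against the expected box total is a cheap sanity check.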

Regarding file size, we have EAD instance documents as large as 4MB, and dynamic application of a stylesheet to those absolutely breaks our application (xsltproc is the XSLT engine); for any instance over 2MB, therefore, we redirect to static HTML. A 4MB instance results in ca. 2.5MB of HTML*, but with non-significant whitespace removed this can come in at under 2MB (still really too large for those with dial-up connexions).

* the HTML is rather verbose, however, peppered as it is with <div class="c01"> &c.
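The size policy described above can be sketched in Python as follows. The 2MB cutoff comes from the post; reading it as 2 * 1024 * 1024 bytes is an assumption, and the helper names are hypothetical. The whitespace step is a conservative stand-in for "removing non-significant whitespace": it only drops whitespace runs that contain a line break and sit directly between two tags, so a space separating inline elements on one line survives.

```python
import re

# Assumption: "2MB" read as 2 * 1024 * 1024 bytes; the post does not say which.
STATIC_THRESHOLD_BYTES = 2 * 1024 * 1024

# Whitespace between two tags that includes a line break is treated as
# pretty-printing indentation and removed; single-line spaces are kept.
INDENT_BETWEEN_TAGS = re.compile(r">\s*\n\s*<")


def choose_rendering(size_bytes: int) -> str:
    """Return 'static' for EAD files over the threshold, else 'dynamic'."""
    return "static" if size_bytes > STATIC_THRESHOLD_BYTES else "dynamic"


def strip_indentation(html: str) -> str:
    """Remove line-break whitespace that sits only between two tags."""
    return INDENT_BETWEEN_TAGS.sub("><", html)
```

In practice the size would come from something like `os.path.getsize(path)` on the EAD file. The regex is not HTML-aware: it would also alter indentation inside `<pre>` blocks, and a line break between two inline elements (for example `</b>` followed by `<i>` on the next line) would be removed, joining their text.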

St.

Stephen Yearl
Systems Archivist
Yale University Library::Manuscripts and Archives


 

>>> [log in to unmask] 03/13/06 2:59 PM >>>
I have a very large guide in Word format which I need to convert to EAD XML.
I am especially concerned about the Series section as we have over 800 boxes
to tag.



I am unable to find any way to do this (other than cutting and pasting from
Word into an XML editor) except by using the text conversion software listed
on the EAD site: http://www.loc.gov/ead/ag/agauthor.html#sec2c

Is this kind of software the only answer or is there another clever way?  If
commercial software is the solution, what products have been used and
recommended?



Also, is there a recommended maximum length/size for an EAD guide? I have
some concern about download time for users.



Thanks in advance for help.



M.J. Figard

Digital Initiatives Librarian

McGovern Historical Collections and Research Center

Houston Academy of Medicine - Texas Medical Center Library

1133 John Freeman Blvd.

Houston, Texas  77030

713.799.7141 fax 713.790.7052

NOTE new email address: [log in to unmask]