David,

While I understand your motivation (to reduce something you have not
mastered to something you have), in the long run you will be better off
mastering EAD and XML/SGML. XML/SGML documents are particularly useful as
the authoritative *source* for producing the variety of products you want.
It is one of the reasons it was chosen for EAD.

There is a variety of free software available for producing RTF, HTML, and
other products from XML/SGML. Thus your investment would be in developing
the expertise to use the software and master the standards, and not in
software. At least initially. Of course mastering the standards is not a
trivial undertaking, but doing so is a good long term investment.

The scenario you propose is not economical. It is not clear what your
"authoritative source" would be. If you strip the tags from your finding
aids, spend time editing the content in Word to get the formatting you
want, and finally create HTML and PDF from the Word file, what happens if
you need to fix a typo? Or worse, add a biography or scope and content
section to your finding aid? You either have to edit the SGML/XML file and
start over with Word, or fix the error in both Word and the XML/SGML file.
Neither is a good approach, both because of the time involved and
because fixing the same error twice will lead to introducing new errors and
inconsistencies (the phone rings after you have finished editing your Word
file, and you forget to edit the SGML/XML file; or you do not fix the error
in exactly the same way in both files). At any rate, you should not edit
the SGML/XML file without at least learning how to parse it.
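
As a rough illustration of what "parsing after every edit" buys you, here
is a minimal sketch in Python. It is a stand-in, not the NSGMLS workflow
itself: it checks only that an XML file is well-formed, whereas a
validating parser such as NSGMLS also checks the document against the EAD
DTD. The function name and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message).

    Well-formedness only: this does NOT validate against the EAD DTD.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

# Hypothetical fragments: the second has a missing </eadid> end tag,
# the kind of slip a hand edit introduces.
good = "<ead><eadheader><eadid>hypothetical-001</eadid></eadheader></ead>"
bad = "<ead><eadheader><eadid>hypothetical-001</eadheader></ead>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

Running a check like this after each edit catches a broken tag before it
reaches any of the derived products.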

If you learn to work with the XML/SGML as your maintenance file, then you
can maintain as few as two stylesheets to produce your outputs
automatically: one for RTF, one for HTML.
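
The single-source idea can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: Python's standard library
stands in for the DSSSL and XSL stylesheets, the two functions play the
role of two stylesheets, and the tiny finding aid is a hypothetical,
heavily simplified EAD fragment.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified finding aid: the one authoritative source.
SOURCE = """<ead>
  <archdesc>
    <did><unittitle>Hypothetical Family Papers</unittitle></did>
    <bioghist><p>A short biography.</p></bioghist>
    <scopecontent><p>Letters &amp; diaries.</p></scopecontent>
  </archdesc>
</ead>"""

def paragraphs(root):
    """Collect the text of every <p> element in document order."""
    return ["".join(p.itertext()) for p in root.iter("p")]

def to_text(root):
    """'Stylesheet' 1: plain text with the tags stripped."""
    return "\n\n".join([root.findtext(".//unittitle")] + paragraphs(root))

def to_html(root):
    """'Stylesheet' 2: a minimal HTML page."""
    title = escape(root.findtext(".//unittitle"))
    body = "".join("<p>%s</p>" % escape(text) for text in paragraphs(root))
    return "<html><body><h1>%s</h1>%s</body></html>" % (title, body)

root = ET.fromstring(SOURCE)
print(to_text(root))
print(to_html(root))
```

A typo is fixed once, in SOURCE, and both outputs are regenerated; neither
derived file is ever edited by hand.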

For example, using free software, I can convert one file easily into RTF
and HTML, and using the same stylesheet used for converting to HTML, I can
publish the XML file directly to Internet Explorer 5.0.

Here's a list of the software needed to do all of this:

James Clark's SP suite, especially NSGMLS (for parsing) and SX (for
converting SGML to XML relatively painlessly).

xml4j from IBM for parsing XML (though the output of SX can be used without
reparsing).

NoteTab as a good ASCII editor that can be configured to launch NSGMLS to
parse a document you are working on and to create "macros" for performing
repetitive tasks. (Thank you, Stephen Yearl, for this tip, the best tip of
the year!) You would use this for editing your SGML file and for authoring
DSSSL and XSL files (see below).

James Clark's Jade, for validating and using a DSSSL stylesheet to convert
EAD/SGML to RTF.

Either James Clark's XT or LotusXSL from IBM for validating XSL files and
using them to convert EAD/XML to HTML. The same stylesheet can also be
used for publishing directly to MS IE5, if one pays attention to the
limited (though not severely limited, at least at this early stage) nature
of the IE5 implementation of XSL.

There is also a wide variety of other free software. I am just mentioning
ones I have tried and found to be good to excellent.

One has to be careful with the XSL, as the specification is still being
revised, but I have found that with each new draft of the specification and
new versions of software conforming (or trying to conform) to it, I can
make the necessary changes in a few minutes. (Typically the changes are
simply "search and replace.")

For a few dollars, one can add an SGML editor to the suite: WordPerfect
with SGML, or Interleaf's Author/Editor. Mostly good for learning a DTD,
and for original editing and maintenance. Not good for conversion.

In this scenario, you maintain your finding aids in SGML, transform the
SGML into RTF and XML, and use the XML both to produce HTML and to publish
directly to IE5.
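
The scenario can be written down as a short command plan. The Python
sketch below only builds and prints the commands; it does not run them.
It rests on several assumptions: the tools are installed under the names
nsgmls, sx, jade, and xt; the file and stylesheet names are hypothetical;
and any SGML declaration or catalog settings your installation needs are
already in place. The options shown (nsgmls -s to suppress output, jade
-t, -d, and -o) should be checked against the documentation for your
versions.

```python
import os
import shlex

def build_commands(sgml_file, dsssl_sheet, xsl_sheet):
    """Return (argv, stdout_redirect) pairs for one finding aid.

    Nothing is executed here; the list is a plan to inspect or to hand
    to a shell. A redirect of None means stdout is not captured.
    """
    stem = os.path.splitext(sgml_file)[0]
    xml_file = stem + ".xml"
    return [
        # 1. Parse (validate) the maintenance file; -s suppresses output.
        (["nsgmls", "-s", sgml_file], None),
        # 2. SGML to RTF with a DSSSL stylesheet.
        (["jade", "-t", "rtf", "-d", dsssl_sheet,
          "-o", stem + ".rtf", sgml_file], None),
        # 3. SGML to XML; sx writes to standard output.
        (["sx", sgml_file], xml_file),
        # 4. XML to HTML with an XSL stylesheet.
        (["xt", xml_file, xsl_sheet, stem + ".html"], None),
    ]

def render(commands):
    """Format the plan as shell command lines."""
    lines = []
    for argv, redirect in commands:
        line = " ".join(shlex.quote(arg) for arg in argv)
        if redirect:
            line += " > " + shlex.quote(redirect)
        lines.append(line)
    return lines

# Hypothetical file names.
for line in render(build_commands("findaid.sgm", "ead.dsl", "ead.xsl")):
    print(line)
```

Because every output is derived from the one SGML file, correcting a
finding aid means editing that file and rerunning the same four steps.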

Please do give some consideration to this strategy.

While I am on this topic, would there be interest in a 5-day course for EAD
(and TEI) technical support staff that introduced students to the various
pieces of software listed above (and perhaps others)? Please answer me
directly at [log in to unmask]. If there is enough interest, I'll try to
see if it could be offered as part of the Rare Book School curriculum at
Virginia, or perhaps as a workshop offered by the new Text Encoding
Initiative Consortium (www.tei-c.org).

Daniel



 At 10:38 AM 4/21/99 -0400, you wrote:
>Colleagues--
>
>I need your advice on a problem I have encountered with our project to
>convert and encode our finding aids.
>
>None of our 3,000+ finding aids are available in electronic form. I have
>received a grant which I hope can kill several birds with one stone. My
>goals for the project are: 1) convert the finding aids into electronic
>form, 2) acquire an electronic text version (ASCII or something else
>that I can manipulate in a word processing software (MS WORD) and an
>HTML writer/editor (Netscape Composer)), and 3) acquire an EAD-encoded
>version.
>
>We have hired Apex (as we are an RLG member) to convert and encode the
>finding aids. We plan to send the EAD versions to Archival Resources.
>Because I want more flexibility for future uses of the finding aids
>(whatever they later may be given advances in technology), I would like
>to maintain locally a text version (which for now could be manipulated
>using MS WORD). Because Archival Resources is available only
>fee-for-service, and I don't have the technical support necessary to
>maintain SGML documents, I am also planning on maintaining at our WEB
>site an HTML encoded version meeting our specifications for structure,
>etc.
>
>Here is the problem. Apex has provided me with a first batch of one
>hundred EAD encoded finding aids. I had hoped to be able to use the
>encoded versions in other ways by stripping them of the coding BUT alas,
>as with most grand ideas, I have been unsuccessful! Of course, for more
>money (which I'd like to spend on other issues), I am sure Apex would be
>happy to resolve this matter for me. Before pursuing this option with
>the remaining 2,900 finding aids, however, I wanted to know if there
>was a "de-babble-izer" that I could purchase to magically remove the
>encoding.
>
>I am happy to pay the vendor for the deliverables I need but I wanted to
>check with you all first! I look forward to hearing from you.
>
>David
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>David de Lorenzo                       201 West Monument Street
>Library Director                       Baltimore, MD 21201-4674
>Maryland Historical Society            (410) 685-3750 Ext. 309
>Library of Maryland History            FAX: (410) 385-2105
>                        http://www.mdhs.org
>
>
Daniel V. Pitti         Project Director
Institute for Advanced Technology in the Humanities
Alderman Library        University of Virginia  Charlottesville, Virginia 22903
Phone: 804 924-6594     Fax: 804 982-2363       Email: [log in to unmask]