David:
David Delorenzo wrote:
> if there
> was a "de-babble-izer" that I could purchase to magically remove the
> encoding.
You're on Windows, right? If so
1. Download NoteTab Light from http://www.notetab.com (it's free!)
2. Add the following clip to any of the clipbooks you find once the
program is unzipped and running on your system:
H="strip all tags"
:Start
^!Replace "<.*>" >> " " RASW
^!IfError End
^!GoTo Start
This may look cryptic now, but after you have NoteTab installed and you
have spent a few minutes with it, it will make sense. If you have
problems, reply off list.
This is the fastest way I know to strip tags on Windows. The <.*> thing
is known as a regular expression and tells NoteTab to match any
character (.) any number of times (*) between, and including, an
opening angle bracket (<) and a closing angle bracket (>).
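For anyone who wants to try the same idea outside NoteTab, here is a minimal sketch in Python. It is not part of the clip above; the function name strip_tags and the sample string are made up for illustration. One caveat: in Python's re module, .* is greedy, so on a line holding several tags <.*> runs from the first < to the last >. The sketch therefore also shows the <[^>]*> variant, which stops at the first >. Whether NoteTab's engine is greedy in the same way is not something this sketch establishes.

```python
import re

# Same pattern as in the clip: "<", any characters, ">".
# In Python this is greedy and spans from the first "<" to the last ">" on a line.
GREEDY = re.compile(r"<.*>")

# Variant: "<", any characters except ">", then ">". Matches one tag at a time.
PER_TAG = re.compile(r"<[^>]*>")


def strip_tags(text, pattern=PER_TAG):
    """Replace every match of `pattern` with a single space."""
    return pattern.sub(" ", text)


# Hypothetical sample line with two EAD-style elements.
sample = "<unittitle>Letters</unittitle> <unitdate>1850</unitdate>"

print(repr(strip_tags(sample)))          # text survives, tags become spaces
print(repr(strip_tags(sample, GREEDY)))  # greedy pattern swallows the text too
```

Unlike the clip, no loop is needed here, because sub() already replaces every match in one pass.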
This regular expression will not work if the tag is split by a
newline (e.g. <!DOCTYPE ead PUBLIC "-//Society of American
Archivists//DTD ead.dtd
(Encoded Archival Description (EAD) Version 1.0)//EN" "ead.dtd"[
]> will not be matched), but it will work on the *vast* majority of
your tags (on everything but !DOCTYPE, probably).
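The newline limitation can be reproduced, and worked around, in a short Python sketch. This is an illustration in a different tool, not a NoteTab clip, and the names single_line, multi_line and strip_tags are made up. It assumes the DOCTYPE has an empty internal subset, as in the example above. A subset that contains declarations with their own > characters would need a real parser.

```python
import re

# The DOCTYPE declaration from the example, split over four lines.
doctype = (
    '<!DOCTYPE ead PUBLIC "-//Society of American\n'
    'Archivists//DTD ead.dtd\n'
    '(Encoded Archival Description (EAD) Version 1.0)//EN" "ead.dtd"[\n'
    ']>'
)

# "." does not match a newline by default, so this pattern cannot
# reach the closing ">" three lines further down.
single_line = re.compile(r"<.*>")

# A negated character class matches newlines as well, so the whole
# declaration is treated as one tag.
multi_line = re.compile(r"<[^>]*>")


def strip_tags(text):
    """Replace each tag, including multi-line ones, with a single space."""
    return multi_line.sub(" ", text)


print(single_line.search(doctype))                  # no match
print(repr(strip_tags(doctype + "<ead>text</ead>")))
```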
If it helps, I am working (slowly) on NoteTab extensions to map EAD
tags to RTF encoding for easy entry into a format that can be later
manipulated in M$ Word (the idea was to use RTF as an intermediary to
PDF). Equally, EAD tags can be mapped to HTML tags in NoteTab using the
same ^!Replace... syntax. But to convert EAD to HTML I would recommend
you consider using XSL.
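To give a feel for what such a tag-to-tag mapping involves, here is a rough Python sketch of the search-and-replace approach. The mapping table is purely hypothetical (it is not an agreed EAD-to-HTML crosswalk), the names TAG_MAP and map_tags are made up, and attributes on mapped tags are simply dropped. For anything beyond a quick experiment, XSL remains the better route.

```python
import re

# Hypothetical EAD-to-HTML mapping, for illustration only.
TAG_MAP = {"head": "h2", "emph": "em", "lb": "br"}

# Groups: optional "/" of an end tag, the element name, everything up to ">".
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^>]*)>")


def map_tags(text, tag_map=TAG_MAP):
    """Rename mapped tags and leave all other tags untouched."""

    def swap(match):
        slash, name, rest = match.groups()
        new_name = tag_map.get(name)
        if new_name is None:
            return match.group(0)  # unmapped tag: keep as it is
        # Attributes are discarded; an empty-element "/" is kept.
        closing = "/" if rest.rstrip().endswith("/") else ""
        return f"<{slash}{new_name}{closing}>"

    return TAG.sub(swap, text)


sample = '<head>Scope</head><p>A <emph render="italic">note</emph><lb/>next</p>'
print(map_tags(sample))
```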
For translations from HTML to RTF, consider Ishtar:
http://www.cena.dgac.fr/~sagnier/info/formats/conversions/htm2rtf.htm
It does not handle tables too well, but it is a good first stop.
From RTF to HTML you are all set using "save-as-html" in Word.
Regards,
Stephen
--
Stephen Yearl, Project Archivist
*************************
Connecticut Historical Society
1 Elizabeth Street
Hartford, CT, 06105
*************************
http://www.chs.org
*************************