If you have access to some good expertise in the Perl programming language,
you might want to try looking at a couple of packages that I have found
useful in this type of work (legacy finding aids / catalogue cards):
Parse::RecDescent - Generate Recursive-Descent Parsers
It allows you to specify a grammar, using regular expressions, to parse
text. How well this works depends on the extent to which there is an implied
structure in the lists, and on how complex you want to make the regular
expressions that attempt to match and isolate elements. It is fairly robust
and full-featured, but you have to build your application yourself.
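As a rough illustration of the regular-expression idea only (written in Python rather than Perl, and using plain regular expressions, not Parse::RecDescent or its grammar syntax), here is a minimal sketch that isolates the four typical "fields" of a file list entry. The column layout (fields separated by two or more spaces), the field names, and the function name parse_entry are all assumptions made up for the example; real lists will need their own patterns.

```python
import re

# Assumed layout: fields separated by runs of two or more spaces.
FIELD_SPLIT = re.compile(r"\s{2,}")
# Hypothetical names for the four typical fields.
FIELD_NAMES = ["reference_code", "title", "dates", "container"]


def parse_entry(line):
    """Return a dict of fields, or None if the line does not fit the assumed layout."""
    parts = FIELD_SPLIT.split(line.strip())
    if len(parts) < len(FIELD_NAMES):
        # Headings, blank lines, and wrapped titles end up here for manual review.
        return None
    entry = dict(zip(FIELD_NAMES, parts))
    # Extra columns (e.g. a creator code) are kept rather than silently dropped.
    entry["extra"] = parts[len(FIELD_NAMES):]
    return entry
```

Lines that return None are the ones that need a more elaborate pattern or a human eye, which is where most of the conversion effort tends to go.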
You can also couple that with some of the other Perl modules available, such
as SGML::Grove, to take the extracted data and either create an SGML
document instance or prefill an SGML document instance that is used as a
template.
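To show what prefilling a template with extracted data might look like, here is a small sketch, again in Python and not using SGML::Grove. The element names are borrowed from EAD as an assumption and should be checked against the DTD you actually use; the template, the field names, and the function name fill_component are hypothetical.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical component template; element names assumed from EAD.
COMPONENT_TEMPLATE = Template(
    '<c01 level="file"><did>'
    "<unitid>$reference_code</unitid>"
    "<unittitle>$title</unittitle>"
    "<unitdate>$dates</unitdate>"
    "<container>$container</container>"
    "</did></c01>"
)


def fill_component(entry):
    """Fill the template from a dict of extracted string fields."""
    # Escape &, < and > so titles with such characters do not break the markup.
    safe = {name: escape(value) for name, value in entry.items()}
    return COMPONENT_TEMPLATE.substitute(safe)
```

A missing field raises a KeyError rather than producing an empty element, which makes incomplete entries easy to spot during conversion.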
The Perl modules are available from CPAN, which is linked from www.perl.org.
Another potential option is OmniMark (a commercial product), with a limited
version available. http://www.omnimark.com/
You might also want to take a look at a book called Practical Guide to
SGML Filters by Norman E. Smith (ISBN 1556225113). It's a bit dated, in that
the Perl section does not cover some of the more current and useful modules,
but it does have examples using a number of different software packages,
such as OmniMark.
> From: Hughey, Bill (CZR)
> Reply To: Encoded Archival Description List
> Sent: Tuesday, February 16, 1999 7:58 AM
> To: Multiple recipients of list EAD
> Subject: File lists
> Hello all,
> Here at the Archives of Ontario, we are currently investigating options
> for automating our file lists. EAD is an option that we are considering
> but we have several concerns. We have approximately 3,000 lists in a
> variety of sizes and formats (e.g. tables, spaced entries, paper
> copies). Moreover, there is no defined structure for these tables.
> Although typical file lists contain four "fields" (reference code,
> title, dates, and container), many have extra "fields" (such as a
> creator code or a subject field). Does anybody have experience with
> retroactive conversion of such diverse file/container lists? We are
> trying to determine the resource issues involved in such retrofitting.
> Also, is it possible to include header/footer information with each
> displayed page in EAD? Given that our file lists are often quite large,
> we feel that this is an important element to include. Any comments,
> anecdotes, or advice would be appreciated.
> Bill Hughey
> Archivist, Health/Social Portfolio
> Archives of Ontario
> (416) 327-1543