EAD Archives

EAD August 2001

Subject: Re: html to ead
From: Joo Hang Cha <[log in to unmask]>
Reply-To: Encoded Archival Description List <[log in to unmask]>
Date: Wed, 1 Aug 2001 15:10:09 -0700


Hi Kris,

> I am replying to you off the list because I don't have an answer to
> your question. I would be very interested in an answer, however, as we
> have about 600 finding aids here in HTML that need to be converted to
> EAD at some point. We are far from starting on this project; it will
> be part of my new job here, a job which I won't start until Spring
> 2002. But if you find a solution or some conversion software, I would
> appreciate it if you could let me know!
I don't know of any such software, but I can give you some clues.

Before getting involved with EAD at UBC (which is not my main job function
to begin with), I had done quite a bit of work with XML technologies. One of
the learning projects I built was a COM component that used a set of
regular expressions and the W3C's HTML Tidy software to "clean up" messy
HTML into well-formed XML.
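For a rough idea of what the regular-expression half of such a clean-up pass might look like, here is a minimal Python sketch. It is not the COM component described above and does not call HTML Tidy; the function name `pre_clean` and the particular fixes (lower-casing tag names, escaping stray ampersands, self-closing void elements such as <br>) are illustrative assumptions. Tidy itself does much more, such as closing unclosed <p> and <li> elements.

```python
import re

# Void HTML elements that XML requires to be self-closed (<br> -> <br/>).
VOID_TAGS = ("br", "hr", "img", "meta", "link", "input")


def pre_clean(html):
    """Apply a few regex fixes that often stop HTML from parsing as XML.

    A toy stand-in for HTML Tidy: it does not close unclosed <p>/<li> tags,
    fix mis-nested tags, or translate HTML-only entities such as &nbsp;.
    """
    # Lower-case tag names: XML is case-sensitive, so <P>...</p> would not match.
    html = re.sub(r"<(/?)([A-Za-z][A-Za-z0-9]*)",
                  lambda m: "<" + m.group(1) + m.group(2).lower(), html)
    # Escape ampersands that do not already start an entity or character reference.
    html = re.sub(r"&(?!#?\w+;)", "&amp;", html)
    # Self-close void elements that are not already closed.
    void = "|".join(VOID_TAGS)
    html = re.sub(r"<(%s)(\s[^<>]*?)?\s*(?<!/)>" % void, r"<\1\2/>", html)
    return html
```

For example, `pre_clean('<P>Letters & diaries<BR>1890-1920</P>')` yields `<p>Letters &amp; diaries<br/>1890-1920</p>`, which an XML parser accepts.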

Since XML is very picky about well-formedness, the first challenge in such
a conversion would be the HTML tidying step. Just to give you a figure: in
my tests, the XML that my component produced from HTML documents parsed
successfully about 90% of the time.
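One way to measure a figure like that is to run every converted file through an XML parser and count the failures. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; `is_well_formed` and `parse_rate` are hypothetical names, and the sketch does not reproduce the 90% result, which comes from the author's own tests.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def parse_rate(documents):
    """Fraction of converted documents that parse as XML (0.0 for an empty batch)."""
    docs = list(documents)
    if not docs:
        return 0.0
    return sum(is_well_formed(d) for d in docs) / len(docs)
```

The files that fail are the ones that would need hand-fixing before any EAD mapping could start.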

The second step is even more challenging. The program would have to
somehow "study" the structure of each HTML document (HTML carries almost
no metadata) and map its parts to the appropriate EAD tags.
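Where the finding aids at least label their sections with headings, part of that "study the structure" step can be approximated by simple keyword matching rather than anything like AI. Below is a minimal Python sketch, assuming the files have already been tidied into namespace-free XHTML with headings and paragraphs as direct children of <body>; the keyword table and function names are hypothetical. Headings it cannot classify are returned separately so that a person can decide.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword table: heading text -> EAD element name.
# Checked in order; real finding aids will use many more wordings.
HEADING_MAP = {
    "biographical": "bioghist",
    "historical note": "bioghist",
    "scope and content": "scopecontent",
    "arrangement": "arrangement",
    "custodial history": "custodhist",
}

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def guess_ead_tag(heading):
    """Return an EAD element name for a heading, or None if nothing matches."""
    text = heading.lower()
    for keyword, tag in HEADING_MAP.items():
        if keyword in text:
            return tag
    return None


def map_sections(xhtml):
    """Group <p> text under the preceding heading and guess its EAD tag.

    Returns (mapped, unmapped): mapped is a list of (ead_tag, [paragraph text]);
    unmapped lists headings that need a human decision.
    """
    root = ET.fromstring(xhtml)
    body = root.find("body")
    if body is None:  # fall back to the root if there is no <body>
        body = root
    mapped, unmapped = [], []
    current = None  # paragraph list of the section being filled, if any
    for el in body:
        if el.tag in HEADINGS:
            heading = "".join(el.itertext()).strip()
            tag = guess_ead_tag(heading)
            if tag is None:
                unmapped.append(heading)
                current = None  # skip paragraphs until the next known heading
            else:
                current = []
                mapped.append((tag, current))
        elif el.tag == "p" and current is not None:
            current.append("".join(el.itertext()).strip())
    return mapped, unmapped
```

The unmapped list is where the manual work concentrates, so its length gives a rough sense of how much of a collection a tool like this could actually handle.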

So it looks like this step would require some form of artificial
intelligence. UNLESS, of course, all 600 of your HTML finding aids
follow the SAME structure (e.g. the third <p> always corresponds to
<admininfo>), which I highly doubt.
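If the files really did share one layout, the mapping could be purely positional. Below is a minimal Python sketch, assuming tidied, namespace-free XHTML with <p> elements directly under <body>. The position table is hypothetical (only its third entry echoes the "third <p> is <admininfo>" example above), and the output fragments are not validated against the EAD DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical position map for one fixed house layout:
# zero-based index of the <p> under <body> -> EAD element name.
POSITION_MAP = {0: "bioghist", 1: "scopecontent", 2: "admininfo"}


def convert_fixed_layout(xhtml):
    """Wrap each mapped paragraph in its EAD element, chosen by position alone."""
    paras = ET.fromstring(xhtml).findall("./body/p")
    fragments = []
    for index, p in enumerate(paras):
        tag = POSITION_MAP.get(index)
        if tag is None:
            continue  # paragraphs outside the map are left for manual work
        wrapper = ET.Element(tag)
        ET.SubElement(wrapper, "p").text = "".join(p.itertext()).strip()
        fragments.append(ET.tostring(wrapper, encoding="unicode"))
    return fragments
```

Because the mapping depends only on position, a single extra or missing paragraph in one file shifts every later paragraph into the wrong element, which is why this shortcut only works for a strictly uniform collection.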

I can think of ways to build an application that would help with the
conversion, but it looks to me as though you will have to convert most
of them manually.

Good luck.

LISTSERV.LOC.GOV