I would be curious to hear about the experiences of others on the list with
scanning existing finding aids.  I have read several reports of very good
results like the ones Dan describes, but we have had quite poor results
ourselves (using various OCR software, including Omnipage).

Perhaps it's because so many of our inventories are older and were produced
on manual and electric typewriters.

A 95% accuracy rate sounds good until you realize that, if it is measured
per character, roughly every fourth or fifth word has a typo.  And a spell
checker doesn't seem to help much with inventories that contain a lot of
names.
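The arithmetic behind that figure, assuming the 95% is per-character
accuracy and that misreads fall independently: a word of n characters comes
through clean with probability 0.95^n, so it contains at least one error
with probability 1 - 0.95^n.  A minimal sketch (the function name
word_error_rate is just illustrative):

```python
# Sketch: turning a per-character OCR accuracy into a per-word error rate.
# Assumes (hypothetically) that character errors are independent of one
# another, which real OCR errors on old typescript may not be.

def word_error_rate(char_accuracy: float, chars_per_word: int) -> float:
    """Probability that a word contains at least one misread character."""
    return 1 - char_accuracy ** chars_per_word

# Try a few typical word lengths at 95% character accuracy.
for length in (4, 5, 6):
    rate = word_error_rate(0.95, length)
    print(f"{length}-letter words: {rate:.1%} contain an error, "
          f"about 1 word in {1 / rate:.1f}")
```

For 5-character words this gives about 22.6%, or roughly one word in 4.4,
which is where "every fourth or fifth word" comes from.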


Michael Fox
Head of Processing
Minnesota Historical Society
345 Kellogg Blvd West
St. Paul MN 55102-1906
phone: 651-296-1014
fax:  651-296-9961

> ----------
> From:         Dan Linke
> Sent:         Wednesday, April 21, 1999 4:27 PM
> To:   Multiple recipients of list EAD
> Subject:      Re: HTML or ASCII?
> David,
> You did not mention in what form you gave the finding aids to Apex.  I am
> assuming that you gave them paper copies, which they scanned, OCRed, and
> converted.  If they are going to all that effort, they could surely give
> you an electronic version without the EAD encoding, and the cost cannot be
> much greater, since that unencoded text is just one of the steps along the
> way toward encoding.  I would pursue that route before attempting to strip
> the coding out manually at your end.
> On a related note, and for future planning: we are scanning paper-based
> finding aids with an HP scanner (cost $700) and Caere's Omnipage 8.0 (an
> OCR program; cost about $400, less as an upgrade from the software
> included with the scanner; we paid only $100), with very good results.
> It saves the file as a Word file and can also convert it to HTML.  (Once
> it is in Word, you could also save it as an ASCII file.)
> Last summer we employed a student to do this nearly full time, and she was
> able to scan, OCR, and convert nearly 100 pages per day.  The rate varied
> with each finding aid's "page density," but the figure is based on 20
> finding aids totalling over 3,500 pages, completed in 37 working days.  I
> mention this because student labor probably costs significantly less than
> Apex's hourly rate, and if you provide them with an electronic version
> rather than paper, your grant dollars may stretch that much further.  Of
> course, you need to find a good student who is detail-oriented.
> Hope this helps.
> David Delorenzo wrote:
> > Colleagues--
> >
> > I need your advice on a problem I have encountered with our project to
> > convert and encode our finding aids.
> >
> > None of our 3,000+ finding aids are available in electronic form. I have
> > received a grant which I hope can kill several birds with one stone. My
> > goals for the project are: 1) convert the finding aids into electronic
> > form, 2) acquire an electronic text version (ASCII or another format that
> > I can manipulate in word-processing software such as MS Word and in an
> > HTML writer/editor such as Netscape Composer), and 3) acquire an
> > EAD-encoded version.
> >
> > We have hired Apex (as we are an RLG member) to convert and encode the
> > finding aids. We plan to send the EAD versions to Archival Resources.
> > Because I want more flexibility for future uses of the finding aids
> > (whatever they may turn out to be, given advances in technology), I would
> > like to maintain a text version locally (which for now could be edited in
> > MS Word). Because Archival Resources is available only on a
> > fee-for-service basis, and I don't have the technical support necessary
> > to maintain SGML documents, I am also planning to maintain an HTML-encoded
> > version on our Web site that meets our specifications for structure,
> > etc.
> >
> > Here is the problem. Apex has provided me with a first batch of one
> > hundred EAD-encoded finding aids. I had hoped to use the encoded versions
> > in other ways by stripping them of the coding, BUT alas, as with most
> > grand ideas, I have been unsuccessful! Of course, for more money (which
> > I'd like to spend on other things), I am sure Apex would be happy to
> > resolve this matter for me. Before pursuing this option for the
> > remaining 2,900 finding aids, however, I wanted to know if there was a
> > "de-babble-izer" that I could purchase to magically remove the encoding.
> >
> > I am happy to pay the vendor for the deliverables I need, but I wanted to
> > check with you all first! I look forward to hearing from you.
> >
> > David
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > David de Lorenzo                       201 West Monument Street
> > Library Director                       Baltimore, MD 21201-4674
> > Maryland Historical Society            (410) 685-3750 Ext. 309
> > Library of Maryland History            FAX: (410) 385-2105
> >               
> --
> Dan Linke
> Assistant Archivist for Technical Services
> Seeley G. Mudd Manuscript Library
> 65 Olden Street
> Princeton, NJ  08544
> 609-258-6345 (v)   609-258-3385 (fax)