Michael - We went through our frenzy of scanning in 1993,
when we converted our legacy printed finding aids.
Though we were somewhat aware of the break-even point
for the return on our invested time and money (we used 95% accuracy as
the lower limit), we ventured bravely into the fray.
And with happy results. We did the work as a summer project, hiring a student
to run the scanner and clean up the files.
Since we had some experience in writing macros,
we cobbled together a group of scripts to clean up the obvious errors
(g=8, 9=comma and so on...), so we bumped ourselves up
another couple of percentage points on the accuracy scale.
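
For anyone wanting to try the same kind of cleanup today, here is a minimal sketch in Python. It is not the macros themselves. The two rules are only one guess at how "g=8" and "9=comma" could be applied safely (a "g" between digits becomes "8"; a "9" stuck to the end of a word becomes a comma), and the names RULES and clean_line are hypothetical.

```python
import re

# Hypothetical substitution rules. Each pair is (pattern, replacement).
# The context checks are assumptions made for this sketch so that
# legitimate letters and digits are left alone.
RULES = [
    # A "g" with a digit on both sides is assumed to be a misread "8",
    # e.g. "1g93" -> "1893".
    (re.compile(r"(?<=\d)g(?=\d)"), "8"),
    # A "9" directly after a letter and before whitespace is assumed to be
    # a misread comma, e.g. "Letters9 1893" -> "Letters, 1893".
    (re.compile(r"(?<=[A-Za-z])9(?=\s)"), ","),
]


def clean_line(line: str) -> str:
    """Apply every substitution rule, in order, to one line of OCR text."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(clean_line("Letters9 1g93-1g99"))
```

Keeping the rules in a list makes it easy to add one each time a new recurring misread turns up in the scans.
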
The bigger problem was getting the finding aids back into the right format,
since all of our automated register preparation steps depend on
information being at specific tab settings to be manipulated correctly.
Again, this was done relatively easily with macros.
In the end, we were able to scan and convert over 1000 pages of finding aids.
Given the amount of time and effort (my developmental time, a student
working 30 hrs/wk for 10 weeks, the investment in a scanner), it was
certainly worth it.
In addition, we now know a bit about scanning and have applied this to
projects in the library.
A couple of years ago, we reconned a number of collections that had been
cataloged on cards. For that project, we chose to rekey the information,
since the prospect of scanning cards presented a much lower accuracy rate.
This is mostly anecdotal, but I hope it helps.
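
On the 95% figure raised in the message quoted below, a rough back-of-the-envelope check may be useful. The sketch assumes that the accuracy rate is per character, that errors fall independently of one another, and that an average word is about five letters long; none of these assumptions comes from the OCR software's documentation, and the function name is made up for illustration.

```python
def word_error_rate(char_accuracy: float, word_length: int = 5) -> float:
    """Chance that a word contains at least one wrong character.

    Assumes each character is recognized correctly with probability
    char_accuracy, independently of its neighbors.
    """
    return 1.0 - char_accuracy ** word_length


if __name__ == "__main__":
    rate = word_error_rate(0.95)
    # prints "22.6% of words affected, about 1 in 4.4"
    print(f"{rate:.1%} of words affected, about 1 in {1 / rate:.1f}")
```

Under those assumptions the "every fourth or fifth word" estimate holds up, and the same function shows why a couple of extra percentage points matter: at 97% the rate drops to roughly one word in seven.
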
Beinecke Rare Book and Manuscript Library
New Haven, CT 06520
At 08:21 AM 4/23/99 -0500, you wrote:
>I would be curious to hear the experiences of others on the list with
>scanning of existing finding aids. I have read several reports of very
>good experiences like that which Dan reports but have had quite poor results
>ourselves (using various OCR software including Omnipage).
>Perhaps it's because so many of our inventories are older and were produced
>on manual and electric typewriters.
>95% accuracy rates sound good until you realize that this means that every
>fourth or fifth word has a typo. And the spell checker doesn't seem to help
>much with inventories that contain a lot of names.
>Head of Processing
>Minnesota Historical Society
>345 Kellogg Blvd West
>St. Paul MN 55102-1906
>> From: Dan Linke
>> Sent: Wednesday, April 21, 1999 4:27 PM
>> To: Multiple recipients of list EAD
>> Subject: Re: HTML or ASCII?
>> You did not mention in what form you gave the finding aids to APEX. I am
>> assuming that you gave them paper copies which they scanned, OCRed, and
>> converted. If they are going to all that effort, they could surely give
>> an electronic version without the EAD encoding, and the cost cannot be
>> much greater since it's only a matter of one of the steps along the way
>> toward encoding. I would pursue that route before attempting to strip the
>> coding out manually at your end.
>> However, related to this and for future planning, we are scanning
>> paper-based finding aids with an HP scanner (cost of $700) with Caere's
>> Omnipage 8.0 (an OCR program, cost about $400, less if it's an upgrade of
>> software included with the scanner; we paid only $100) with very good
>> results. It saves the file as a Word file and then also converts it to
>> HTML. (You could also save it as an ASCII file, once it was in Word.)
>> Last summer we employed a student to do this nearly fulltime and she was
>> able to scan, OCR, and convert nearly 100 pages per day. This rate varied
>> according to each finding aid's "page density" but is based on 20 finding
>> aids totalling over 3500 pages which were completed in 37 working days. I
>> mention this because the cost of the student labor is significantly less
>> than APEX's hourly rate, I suspect, and if you provide them with an
>> electronic version rather than paper, you may stretch your grant dollars
>> that much more. Of course, you need to find a good student who is
>> Hope this helps.
>> David de Lorenzo wrote:
>> > Colleagues--
>> > I need your advice on a problem I have encountered with our project to
>> > convert and encode our finding aids.
>> > None of our 3,000+ finding aids are available in electronic form. I have
>> > received a grant which I hope can kill several birds with one stone. My
>> > goals for the project are: 1) convert the finding aids into electronic
>> > form, 2) acquire an electronic text version (ASCII or something else
>> > that I can manipulate in word processing software (MS WORD) and an
>> > HTML writer/editor (Netscape Composer)), and 3) acquire an EAD-encoded
>> > version.
>> > We have hired Apex (as we are an RLG member) to convert and encode the
>> > finding aids. We plan to send the EAD versions to Archival Resources.
>> > Because I want more flexibility for future uses of the finding aids
>> > (whatever they later may be given advances in technology), I would like
>> > to maintain locally a text version (which for now could be manipulated
>> > using MS WORD). Because Archival Resources is available only
>> > fee-for-service, and I don't have the technical support necessary to
>> > maintain SGML documents, I am also planning on maintaining at our WEB
>> > site an HTML encoded version meeting our specifications for structure,
>> > etc.
>> > Here is the problem. Apex has provided me with a first batch of one
>> > hundred EAD encoded finding aids. I had hoped to be able to use the
>> > encoded versions in other ways by stripping them of the coding BUT alas,
>> > as with most grand ideas, I have been unsuccessful! Of course, for more
>> > money (which I'd like to spend on other issues), I am sure Apex would be
>> > happy to resolve this matter for me. Before pursuing this option with
>> > the remaining 2,900 finding aids, however, I wanted to know if there
>> > was a "de-babble-izer" that I could purchase to magically remove the
>> > encoding.
>> > I am happy to pay the vendor for the deliverables I need but I wanted to
>> > check with you all first! I look forward to hearing from you.
>> > David
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > David de Lorenzo                  201 West Monument Street
>> > Library Director                  Baltimore, MD 21201-4674
>> > Maryland Historical Society       (410) 685-3750 Ext. 309
>> > Library of Maryland History       FAX: (410) 385-2105
>> > http://www.mdhs.org
>> Dan Linke
>> Assistant Archivist for Technical Services
>> Seeley G. Mudd Manuscript Library
>> 65 Olden Street
>> Princeton, NJ 08544
>> 609-258-6345 (v) 609-258-3385 (fax)