I'm not sure what version of Omnipage Michael used, but we received 4.0 with our
scanner and it was terrible. However, based on hope and a PC magazine review,
we upgraded to 8.0 and find it very accurate. Our finding aids were typed in
the late 60s and early 70s using a Courier font. Most are not very dense either,
i.e. there's at least one blank line between entries, which I think helps the
process, though that's only a hunch. One handy feature of 8.0 is that the
spell checker shows you both the OCR's interpretation and the image of the
scanned text. So if it flags the name Kennan, which it has read as Keman, you
can see from the image what it was looking at, correct the spelling, and add
the name to the dictionary. Given the relatively narrow range
of names that we deal with (only 20th century for the most part), after a few
finding aids you can fly through these. I know for certain that we could not
have rekeyed these as quickly as we scanned them, and even rekeying would have
required spellchecking of proper names. Also, Omnipage 9.0 just debuted, but it
requires more RAM than we have on our current multimedia web PC, so we have not
traded up to that yet.
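
As a rough check on the 95% accuracy figure Michael mentions below, here is a minimal sketch of the arithmetic. It assumes the accuracy is measured per character, that characters are misread independently, and that an average word is about five characters long; none of those assumptions come from the Omnipage documentation, and the function name is just illustrative.

```python
def word_error_rate(char_accuracy: float, chars_per_word: int) -> float:
    """Chance that a word contains at least one misread character.

    Assumes every character is misread independently with the same
    probability, which is a simplification of how OCR errors behave.
    """
    return 1.0 - char_accuracy ** chars_per_word


# Compare a few per-character accuracy levels for 5-character words
# (the word length is an assumed average, not a measured one).
for accuracy in (0.95, 0.99, 0.999):
    rate = word_error_rate(accuracy, 5)
    print(f"{accuracy:.1%} per character -> {rate:.1%} of words affected")
```

Under these assumptions, 95% per-character accuracy leaves roughly one word in four or five with an error, which matches Michael's estimate, and shows why the jump to 99% or better makes such a difference in cleanup time.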
We are also using a decent scanner, an HP Scanjet 6100C, which is middle of the
road in terms of price and quality, and we have found it capable of handling
everything except glass plate negatives.
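
On David's original question further down the thread (whether the EAD encoding can be stripped back out to get plain text), a short script may be worth trying before paying for another deliverable. The following is only a sketch: it uses Python's standard `html.parser` module, which tolerates unknown tag names, and simply keeps the text between tags. The sample record is made up for illustration, and any SGML entities defined in a DTD rather than the standard HTML set would not be resolved by this approach.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text found between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(encoded: str) -> str:
    """Return the text content of an encoded finding aid.

    Tags are discarded, runs of whitespace are collapsed, and empty
    lines are dropped. Line breaks in the input are preserved.
    """
    parser = TagStripper()
    parser.feed(encoded)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.chunks)
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical fragment using common EAD element names.
sample = """<c01><did>
<unittitle>Correspondence</unittitle>
<unitdate>1921-1930</unitdate>
</did></c01>"""

print(strip_markup(sample))
```

This throws away all structure, so it suits the "ASCII copy" goal better than the HTML one; for HTML with a particular layout, a proper SGML-aware conversion would probably still be needed.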
Fox, Michael wrote:
> I would be curious to hear the experiences of others on the list with
> scanning of existing finding aids. I have read several reports of very
> good experiences like that which Dan reports but have quite poor results
> ourselves (using various OCR software including Omnipage).
>
> Perhaps it's because so many of our inventories are older and were produced
> on manual and electric typewriters.
>
> 95% accuracy rates sound good until you realize that this means that every
> fourth or fifth word has a typo. And the spell checker doesn't seem to help
> much with inventories that contain a lot of names.
>
> Michael
>
> Michael Fox
> Head of Processing
> Minnesota Historical Society
> 345 Kellogg Blvd West
> St. Paul MN 55102-1906
> phone: 651-296-1014
> fax: 651-296-9961
> [log in to unmask]
> **NOTE NEW AREA CODE EFFECTIVE JULY 12, 1998**
>
> > ----------
> > From: Dan Linke[SMTP:[log in to unmask]]
> > Sent: Wednesday, April 21, 1999 4:27 PM
> > To: Multiple recipients of list EAD
> > Subject: Re: HTML or ASCII?
> >
> > David,
> >
> > You did not mention in what form you gave the finding aids to APEX. I am
> > assuming that you gave them paper copies which they scanned, OCRed, and
> > converted. If they are going to all that effort, they could surely give
> > you an electronic version without the EAD encoding, and the cost cannot
> > be that much greater since it's only a matter of one of the steps along
> > the way toward encoding. I would pursue that route before attempting to
> > strip the coding out manually at your end.
> >
> > However, related to this and for future planning, we are scanning
> > paper-based finding aids with an HP scanner (cost of $700) with Caere's
> > Omnipage 8.0 (an OCR program, cost about $400, less if it's an upgrade
> > from software included with the scanner; we paid only $100) with very
> > good results. It saves the file as a Word file and then also converts it
> > to HTML. (You could also save it as an ASCII file once it was in Word.)
> > Last summer we employed a student to do this nearly fulltime and she was
> > able to scan, OCR, and convert nearly 100 pages per day. This rate varied
> > according to each finding aid's "page density" but is based on 20 finding
> > aids totalling over 3500 pages which were completed in 37 working days. I
> > mention this because the cost of the student labor is significantly less
> > than APEX's hourly rate, I suspect, and if you provide them with an
> > electronic version rather than paper, you may stretch your grant dollars
> > that much more. Of course, you need to find a good student who is
> > detail-oriented.
> >
> > Hope this helps.
> >
> > David Delorenzo wrote:
> >
> > > Colleagues--
> > >
> > > I need your advice on a problem I have encountered with our project to
> > > convert and encode our finding aids.
> > >
> > > None of our 3,000+ finding aids are available in electronic form. I have
> > > received a grant which I hope can kill several birds with one stone. My
> > > goals for the project are: 1) convert the finding aids into electronic
> > > form, 2) acquire an electronic text version (ASCII or something else
> > > that I can manipulate in a word processing software (MS WORD) and an
> > > HTML writer/editor (Netscape Composer)), and 3) acquire an EAD-encoded
> > > version.
> > >
> > > We have hired Apex (as we are an RLG member) to convert and encode the
> > > finding aids. We plan to send the EAD versions to Archival Resources.
> > > > Because I want more flexibility for future uses of the finding aids
> > > (whatever they later may be given advances in technology), I would like
> > > to maintain locally a text version (which for now could be manipulated
> > > using MS WORD). Because Archival Resources is available only
> > > fee-for-service, and I don't have the technical support necessary to
> > > maintain SGML documents, I am also planning on maintaining at our WEB
> > > site an HTML encoded version meeting our specifications for structure,
> > > etc.
> > >
> > > Here is the problem. Apex has provided me with a first batch of one
> > > hundred EAD encoded finding aids. I had hoped to be able to use the
> > > encoded versions in other ways by stripping them of the coding BUT alas,
> > > > as with most grand ideas, I have been unsuccessful! Of course, for more
> > > money (which I'd like to spend on other issues), I am sure Apex would be
> > > happy to resolve this matter for me. Before pursuing this option with
> > > the remaining 2,900 finding aids, however, I wanted to know if there
> > > was a "de-babble-izer" that I could purchase to magically remove the
> > > encoding.
> > >
> > > I am happy to pay the vendor for the deliverables I need but I wanted to
> > > check with you all first! I look forward to hearing from you.
> > >
> > > David
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > > David de Lorenzo 201 West Monument Street
> > > Library Director Baltimore, MD 21201-4674
> > > Maryland Historical Society (410) 685-3750 Ext. 309
> > > Library of Maryland History FAX: (410) 385-2105
> > > http://www.mdhs.org
> >
> > --
> > Dan Linke
> > Assistant Archivist for Technical Services
> > Seeley G. Mudd Manuscript Library
> > 65 Olden Street
> > Princeton, NJ 08544
> > 609-258-6345 (v) 609-258-3385 (fax)
> > http://www.princeton.edu/mudd
> >
--
Dan Linke
Assistant Archivist for Technical Services
Seeley G. Mudd Manuscript Library
65 Olden Street
Princeton, NJ 08544
609-258-6345 (v) 609-258-3385 (fax)
http://www.princeton.edu/mudd