You could try saving the document, from within MS Word, as "Plain text." Choose a
NEW name for the file so the original stays untouched. Word may ask you to pick an
encoding. You may lose any non-Latin-1 characters (and you may end up with some odd
characters anyway), but with luck most of the text will come through cleanly.
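If you want to see how much was lost in the export, something like the following might help. It is only a rough sketch in Python, not a tested recipe for these particular files: the function names are made up for illustration, and it assumes you picked UTF-8 as the encoding when Word asked (pass a different `encoding` if you chose something else). It counts characters that fall outside Latin-1 or that are control, private-use, or unassigned code points, which is where the "little squares" would most likely show up.

```python
import unicodedata
from collections import Counter


def suspicious_characters(text):
    """Count characters that probably did not survive the export cleanly."""
    counts = Counter()
    for ch in text:
        if ch in "\n\r\t":
            continue  # ordinary line breaks and tabs are expected
        category = unicodedata.category(ch)
        # Cc = control, Co = private use, Cn = unassigned.
        # U+FFFD is what the decoder substitutes for unreadable bytes.
        # Anything above U+00FF is outside Latin-1.
        if category in ("Cc", "Co", "Cn") or ch == "\ufffd" or ord(ch) > 0xFF:
            counts[ch] += 1
    return counts


def report(path, encoding="utf-8"):
    """Print each suspicious character in the file with its count."""
    # The encoding is an assumption; match whatever was chosen in Word.
    with open(path, encoding=encoding, errors="replace") as handle:
        counts = suspicious_characters(handle.read())
    for ch, n in counts.most_common():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}: {n}")
```

Calling `report("inventory-copy.txt")` (a hypothetical file name) on the newly saved copy would list the problem characters, most frequent first. If nearly everything lands in one narrow range of code points, that suggests a font-mapping problem rather than truly lost text.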
These may not be Word files at all; they could be older WordPerfect, XyWrite, or
Wang files that Word is managing to open.
As a last resort -- print out and OCR.
At 05:22 PM 12/3/2007 -0500, Michele R Combs wrote:
>(apologies for cross-posting)
>I have encountered several inventories which appear to be in Word
>format, but whenever I try to get them out of word format (select all
>and copy it to a text file, for example, or save as .txt), all the text
>turns into little tiny squares. In Word, it looks like some flavor of
>Courier font; in the font box it displays "Voyager" which is not a font
>I recognize (there IS a voyager font, see
>http://www.urbanfonts.com/fonts/Futurex_Voyager.htm, but it's very Star
>Trek-y and this doesn't look anything like it). If in the Word doc I
>select all and change it to Times Roman or similar, it also changes all
>the text to little boxes. I can save it as rtf with no problem (but it
>doesn't *solve* the problem either). I have several of these
>inventories (some quite long) that need conversion to EAD, and I'd like
>very much to get this data out in a plain text format. They *may* have
>come from our years-ago system which was a Wang.
>This is a new one on me, and I thought I'd seen everything. Anyone have
>thoughts on this?
>Librarian for Manuscripts and Archives Processing
>Special Collections Research Center
>Syracuse University Library
>222 Waverly Avenue
>Syracuse, NY 13244
Collection Services Archivist
Harvard University Archives
Cambridge, MA 02138
voice: (617) 495-2461
fax: (617) 495-8011