Over the last year, while preparing instances for our
OpenText server, I kept a short list of
"Files that refuse to index," which, suffice it to say,
was also known as "The Headache List."
I parsed, examined, and compared the files on this list,
but was never able to figure out why OpenText refused
to index them...until now.
As serendipity would have it, I stumbled across an important
error that will pass most parsers but will trip up OpenText.
When marking up an EAD instance, you often have very long character strings
that must be broken by a line wrap or a soft return, depending on the software
you are using.
As a result, it is not uncommon for tags to wrap from one line to the next,
such as the <extref> tag in the following example:
---------------------------
<did><physdesc><extent>EXTENT<lb>
Total Boxes: 8<lb>
Other Storage Formats: oversize<lb>
Linear Feet: 6.0</extent></physdesc></did><note><p><extref
ext.ptr="http://www.library.yale.edu/beinecke/manuscript/copyrite.htm">
Copyright © 1992 by the Yale University Library.</extref></p>
<p>
-----------------------------
However, while OpenText seems to tolerate some broken tags, it has a
problem with other *crucial* ones. I discovered that all of the files
on my "refuse to index" list shared the same characteristic: the
<archdesc> start tag was broken across two lines, as in the following example:
--------------------------
</titlepage></frontmatter><findaid><archdesc
level="collection">
-------------------------
The <archdesc> start tag must be complete on a single line, or OpenText
will throw out the file.
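If there are many instances to fix, a small script can do the rejoining. Below is a minimal Python sketch, assuming that attribute values inside the tag never contain a literal ">" character; the function name join_archdesc is purely illustrative. It finds each <archdesc ...> start tag, even one spread over several lines, and collapses every run of whitespace inside it (line breaks included) into a single space.

```python
import re

# Matches an <archdesc ...> start tag. [^>]* also matches newlines, so a tag
# split across lines is captured whole. The \b keeps it from matching longer
# names that merely begin with "archdesc". Assumes no ">" inside attribute values.
ARCHDESC_START = re.compile(r"<archdesc\b[^>]*>", re.IGNORECASE)


def join_archdesc(text: str) -> str:
    """Return text with every <archdesc ...> start tag on a single line."""

    def collapse(match: re.Match) -> str:
        # Replace each run of whitespace (including line breaks) with one space.
        return re.sub(r"\s+", " ", match.group(0))

    return ARCHDESC_START.sub(collapse, text)


if __name__ == "__main__":
    broken = '</titlepage></frontmatter><findaid><archdesc\nlevel="collection">'
    print(join_archdesc(broken))
```

Applied to the fragment above, it should print the start tag as `<archdesc level="collection">` on one line. To fix a file you would read it, pass the text through join_archdesc, and write the result back, preferably to a copy first.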
I haven't found any other tags that are as delicate, but I suspect that
any high-level tag could cause a similar problem.
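To check whether other high-level tags are broken the same way, a similar scan can report where they occur without changing anything. The following Python sketch lists each start tag from a chosen set whose closing ">" falls on a later line than its "<". The tuple HIGH_LEVEL_TAGS is only a guess at what counts as "high-level" and should be adjusted; as before, attribute values are assumed not to contain ">".

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical set of "high-level" elements to check; edit to suit your DTD.
HIGH_LEVEL_TAGS = ("ead", "findaid", "eadheader", "frontmatter", "archdesc", "dsc")


def find_split_tags(
    text: str, tags: Iterable[str] = HIGH_LEVEL_TAGS
) -> List[Tuple[str, int]]:
    """Return (tag name, 1-based line number) for each listed start tag
    that spans more than one line."""
    names = "|".join(re.escape(t) for t in tags)
    # \b forces a full-name match, e.g. "ead" alone will not match "<eadheader".
    pattern = re.compile(r"<(" + names + r")\b[^>]*>", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        if "\n" in match.group(0):
            # Line number of the opening "<": count newlines before it.
            line = text.count("\n", 0, match.start()) + 1
            hits.append((match.group(1).lower(), line))
    return hits
```

Running it over each file on a "refuse to index" list would show, with line numbers, any of these tags that are split across lines.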
I hope this helps anyone working with EAD and OpenText.
You might not need this information today, but do store it away...
Timothy Young
Archivist
Beinecke Rare Book and Manuscript Library
Yale University
New Haven, CT 06520
(203) 432-8131
P.S. OK, not *all* of the files on my "refuse to index" list were
completely fixed by this procedure. I still have *one* that refuses to behave,
despite the <archdesc> fix, but that's another mystery to be solved.