I have had some success indexing EAD files in VuFind/Solr. From my blog posting, which includes the source code:
* ead-harvest.pl – Copies (mirrors) remote XML files locally
* ead-validate.pl – Makes sure the mirrored XML files are
well-formed, conform to the EAD DTD, and include an eadid url
attribute (done with a stupid stylesheet called geturl.xsl)
* ead-transform.pl – Makes sure each EAD container-level element
includes a unitid with a unique id attribute, saves the result to
a local cache, and transforms these same files into HTML. The
unitid step is done with a stylesheet called addunitid.xsl, and
the HTML transformation with another stylesheet called
ead2html.xsl.
* ead-index.pl – Indexes all the cached/transformed EAD files by
parsing out container-level elements, creating an XML stream of
records of my own design, parsing the result, and passing each
record on to Solr. The heart of this script is a fourth
stylesheet — ead2solr.xsl
http://bit.ly/cIu0lG
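
To make the validate and index steps a bit more concrete, here is a minimal sketch in Python. It is not the Perl/XSLT code from the blog posting, only an illustration of the same idea under some assumptions: the sample EAD record, the function names, and the Solr field names (id, title, url) are all made up for the example, the EAD is assumed to follow the DTD (no namespace), and DTD validation is left out because the Python standard library does not do it.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up EAD file; unitid/@id values are assumed to have been
# added already (the job addunitid.xsl does in the real pipeline).
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid url="http://example.org/findingaids/abc.xml">abc</eadid>
  </eadheader>
  <archdesc level="collection">
    <dsc>
      <c01 level="series">
        <did>
          <unitid id="abc-0001">1</unitid>
          <unittitle>Correspondence</unittitle>
        </did>
      </c01>
      <c01 level="series">
        <did>
          <unitid id="abc-0002">2</unitid>
          <unittitle>Photographs</unittitle>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def get_eadid_url(ead_xml):
    """Return the eadid url attribute, or None if the file is not
    well-formed or the attribute is missing."""
    try:
        root = ET.fromstring(ead_xml)
    except ET.ParseError:
        return None
    eadid = root.find("./eadheader/eadid")
    if eadid is None:
        return None
    return eadid.get("url")


def ead_to_solr_add(ead_xml):
    """Build a Solr <add> message with one <doc> per did that carries a
    unitid with an id attribute. Field names are hypothetical."""
    root = ET.fromstring(ead_xml)
    url = get_eadid_url(ead_xml) or ""
    add = ET.Element("add")
    for did in root.iter("did"):
        unitid = did.find("unitid")
        if unitid is None or unitid.get("id") is None:
            continue  # skip components without a usable key
        doc = ET.SubElement(add, "doc")
        fields = (
            ("id", unitid.get("id")),
            ("title", did.findtext("unittitle", default="")),
            ("url", url),
        )
        for name, value in fields:
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(get_eadid_url(SAMPLE_EAD))
    print(ead_to_solr_add(SAMPLE_EAD))
```

The string returned by ead_to_solr_add() is the sort of XML that could be sent to Solr's update handler; the actual posting, and the mapping onto a real VuFind schema, are left out here.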
The process is not 100% complete, but I thought I'd share.
Fun in cultural heritage institutions!
--
Eric Lease Morgan
University of Notre Dame