I agree with Ethan. Indexing EAD with Solr is different from needing a presentation interface, which you still need and which everyone seems to complain about continually. That said, my brief experience, along with some hearsay, is that Solr has trouble with deeply hierarchical data. It is pretty easy to index at the archdesc level, but once you get into the dsc, all that inheritance business doesn’t fare so well.

But I’m no Solr expert.
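One common workaround for the inheritance problem John describes is to denormalize at index time: a flat Solr document cannot look up its ancestors, so each component document carries copies of the context it would otherwise inherit. The Python sketch below is illustrative only. It assumes components have already been flattened into dicts with `id`, `parent` and `title` keys. The output field names (`collection_title`, `ancestor_titles`) are hypothetical, would have to exist in your Solr schema, and `ancestor_titles` would need to be multi-valued. Which elements should inherit (here only `accessrestrict`, by default) is a local policy choice that the thread does not settle.

```python
def denormalize(docs, collection, inheritable=("accessrestrict",)):
    """Return copies of flat component docs with ancestor context added.

    Assumes each doc is a dict with 'id', 'parent' and 'title' keys and that
    top-level components use collection['id'] as their parent. Adds:
      collection_title - the archdesc-level title, repeated on every doc
      ancestor_titles  - titles of intermediate components, outermost first
    and, for each field named in `inheritable` that a doc lacks, copies the
    value from the nearest ancestor that has it, falling back to the
    collection-level value. Field names here are hypothetical.
    """
    by_id = {d["id"]: d for d in docs}
    result = []
    for doc in docs:
        # Walk parent pointers up to the collection (EAD nesting is a tree).
        ancestors = []  # nearest ancestor first
        pid = doc["parent"]
        while pid in by_id:
            ancestors.append(by_id[pid])
            pid = by_id[pid]["parent"]

        new = dict(doc)  # leave the input docs untouched
        new["collection_title"] = collection["title"]
        new["ancestor_titles"] = [a["title"] for a in reversed(ancestors)]
        for field in inheritable:
            if new.get(field) is None:
                # Nearest ancestor with its own value wins; else the collection.
                source = next(
                    (a for a in ancestors if a.get(field) is not None), collection
                )
                new[field] = source.get(field)
        result.append(new)
    return result
```

The trade-off is index size and re-indexing cost: changing a series-level note means re-posting every component document beneath it.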

John

John P. Rees, MA, MLIS
Curator, Archives and Modern Manuscripts
History of Medicine Division, MSC 3819
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894

From: Ethan Gruber [mailto:[log in to unmask]]
Sent: Thursday, April 15, 2010 1:46 PM
To: [log in to unmask]
Subject: Re: Indexing EAD using Solr

Hi Mark,

I've used Solr for several different applications of EAD, from traditional finding aids to intensive item-level metadata, such as descriptions of museum artifacts like coins.  I think the Blacklight approach to displaying EAD, with the application called "Raven", is very different from indexing an entire guide as a single Solr document, and I'm not entirely convinced their method scales to a collection of thousands or tens of thousands of EAD files.

To address Lisa's statement about reviewing XTF vs. Solr, I'm not sure you can compare the two that way.  Solr isn't a mechanism for viewing EAD, though I think indexing data into Solr gives one much more flexibility to build a robust framework for searching and browsing documents than the Lucene index in XTF allows.

Ethan Gruber

2010/4/15 Király Péter <[log in to unmask]>

Hi Mark,

I have done it once, for a not-too-sophisticated but quite large EAD set, with
Drupal as the interface. These were the steps:

1) Created a flat XML file from the original EAD, conforming to the Solr input format.
Important sub-steps:
a) preserving parent-child relationships with a record ID and a "parent" field (c01...c12 levels)
b) preserving the full path with XPath-like expressions (rootID/childID/grandchildID/.../currentDocID)
c) converting dates to the Solr date format

2) Loaded it into Solr.
3) Wrote simple methods to handle:
a) navigation across the hierarchy
b) searching dates (and other fields, but those are trivial)
c) showing the full path
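Péter's steps 1 and 3 can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not his code. It assumes components are `<c>` or `<c01>` to `<c12>` elements nested under `<dsc>`, that each may carry an `id` attribute, and that dates come from the `normal` attribute of `<unitdate>`. The Solr field names `parent`, `path`, `title` and `date` are hypothetical and must match your schema. The `<add><doc><field name="...">` wrapper is Solr's XML update message format.

```python
import re
import xml.etree.ElementTree as ET

# EAD components: unnumbered <c> or numbered <c01> ... <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def local(tag):
    """Drop any XML namespace, e.g. '{urn:isbn:1-931666-22-9}c01' -> 'c01'."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """First direct child of elem with the given local name, or None."""
    if elem is None:
        return None
    return next((c for c in elem if local(c.tag) == name), None)


def text(elem):
    """All text inside elem, stripped, or None if elem is missing."""
    return "".join(elem.itertext()).strip() if elem is not None else None


def solr_date(normal):
    """Step 1c: turn the start of an EAD @normal value ('1900/1950',
    '1923-05', ...) into Solr's date format. Only the start date is kept."""
    m = re.match(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", normal or "")
    if not m:
        return None
    return f"{m.group(1)}-{m.group(2) or '01'}-{m.group(3) or '01'}T00:00:00Z"


def flatten(ead_root, collection_id):
    """Steps 1a/1b: one flat dict per component, keeping parent and full path."""
    docs = []

    def walk(elem, parent_id, path):
        comps = [e for e in elem if COMPONENT.match(local(e.tag))]
        for n, comp in enumerate(comps, start=1):
            # Hypothetical fallback id for components without @id.
            cid = comp.get("id") or f"{parent_id}_{n}"
            cpath = f"{path}/{cid}"
            did = child(comp, "did")
            unitdate = child(did, "unitdate")
            docs.append({
                "id": cid,
                "parent": parent_id,
                "path": cpath,
                "title": text(child(did, "unittitle")),
                "date": solr_date(unitdate.get("normal")) if unitdate is not None else None,
            })
            walk(comp, cid, cpath)  # recurse into nested components

    dsc = next((e for e in ead_root.iter() if local(e.tag) == "dsc"), None)
    if dsc is not None:
        walk(dsc, collection_id, collection_id)
    return docs


def to_solr_xml(docs):
    """Wrap the docs in a Solr XML update message, skipping empty fields."""
    add = ET.Element("add")
    for d in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in d.items():
            if value is not None:
                ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def children(docs, parent_id):
    """Step 3a, in memory; against Solr this is a query like parent:<id>."""
    return [d for d in docs if d["parent"] == parent_id]


def breadcrumb(docs, doc_id):
    """Step 3c: the ids from the collection down to doc_id."""
    by_id = {d["id"]: d for d in docs}
    return by_id[doc_id]["path"].split("/")
```

Once the output is posted to Solr, navigation becomes a field query on `parent`, and the breadcrumb is just the stored `path` split on `/`. The in-memory `children` and `breadcrumb` helpers mirror that logic for testing.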

That was all I did.

Péter

----- Original Message -----
From: "Mark A. Matienzo" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, April 15, 2010 5:32 PM
Subject: Indexing EAD using Solr

I know there has been some discussion related to this about making EAD
available as part of the discovery layer, but I'm interested in
getting a sense of which institutions are using Solr [0] to index EAD.
At this point, I'm more interested in discussing the different
indexing strategies from a technical standpoint rather than focusing
too much on the discovery layer. For what it's worth, this discussion
began [1] when some folks were talking about incorporating EAD into a
Solr index to be used by Blacklight [2], an open source discovery
layer.

If your institution is using Solr to index EAD, can you briefly
describe your indexing process? I would be interested in coordinating
future work, or potentially developing a set of recommendations/best
practices to share with the community.

[0] http://lucene.apache.org/solr
[1] http://groups.google.com/group/blacklight-development/browse_thread/thread/848bae32b11a8501
[2] http://projectblacklight.org/

Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library