My unstated assumption was that you are indexing a single EAD as a single SOLR doc, not multiple SOLR docs. I guess it's faulty to assume that anyone with a SOLR-skilled programmer wouldn't also have the time/skills to bust up single EAD or TEI docs.

John


-----Original Message-----
From: Mark A. Matienzo [mailto:[log in to unmask]] 
Sent: Tuesday, April 20, 2010 1:29 PM
To: [log in to unmask]
Subject: Re: Indexing EAD using Solr

2010/4/16 Rees, John (NIH/NLM) [E] <[log in to unmask]>:
> That said my brief experience, along with some
> hearsay, is that SOLR has trouble with deeply hierarchical data. It is
> pretty easy to index at the archdesc level but once you get into the dsc all
> that inheritance business doesn’t fare so well.

I'm not sure if this is really something that's Solr-specific,
however. The inheritance within EAD is primarily conceptual and
thereby implicit. We don't really have a well-defined data model for
archival description that provides a reliable way to derive that
information from the hierarchy.
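To illustrate that the "inheritance" has to be made explicit by a rule you choose, here is a minimal Python sketch. It assumes a namespace-free EAD fragment with nested `<c>` elements, and an inheritance policy made up for illustration: a component with no `<unitdate>` takes its nearest ancestor's date. Each component also records its chain of ancestor titles. EAD itself does not define this rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free EAD fragment. Real EAD 2002 files may use the
# urn:isbn:1-931666-22-9 namespace and numbered <c01>..<c12> components.
SAMPLE = """<ead><archdesc level="collection">
  <did><unittitle>Smith Papers</unittitle><unitdate>1900-1950</unitdate></did>
  <dsc>
    <c level="series"><did><unittitle>Correspondence</unittitle></did>
      <c level="file"><did><unittitle>Letters to Jones</unittitle><unitdate>1920</unitdate></did></c>
      <c level="file"><did><unittitle>Letters to Brown</unittitle></did></c>
    </c>
  </dsc>
</archdesc></ead>"""


def text_of(node, path):
    """Stripped text of the first element at `path`, or None if absent/empty."""
    el = node.find(path)
    return el.text.strip() if el is not None and el.text else None


def walk(node, inherited_date, title_path, out):
    """Depth-first walk over nested <c> components.

    Assumed rule: a component without its own <unitdate> takes the nearest
    ancestor's date. This is one local policy, not something EAD defines.
    """
    for comp in node.findall("c"):
        title = text_of(comp, "did/unittitle")
        own_date = text_of(comp, "did/unitdate")
        date = own_date or inherited_date
        path = title_path + [title]
        out.append({
            "title": title,
            "date": date,
            "date_inherited": own_date is None,
            "path": " > ".join(path),  # ancestor titles made explicit
        })
        walk(comp, date, path, out)
    return out


def flatten(xml_text):
    archdesc = ET.fromstring(xml_text).find("archdesc")
    return walk(
        archdesc.find("dsc"),
        text_of(archdesc, "did/unitdate"),
        [text_of(archdesc, "did/unittitle")],
        [],
    )


for doc in flatten(SAMPLE):
    print(doc)
```

A different institution could reasonably pick a different rule, such as never inheriting dates. That difference is the point: the choice lives in the indexing code, not in Solr.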

2010/4/16 Király Péter <[log in to unmask]>:
> Hi,
>
> I guess the main problem is not how to maintain hierarchy (in a previous
> email I described techniques). The main problem is hierarchy-based search,
> like: I would like to retrieve those records with title X which are the
> descendants of records with author Y. The problem is that in Solr there is
> no JOIN-like operator.
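One common workaround for the missing JOIN is to store each record's ancestor ids in a field and resolve the query in two steps on the client. Here is a minimal sketch with an in-memory list standing in for the index. The field names (`author`, `ancestors`) and the `search` helper are hypothetical, and the `ancestors:(... OR ...)` string only shows what the second step's filter query might look like.

```python
# Hypothetical flattened index: one document per record, each storing the
# ids of all its ancestors so "descendant of" becomes a field match.
DOCS = [
    {"id": "c1", "title": "Diaries", "author": "Y", "ancestors": []},
    {"id": "c2", "title": "X", "author": None, "ancestors": ["c1"]},
    {"id": "c3", "title": "X", "author": None, "ancestors": []},
    {"id": "c4", "title": "Notes", "author": "Z", "ancestors": []},
    {"id": "c5", "title": "X", "author": None, "ancestors": ["c4"]},
]


def search(docs, predicate):
    """Stand-in for a single Solr query: return matching documents."""
    return [d for d in docs if predicate(d)]


def ancestor_filter(ids):
    """Build a filter-query string for a hypothetical 'ancestors' field."""
    return "ancestors:(" + " OR ".join(sorted(ids)) + ")"


def title_under_author(docs, title, author):
    # Query 1: ids of records by the given author.
    author_ids = {d["id"] for d in search(docs, lambda d: d["author"] == author)}
    # Query 2: records with the title that have one of those ids as an
    # ancestor, i.e. what a filter such as ancestor_filter(author_ids) selects.
    hits = search(
        docs,
        lambda d: d["title"] == title and bool(author_ids & set(d["ancestors"])),
    )
    return [d["id"] for d in hits]


print(title_under_author(DOCS, "X", "Y"))  # ['c2']
print(ancestor_filter({"c4", "c1"}))        # ancestors:(c1 OR c4)
```

The alternative is to denormalize at index time by copying ancestor authors into each descendant's document. That turns the search into a single query, but every descendant must be reindexed when an ancestor changes.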

While it's been a while since I looked at their implementation, I
believe New York University splits their EAD finding aids into
multiple documents, with one for the high-level information and the
rest for each component level. This information is then loaded
dynamically upon request in the interface. [0]
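A splitting approach along these lines (not necessarily NYU's actual code) can be sketched as one document for the collection-level description plus one per component. Each component document carries a `parent_id` so the interface can load children on demand. The field names and the path-style ids are hypothetical. The output is a JSON array of documents, which assumes a Solr version whose update handler accepts JSON.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free EAD fragment.
SAMPLE = """<ead><eadheader><eadid>smith001</eadid></eadheader>
<archdesc level="collection"><did><unittitle>Smith Papers</unittitle></did>
  <dsc>
    <c level="series" id="ser1"><did><unittitle>Correspondence</unittitle></did>
      <c level="file" id="f1"><did><unittitle>Letters to Jones</unittitle></did></c>
    </c>
    <c level="series" id="ser2"><did><unittitle>Diaries</unittitle></did></c>
  </dsc>
</archdesc></ead>"""


def split_finding_aid(xml_text):
    """Return one document for the collection plus one per <c> component."""
    root = ET.fromstring(xml_text)
    ead_id = root.findtext("eadheader/eadid").strip()
    archdesc = root.find("archdesc")
    docs = [{
        "id": ead_id,
        "ead_id": ead_id,
        "level": archdesc.get("level"),
        "title": archdesc.findtext("did/unittitle"),
    }]

    def visit(parent_el, parent_id):
        for n, comp in enumerate(parent_el.findall("c"), start=1):
            # Path-style ids stay unique within the index; fall back to
            # position when a component has no @id.
            local = comp.get("id") or f"pos{n}"
            doc_id = f"{parent_id}/{local}"
            docs.append({
                "id": doc_id,
                "ead_id": ead_id,
                "parent_id": parent_id,
                "level": comp.get("level"),
                "title": comp.findtext("did/unittitle"),
            })
            visit(comp, doc_id)

    visit(archdesc.find("dsc"), ead_id)
    return docs


print(json.dumps(split_finding_aid(SAMPLE), indent=2))
```

With this layout, "show the children of X" becomes a simple query on `parent_id`, and every component document also keeps `ead_id` so hits can be grouped back into their finding aid.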

I'm also curious whether anyone has tried an approach using XML Lucene
payloads in Solr. [1] It appears that the University of Alberta was trying
this approach for providing page-level access to texts in TEI.
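For a sense of what the indexing side of a payload approach might involve, here is a minimal sketch that attaches the current TEI page number to every word as `word|page`. That is the input format expected by a delimited-payload token filter (e.g. Solr's `DelimitedPayloadTokenFilterFactory` placed after a whitespace tokenizer). The TEI fragment and function names are hypothetical, the schema configuration is assumed rather than shown, and this is not a description of Alberta's implementation. Reading payloads back at query time, the harder part, is also not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free TEI fragment with <pb/> page-break milestones.
SAMPLE = """<text><body>
<p><pb n="1"/>Call me Ishmael. Some years ago<pb n="2"/>never mind how long</p>
</body></text>"""

WORD = re.compile(r"\w+")


def page_tokens(xml_text):
    """Return (token, page) pairs in document order, where page is the @n of
    the most recent <pb/> seen (None before the first one)."""
    pairs = []
    page = None

    def visit(el):
        nonlocal page
        if el.tag == "pb":
            page = el.get("n")
        pairs.extend((w, page) for w in WORD.findall(el.text or ""))
        for child in el:
            visit(child)
            # Text after a child element belongs to the current page.
            pairs.extend((w, page) for w in WORD.findall(child.tail or ""))

    visit(ET.fromstring(xml_text))
    return pairs


def payload_field(pairs, delimiter="|"):
    """Serialize as 'token|payload' pairs for a delimited-payload filter;
    tokens seen before any <pb/> are emitted without a payload."""
    return " ".join(w if p is None else f"{w}{delimiter}{p}" for w, p in pairs)


print(payload_field(page_tokens(SAMPLE)))
```

Page numbers in `@n` are not always numeric (e.g. "iv"), so the payload encoder would need to store them as raw bytes rather than as integers or floats.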

2010/04/19 Ethan Gruber <[log in to unmask]>:
> As an addendum to this conversation and to address Mark's concerns about
> indexing strategies/best practices, I think that there can be no standard
> approach to indexing EAD data.  What you index is dependent upon your user
> interface specifications.

I agree, but I think I was specifically trying to distinguish between
UI needs for the discovery system vs. UI needs for the presentation
system. I understand that the two might be fairly interconnected, but
I still see a pretty strong distinction between the two.

[0] http://dlib.nyu.edu/findingaids/search/?q=house
[1] https://issues.apache.org/jira/browse/SOLR-380

Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library