My unstated assumption was that you are indexing a single EAD as a single SOLR doc, not multiple SOLR docs. I guess it's faulty to assume that anyone with a SOLR-skilled programmer wouldn't also have the time/skills to bust up single EAD or TEI docs.

John

-----Original Message-----
From: Mark A. Matienzo [mailto:[log in to unmask]]
Sent: Tuesday, April 20, 2010 1:29 PM
To: [log in to unmask]
Subject: Re: Indexing EAD using Solr

2010/4/16 Rees, John (NIH/NLM) [E] <[log in to unmask]>:
> That said my brief experience, along with some
> hearsay, is that SOLR has trouble with deeply hierarchical data. It is
> pretty easy to index at the archdesc level but once you get into the dsc all
> that inheritance business doesn’t fare so well.

I'm not sure if this is really something that's Solr-specific, however. The inheritance within EAD is primarily conceptual and thereby implicit. We don't really have a well-defined data model for archival description that provides a reliable way to derive that information from the hierarchy.

2010/4/16 Király Péter <[log in to unmask]>:
> Hi,
>
> I guess the main problem is not how to maintain hierarchy (I described
> techniques in a previous email). The main problem is hierarchy-based search,
> like: I would like to retrieve those records with title X which are the
> descendants of records with author Y. The problem is that in Solr there is
> no JOIN-like operator.

While it's been a while since I looked at their implementation, I believe New York University splits their EAD finding aids into multiple documents, with one for the high-level information and others for each component level. This information is then loaded dynamically upon request in the interface. [0]

I'm also curious if anyone has tried an approach using XML Lucene payloads in Solr. [1] It appears that the University of Alberta was trying this approach for providing page-level access to texts in TEI.
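To make the "split the finding aid" and "no JOIN" points concrete, here is a minimal Python sketch of one common workaround: break a single EAD into a collection-level doc plus one doc per component, and copy ("denormalize") ancestor context into each component doc at index time. It is not NYU's implementation. It assumes a namespace-free EAD 2002 file, and the field names (parent_id, ancestor_ids, ancestor_titles, ancestor_creators) are hypothetical; they would need matching multivalued fields in your schema. The rule that a creator is "inherited" by everything below it is also an assumption, since, as noted above, EAD leaves that inheritance implicit.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free EAD fragment used only for illustration.
SAMPLE_EAD = """<ead>
  <eadheader><eadid>mssa.ms.0001</eadid></eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Smith Family Papers</unittitle>
      <origination><persname>Smith, Jane</persname></origination>
    </did>
    <dsc>
      <c01 id="ref1" level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 id="ref2" level="file">
          <did><unittitle>Letters to John, 1901</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Unnumbered <c> plus numbered <c01>..<c12> component tags.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def local(tag):
    """Strip a '{namespace}' prefix so the sketch tolerates namespaced EAD too."""
    return tag.rsplit("}", 1)[-1]


def did_text(elem, name):
    """Whitespace-normalized text of the first <name> inside elem's own <did>, or None."""
    for did in elem:
        if local(did.tag) == "did":
            for node in did.iter():
                if local(node.tag) == name:
                    return " ".join("".join(node.itertext()).split())
    return None


def ead_to_solr_docs(xml_text):
    """One collection-level doc plus one doc per component, each carrying
    copies of its ancestors' ids, titles and creators (hypothetical fields)."""
    root = ET.fromstring(xml_text)
    eadid = next((e for e in root.iter() if local(e.tag) == "eadid"), None)
    ead_id = eadid.text.strip() if eadid is not None and eadid.text else "ead"
    archdesc = next(e for e in root.iter() if local(e.tag) == "archdesc")

    title = did_text(archdesc, "unittitle")
    creator = did_text(archdesc, "origination")
    docs = [{"id": ead_id, "parent_id": None, "level": archdesc.get("level"),
             "title": title, "creator": creator, "ancestor_ids": [],
             "ancestor_titles": [], "ancestor_creators": []}]

    def walk(elem, parent_id, anc_ids, anc_titles, anc_creators):
        for comp in elem:
            if local(comp.tag) not in COMPONENT_TAGS:
                continue  # skip <did>, <scopecontent>, etc.
            comp_id = f"{ead_id}_{comp.get('id') or len(docs)}"
            comp_title = did_text(comp, "unittitle")
            own_creator = did_text(comp, "origination")
            docs.append({"id": comp_id, "parent_id": parent_id,
                         "level": comp.get("level"), "title": comp_title,
                         "creator": own_creator, "ancestor_ids": anc_ids,
                         "ancestor_titles": anc_titles,
                         "ancestor_creators": anc_creators})
            # Assumed inheritance rule: a creator applies to everything below it.
            walk(comp, comp_id, anc_ids + [comp_id], anc_titles + [comp_title],
                 anc_creators + ([own_creator] if own_creator else []))

    for dsc in archdesc:
        if local(dsc.tag) == "dsc":
            walk(dsc, ead_id, [ead_id], [title], [creator] if creator else [])
    return docs


def descendants_matching(docs, title_word, creator):
    """In-memory stand-in for a flat query such as
    title:letters AND ancestor_creators:"Smith, Jane" -- no JOIN needed,
    because the ancestry was copied into each doc at index time."""
    return [d["id"] for d in docs
            if title_word.lower() in (d["title"] or "").lower()
            and creator in d["ancestor_creators"]]


docs = ead_to_solr_docs(SAMPLE_EAD)
print(descendants_matching(docs, "letters", "Smith, Jane"))  # ['mssa.ms.0001_ref2']
print(docs[-1]["ancestor_titles"])  # ['Smith Family Papers', 'Correspondence']
```

In practice the resulting list would be serialized (e.g. as JSON or Solr XML) and posted to your core's update handler. The trade-off is a larger index and the need to reindex a whole finding aid when, say, a series title changes, in exchange for hierarchy-aware queries that stay within what Solr can do without a JOIN.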
2010/04/19 Ethan Gruber <[log in to unmask]>:
> As an addendum to this conversation and to address Mark's concerns about
> indexing strategies/best practices, I think that there can be no standard
> approach to indexing EAD data. What you index is dependent upon your user
> interface specifications.

I agree, but I think I was specifically trying to distinguish between UI needs for the discovery system vs. UI needs for the presentation system. I understand that the two might be fairly interconnected, but I still see a pretty strong distinction between the two.

[0] http://dlib.nyu.edu/findingaids/search/?q=house
[1] https://issues.apache.org/jira/browse/SOLR-380

Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library