Similar to Michael’s approach at MNHS, at NLM we’re able to point our enterprise site search tool, Vivisimo, to our source XML and return an HTML view. We did the same when we used to just serve static HTML, only now we point to our DLXS-hosted content that sits on a different server. We specifically do not index our ILS, and MedlinePlus hits can often overwhelm search results. Vivisimo does natural “clustering” depending on the number of hits within a document and some other algorithms. Vivisimo can index any number of document types, including the Microsoft suite, PDF, etc. I’m experimenting with using it to index external data sources as well.
Being an enterprise solution, it is a costly product, however. And, like Michael describes, we believe in finding the right tool for the job at hand within the limits of our corporate infrastructure/politics. DLXS provides content-specific functionality, Vivisimo provides general bucket discovery, MARC for the ILS and OCLC, MARC-to-NLM XML for our external data subscribers like PubMed, etc. As we explore other “next generation” discovery alternatives like Blacklight, Primo, etc., we’ll evaluate them against existing functional requirements and our user needs, which can be quite diverse. Most of our public customers are not looking for ILS content.
John P. Rees, MA, MLIS
Curator, Archives and Modern Manuscripts
History of Medicine Division, MSC 3819
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894
The Minnesota Historical Society has taken a different approach to presenting data from a variety of data stores and in a variety of data formats through a single, integrated search.
If you go to our website, www.mnhs.org, you will find a standard Search box in the upper right-hand corner. If you enter a term, say something Minnesotan like the name Johnson, the resulting page will provide a list of results from multiple sources: our Web site (HTML pages largely), our library catalog (an Aleph OPAC), finding aids (in EAD XML of course), our photo and art collections (relational databases with linked images), the text of the Minnesota History journal (searchable PDFs), and Minnesota Place Names (another relational database). If you selected the Search for People option, several other sources would be available: the state's birth and death records, state censuses, and a veterans' grave index. The museum catalog will be added next.
The Society also hosts a multi-institution, multi-state portal called the Great Rivers Network. Its prototype web site, greatriversnetwork.org, adds information from other sources and repositories including several ContentDM photo databases.
As I said, we have taken a different approach. We have made several attempts at the one-tool-serves-all approach, beginning with an NEH project 17 years ago to enter museum metadata in MARC syntax into our OPAC. In our most recent initiative, we looked at several options and tried one Pentagon-strength search engine. The challenges to this approach are significant. What we discovered was that we would either have to put up substantial funds to induce the vendor to build links to data sources outside the mainstream of their user base, twist our data into a pretzel to fit what's there, or wait for the user base to catch up to where we wanted to be. That’s what seems to be happening in the OPAC world. We were not interested in trying to stuff our finding aids into Dublin Core or even perhaps in putting all our eggs into one basket.
Instead, we have looked to use best-of-breed tools for each application and web services to extract what we need from each. Of course, we would like to minimize the number of tools we have to support and want to be as open source as possible or at least be in an environment where there is a robust community of institutions with similar needs around a common application. For us, that will be something more specialized than the world of library systems, even academic library systems.
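To make the "one search box, many tools" idea concrete, here is a minimal sketch of a federated search that sends one term to several sources and groups the hits by source. It is not the Society's actual implementation: the function names (`federated_search`, `make_source`) and the in-memory sample sources are assumptions made up for illustration. In practice each search function would wrap a call to that tool's own web service.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(term, sources):
    """Send one term to every source and return {source name: list of hits}."""
    def run(item):
        name, search = item
        try:
            return name, list(search(term))
        except Exception:
            # One failing source should not blank the whole results page.
            return name, []

    # Query the sources in parallel; map() keeps them in their original order.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        return dict(pool.map(run, sources.items()))


def make_source(records):
    """Hypothetical stand-in: a search function over an in-memory list of titles."""
    def search(term):
        return [r for r in records if term.lower() in r.lower()]
    return search


# Invented sample data; real sources would be an OPAC, an EAD index, and so on.
SOURCES = {
    "finding_aids": make_source(["Johnson family papers", "Smith letters"]),
    "photos": make_source(["Portrait of A. Johnson"]),
    "place_names": make_source(["Lake Itasca"]),
}

results = federated_search("johnson", SOURCES)
for name, hits in results.items():
    print(name, hits)
```

Keeping each source behind its own small function means a tool can be swapped out without touching the results page, which is the trade-off described above against a single all-purpose index.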
For the moment, what we are looking for is a better toolkit for ingesting interesting but unique data from some of our partners: in spreadsheets, word processing documents, PC databases like FileMaker and Access, as well as special purpose tools like Past Perfect. Too much hands-on massaging for the moment. Any suggestions?
Thanks for pointing that amazing resource out, as it’s been a favorite of mine ever since I first saw it at the Computers in Libraries conference in 2008 (and I think that the new updates are even better!). But it isn’t what I had in mind, since it is only indexing a fraction of the EAD record (e.g., it’s not including the <unittitle>s in the <dsc>).
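For anyone wondering what that extra content looks like in practice, here is a rough sketch (Python standard library only) that pulls every <unittitle> nested inside a <dsc> out of an EAD file, which is the text a MARC surrogate typically leaves out. The sample document and the function names are made up for illustration; a real indexer would feed the extracted strings into whatever discovery tool is in use.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace, so '{urn:...}unittitle' becomes 'unittitle'."""
    return tag.rsplit("}", 1)[-1]


def dsc_unittitles(ead_xml):
    """Return the text of every <unittitle> found anywhere inside a <dsc>."""
    root = ET.fromstring(ead_xml.strip())
    titles = []

    def walk(elem, inside_dsc):
        name = local_name(elem.tag)
        inside_dsc = inside_dsc or name == "dsc"
        if inside_dsc and name == "unittitle":
            # itertext() also collects text inside inline children such as <unitdate>.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                titles.append(text)
        for child in elem:
            walk(child, inside_dsc)

    walk(root, False)
    return titles


# Invented sample finding aid, heavily abbreviated.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, <unitdate>1901-1910</unitdate></unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

print(dsc_unittitles(SAMPLE_EAD))
```

The collection-level title is skipped on purpose, since that is the part a MARC surrogate usually already carries.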
Anyhow, I’m not suggesting that the entire EAD should be included in the OPAC; I’m just trying to figure out who is doing so right now. In the past, only MARC records were able to be uploaded into an ILS, but this is obviously changing now with the proliferation of “discovery layers” (and in those two examples I provided earlier, both of those OPACs permit searching throughout the entire EAD, even though neither of those resources is the primary gateway for their EAD records).
One final example that I’ll point out, even though it doesn’t search EAD records, is the Hathi Trust Digital Library:
Which is a great illustration, since right now they have 2 ways to search their resources:
1) “About” their items searches the MARC records
2) “Within” their items searches the full-text from the OCR
In this case, though, entire EAD records (not just their MARC derivatives) would still fall into the “about” camp. Nevertheless, they are rarely included in their entirety in the OPAC (but it’s now possible to do just that, with a bit of extra work, of course).
I think the Smithsonian’s newly re-engineered Collections Search Center does this, but perhaps I’m not fully understanding the question. The Search Center contains MARC records, and links to EAD finding aids, and other resources. I’m pretty sure that Ching-Hsien created a metadata model and an interface or interfaces to harvest data from multiple datasets, from all sources – museums, libraries, and archives. I’m probably not explaining it very well.
Here’s the link: http://collections.si.edu/search/
Ching-Hsien would be happy to answer any questions.
Barbara D. Aikens
Chief, Collections Processing
Archives of American Art, Smithsonian Institution
PO Box 37012
Victor Bldg., Suite 2200, MRC 937
Washington, DC 20013-7012
I’m curious if anyone on the list has experience with adding their EAD documents into a larger discovery system?
Here are two examples of what I mean:
· Triangle Research Libraries Network now indexes (and displays) entire EAD documents.
Example (in which I’ve restricted my results to “archival materials” and entered “ammons” as my keyword):
· University of Chicago library’s implementation of AquaBrowser seems to index entire EAD documents.
Example (in which I’ve searched for “American Automobile Brief History”, quotes included, and where the first 3 results returned should be for archival finding aids):
So, this leads me to three questions in particular:
1. Can you point me to any other online examples of “discovery tools” that are ingesting entire EAD documents? Summon, Encore, Primo, Blacklight, etc.? (But, again, I’m not asking about OPACs that only search a MARC surrogate of the EAD.)
2. For those of you that are including the entire EAD in your library’s discovery tool, did you already have surrogate MARC records for those collections in your catalog? If so, how are you dealing with those now that you’re adding the EAD?
3. What do you think of the whole retrieval experience (advanced search options, facets, incorporation into the relevancy algorithm, etc.)?
Thanks in advance for any and all advice and/or other examples that might be out there,