Similar to Michael’s approach at MNHS, at NLM we’re able to point our
enterprise site search tool, Vivisimo, to our source XML and return an HTML
view. We did the same when we used to just serve static HTML, only now we
point to our DLXS-hosted content that sits on a different server. We
specifically do not index our ILS, and MedlinePlus hits can often overwhelm
search results. Vivisimo does natural “clustering” depending on the number of
hits within a document and some other algorithms. Vivisimo can index any
number of document types, including the Microsoft suite, PDF, etc. I’m
experimenting with using it to index external data sources as well.
Being an enterprise solution, it is a costly product, however. And, like
Michael describes, we believe in finding the right tool for the job at hand
within the limits of our corporate infrastructure/politics. DLXS provides
content-specific functionality; Vivisimo provides general “bucket” discovery;
MARC serves the ILS and OCLC; MARC-to-NLM XML serves our external data
subscribers like PubMed; etc. As we explore other “next generation” discovery
alternatives like Blacklight, Primo, etc., we’ll evaluate them against
existing functional requirements and our user needs, which can be quite
diverse. Most of our public customers are not looking for ILS content.
John
John P. Rees, MA, MLIS
Curator, Archives and Modern Manuscripts
History of Medicine Division, MSC 3819
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894
From: Fox, Michael [mailto:[log in to unmask]]
Sent: Monday, December 28, 2009 11:49 AM
To: [log in to unmask]
Subject: Re: Adding EAD to the 'layer of discovery'?
Hello,
The Minnesota Historical Society
has taken a different approach to presenting data from a variety of data stores
and in a variety of data formats through a single, integrated
search.
If you go to our website, www.mnhs.org, you will find a standard Search
box in the upper right-hand corner. If you enter a term, say something
Minnesotan like the name Johnson, the resulting page will provide a list of
results from multiple sources: our website (largely HTML pages); our
library catalog (an Aleph OPAC); finding aids (in EAD XML, of course); our
photo and art collections (relational databases with linked images); the text
of the Minnesota History journal (searchable PDFs); and Minnesota
Place Names (another relational database). If you
selected the Search for People option, several other sources would be
available: the state's birth and death records, state censuses, and a
veterans' grave index. The museum catalog will be added next.
The Society also hosts a
multi-institution, multi-state portal called the Great Rivers Network.
Its prototype web site, greatriversnetwork.org, adds information from other
sources and repositories including several ContentDM photo databases.
As I said, we have taken a
different approach. We have made several attempts at the
one-tool-serves-all approach beginning with an NEH project 17 years ago to
enter museum metadata in MARC syntax into our OPAC. In our
most recent initiative, we looked at several options and tried one
Pentagon-strength search engine. The challenges to this approach
are significant. What we discovered was that we either had to put
up substantial funds to induce the vendor to build links to data sources
outside the mainstream of its user base, twist our data into a pretzel to fit
what's there, or wait for the user base to catch up to where we wanted to
be. That’s what seems to be happening in the OPAC world.
We were not interested in trying to stuff our finding aids into Dublin
Core or even perhaps in putting all our eggs into one basket.
Instead we have looked to use
best-of-breed tools for each application and web services to extract what we
need from each. Of course, we would like to minimize the number of tools we
have to support and want to be as open source as possible or at least be in an
environment where there is a robust community of institutions with similar
needs around a common application. For us, that will be something more
specialized than the world of library systems, even academic library systems.
For the moment, what we are
looking for is a better toolkit for ingesting interesting but unique data from
some of our partners: in spreadsheets, word processing documents, PC databases
like FileMaker and Access, as well as special purpose tools like Past
Perfect. There is too much hands-on massaging for the moment. Any
suggestions?
Michael Fox
From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Custer, Mark
Sent: Monday, December 28, 2009 8:13 AM
To: [log in to unmask]
Subject: Re: Adding EAD to the 'layer of discovery'?
Thanks for pointing that amazing
resource out, as it’s been a favorite of mine ever since I first saw it at
the Computers in Libraries conference in 2008 (and I think that the new updates
are even better!). But it isn’t what I had in mind, since it is
only indexing a fraction of the EAD record (e.g., it’s not including, for
just one example, the <unittitle> elements in the <dsc>).
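To make the distinction concrete, here is a minimal sketch of the difference between indexing only the collection-level title (roughly what a MARC surrogate carries) and also harvesting every <unittitle> inside the <dsc>. It assumes a DTD-style EAD 2002 file with no XML namespace; the sample finding aid and the two function names are made up for illustration and do not describe how any of the systems mentioned in this thread actually work.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated EAD 2002 finding aid
# (DTD flavor, so no XML namespace is assumed).
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters from Duluth</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def collection_title(root):
    """Return only the collection-level title (the 'fraction' case)."""
    return root.findtext("./archdesc/did/unittitle", default="").strip()


def component_titles(root):
    """Return every <unittitle> nested anywhere inside <dsc>, in document order."""
    dsc = root.find("./archdesc/dsc")
    if dsc is None:
        return []
    # itertext() also picks up text inside child elements such as <unitdate>;
    # split/join collapses line breaks and runs of whitespace.
    return [" ".join("".join(el.itertext()).split()) for el in dsc.iter("unittitle")]


root = ET.fromstring(SAMPLE_EAD)
print(collection_title(root))
print(component_titles(root))
```

A schema-based EAD file would need its namespace added to each path, and a real index would presumably want more than titles (scope notes, dates, container information), so treat this only as a picture of the gap being described.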
Anyhow, I’m not suggesting
that the entire EAD should be included in the OPAC, I’m just trying to
figure out who is doing so right now. In the past, only MARC
records could be uploaded into an ILS, but this is obviously changing now
with the proliferation of “discovery layers” (and in those two
examples I provided earlier, both of those OPACs permit searching throughout
the entire EAD, even though neither of those resources is the primary gateway
for their EAD records).
One final example that
I’ll point out, even though it doesn’t search EAD records, is the
Hathi Trust Digital Library:
http://catalog.hathitrust.org/
It is a great illustration,
since right now they have two ways to search their resources:
1) “About” their items searches the MARC records
2) “Within” their items searches the full text from the OCR
In this case, though, entire EAD
records (not just their MARC derivatives) would still fall into the
“about” camp. Nevertheless, they are rarely included in their
entirety in the OPAC (but it’s now possible to do just that, with a bit
of extra work, of course).
Mark
From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Aikens, Barbara
Sent: Wednesday, December 23, 2009 11:29 AM
To: [log in to unmask]
Subject: Re: Adding EAD to the 'layer of discovery'?
I think the Smithsonian’s newly re-engineered
Collections Search Center does this, but perhaps I’m
not fully understanding the question. The Search Center contains
MARC records, and links to EAD finding aids, and other resources.
I’m pretty sure that Ching-Hsien created a metadata model and an
interface or interfaces to harvest data from multiple datasets, from all
sources – museums, libraries, and archives. I’m
probably not explaining it very well.
Here’s the link: http://collections.si.edu/search/
Ching-Hsien would be happy to
answer any questions.
Happy Holidays!
Barbara D. Aikens
Chief, Collections Processing
Archives of American Art, Smithsonian Institution
Ph: 202-633-7941
Mailing Address
Archives of American Art, Smithsonian Institution
PO Box 37012
Victor Bldg., Suite 2200, MRC 937
Washington, DC 20013-7012
From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Custer, Mark
Sent: Tuesday, December 22, 2009 3:23 PM
To: [log in to unmask]
Subject: Adding EAD to the 'layer of discovery'?
I’m curious if anyone on the list has experience with
adding their EAD documents into a larger discovery system?
Here are two examples of what I mean:
· Triangle Research Libraries Network now indexes (and displays) entire EAD
documents.
Example (in which I’ve restricted my results to “archival materials” and
entered “ammons” as my keyword):
http://search.trln.org/search?Nty=1&Ntk=Keyword&Ntt=ammons&N=200092
· University of Chicago library’s implementation of AquaBrowser seems to
index entire EAD documents.
Example (in which I’ve searched for “American Automobile Brief History”,
quotes included, and where the first 3 results returned should be for
archival finding aids):
http://lens.lib.uchicago.edu/?q=%22american%20automobile%20brief%20history%22
So, this leads me to three questions in particular:
1. Can you point me to any other online examples of “discovery tools” that
are ingesting entire EAD documents? Summon, Encore, Primo, Blacklight, etc.?
(But, again, I’m not asking about OPACs that only search a MARC surrogate of
the EAD.)
2. For those of you that are including the entire EAD in your library’s
discovery tool, did you already have surrogate MARC records for those
collections in your catalog? If so, how are you dealing with those now that
you’re adding the EAD?
3. What do you think of the whole retrieval experience (advanced search
options, facets, incorporation into the relevancy algorithm, etc.)?
Thanks in advance for any and all advice and/or other
examples that might be out there,
Mark Custer