I knew I had saved an email from Ching-Hsien about the Collections Search Center indexing:

The index is the Lucene/Solr open source product; we did the customization and interface development ourselves.


Ching-hsien Wang,  Manager
Library and Archives System Support Branch
Office of Chief Information Officer
Smithsonian Institution
202-633-5581(office)  202-312-2874(fax)
[log in to unmask]<mailto:[log in to unmask]>
Visit us online: www.siris.si.edu<http://www.siris.si.edu/>


Barbara D. Aikens

Chief, Collections Processing
Archives of American Art, Smithsonian Institution
Ph: 202-633-7941
email:  [log in to unmask]<mailto:[log in to unmask]>

Mailing Address
Archives of American Art, Smithsonian Institution
PO Box 37012
Victor Bldg., Suite 2200, MRC 937
Washington, DC  20013-7012



From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Junte Zhang
Sent: Monday, December 28, 2009 12:55 PM
To: [log in to unmask]
Subject: Re: Adding EAD to the 'layer of discovery'?

Hi Mark,

Very interesting discussion. I think a probable reason why the inventory in <DSC> does not get indexed is that the elements there are usually not very descriptive and not worth indexing. Another reason could be that the conversion to EAD is incomplete. I am just wondering: why bother to use EAD if the <DSC> is not used for retrieval? What, then, is the use of EAD, or the purpose of archival description and ISAD(G)? How can we make it work?

I think the key term here is XML (Information) Retrieval. There are many systems out there that selectively index chunks of text by assigning them to an XML element name. A notable example is the Online Archive of California system. They use Lucene/Solr for indexing and retrieval, which is not a real XML Retrieval system imho, but it indexes EAD in facets and does the job quite well, even though the technology behind it is not so novel anymore.
I developed a system that indexes all EAD tags, so that anything can be reconstructed from the index and anything can be retrieved. I can supply you with a link to my prototype (but functional) system off-list; non-technical documentation of this system was published in the DigCcurr 2009 proceedings.
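To make the idea of indexing text by element name a bit more concrete, here is a minimal sketch in Python. It is not the code of any system mentioned in this thread; the sample EAD fragment, the function name ead_to_fields and the "dsc_" field prefix are all assumptions made up for illustration. It collects the text of selected elements into named fields and keeps anything found inside <dsc> in separate fields, so an indexer could decide whether to include them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A tiny, made-up EAD fragment for illustration only (not a real finding aid).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <bioghist><p>Studied at an art academy.</p></bioghist>
    <scopecontent><p>Letters and sketches.</p></scopecontent>
    <dsc>
      <c01 level="series">
        <did><unittitle>Series 3: Travel</unittitle></did>
        <c02 level="file">
          <did><unittitle>Greek islands</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def ead_to_fields(xml_text, indexed_tags=("unittitle", "bioghist", "scopecontent")):
    """Collect the text of selected elements into a field -> [values] mapping.

    Text found inside <dsc> goes into a field with a hypothetical 'dsc_' prefix,
    so a search system could choose whether or not to index it.
    """
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)

    def walk(elem, in_dsc):
        in_dsc = in_dsc or elem.tag == "dsc"
        if elem.tag in indexed_tags:
            # Flatten nested markup (e.g. <p>) and normalize whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                fields[("dsc_" if in_dsc else "") + elem.tag].append(text)
        for child in elem:
            walk(child, in_dsc)

    walk(root, False)
    return dict(fields)


fields = ead_to_fields(SAMPLE_EAD)
for name, values in fields.items():
    print(name, values)
```

Real EAD files often declare an XML namespace, in which case the tag names would need to be matched with the namespace prefix; the sketch ignores that for brevity.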
Best,
   junte

On 28 dec 2009, at 18:16, "Custer, Mark" <[log in to unmask]<mailto:[log in to unmask]>> wrote:
As a follow up, it seems that the most excellent Collections Search Center is indexing more than I had originally given it credit for.

I’m quite certain that it’s also indexing information in the <bioghist> and <scopecontent> sections as well (but I still don’t think it’s indexing anything in the <dsc>).  Thanks again for pointing this example out.

Here’s the example that I tested:

http://collections.si.edu/search/results.jsp?q=Florence+Knoll+Bassett

If you “expand” that section for the first result, you’ll also see that the metadata contains “note snippet(s)”, which is something that I missed before!  So, changing up the query, you can see that I can still find this collection by searching on a term that only appears in the bioghist:

http://collections.si.edu/search/results.jsp?q=Cranbook+Academy+of+Art

And, finally, to prove that the <dsc> is not indexed, this query won’t return the same result (even though it appears in the finding aid in Series 3, Box 2, Folder 1):
http://collections.si.edu/search/results.jsp?q=Greek+islands&fq=online_media_type:%22Finding+aids%22
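For anyone who wants to repeat this kind of test with other terms, the query URL above can be assembled programmatically. The following is only a sketch using Python's standard library; the helper name build_search_url is made up, and the q and fq parameter names are simply copied from the URL above rather than from any documented API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://collections.si.edu/search/results.jsp"


def build_search_url(terms, media_type=None):
    """Build a search URL with an optional filter on online_media_type.

    Parameter names (q, fq) are taken from the example URL in this thread.
    """
    params = {"q": terms}
    if media_type is not None:
        # A Solr-style filter query of the form field:"value".
        params["fq"] = 'online_media_type:"%s"' % media_type
    # safe=":" leaves the colon unescaped, matching the URL as written above.
    return BASE_URL + "?" + urlencode(params, safe=":")


url = build_search_url("Greek islands", media_type="Finding aids")
print(url)

# Decoding the query string again shows the plain-text parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Leaving out media_type gives the unfiltered form of the query, like the first two example URLs.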

So, my question then becomes:
Was including the <dsc>, and highlighting it as “note snippet(s)” as well, ever considered, or was this section left out because of how much it might have increased the index size?



Mark



From: Custer, Mark
Sent: Monday, December 28, 2009 9:13 AM
To: 'Encoded Archival Description List'
Subject: RE: Adding EAD to the 'layer of discovery'?

Thanks for pointing that amazing resource out, as it’s been a favorite of mine ever since I first saw it at the Computers in Libraries conference in 2008 (and I think that the new updates are even better!).  But it isn’t what I had in mind, since it is only indexing a fraction of the EAD record (i.e., it’s not including, for just one example, the <unittitle> elements in the <dsc>).

Anyhow, I’m not suggesting that the entire EAD should be included in the OPAC; I’m just trying to figure out who is doing so right now.   In the past, only MARC records could be uploaded into an ILS, but this is obviously changing now with the proliferation of “discovery layers” (and in those two examples I provided earlier, both of those OPACs permit searching throughout the entire EAD, even though neither of those resources is the primary gateway for its EAD records).

One final example that I’ll point out, even though it doesn’t search EAD records, is the Hathi Trust Digital Library:
http://catalog.hathitrust.org/

Which is a great illustration, since right now they have 2 ways to search their resources:

1) “About” their items searches the MARC records
2) “Within” their items searches the full-text from the OCR

In this case, though, entire EAD records (not just their MARC derivatives) would still fall into the “about” camp.  Nevertheless, they are rarely included in their entirety in the OPAC (but it’s now possible to do just that, with a bit of extra work, of course).


Mark


From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Aikens, Barbara
Sent: Wednesday, December 23, 2009 11:29 AM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: Adding EAD to the 'layer of discovery'?

I think the Smithsonian’s newly re-engineered Collections Search Center does this, but perhaps I’m not fully understanding the question.   The Search Center contains MARC records, links to EAD finding aids, and other resources.  I’m pretty sure that Ching-Hsien created a metadata model and an interface or interfaces to harvest data from multiple datasets, from all sources – museums, libraries, and archives.   I’m probably not explaining it very well.

Here’s the link http://collections.si.edu/search/

Ching-Hsien would be happy to answer any questions.

Happy Holidays!

Barbara D. Aikens

Chief, Collections Processing
Archives of American Art, Smithsonian Institution
Ph: 202-633-7941
email:  [log in to unmask]<mailto:[log in to unmask]>

Mailing Address
Archives of American Art, Smithsonian Institution
PO Box 37012
Victor Bldg., Suite 2200, MRC 937
Washington, DC  20013-7012



From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Custer, Mark
Sent: Tuesday, December 22, 2009 3:23 PM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Adding EAD to the 'layer of discovery'?

I’m curious if anyone on the list has experience with adding their EAD documents into a larger discovery system?

Here are two examples of what  I mean:


•         Triangle Research Library Network now indexes (and displays) entire EAD documents.

Example (in which I’ve restricted my results to “archival materials” and entered “ammons” as my keyword):

http://search.trln.org/search?Nty=1&Ntk=Keyword&Ntt=ammons&N=200092


•         University of Chicago library’s implementation of AquaBrowser seems to index entire EAD documents.

Example (in which I’ve searched for “American Automobile Brief History", quotes included, and where the first 3 results returned should be for archival finding aids):
http://lens.lib.uchicago.edu/?q=%22american%20automobile%20brief%20history%22

So, this leads me to three questions in particular:


1.       Can you point me to any other online examples of “discovery tools” that are ingesting entire EAD documents?  Summon, Encore, Primo, Blacklight, etc.??? (but, again, I’m not asking about OPACs that only search a MARC surrogate of the EAD)



2.       For those of you that are including the entire EAD in your library’s discovery tool, did you already have surrogate MARC records for those collections in your catalog?  If so, how are you dealing with those now that you’re adding the EAD?



3.       What do you think of the whole retrieval experience (advanced search options, facets, incorporation into the relevancy algorithm, etc.)?

Thanks in advance for any and all advice and/or other examples that might be out there,


Mark Custer