Hi,

 

ArchiveGrid (which is a mix of MARC records, EAD, and HTML) does not currently search or sort by date. Part of this is because the formats are so dissimilar; part of it is the paucity of consistently encoded data (we didn’t think it was good practice to offer a search by date when it would drop many good results simply because their dates were not encoded). We also did user studies that showed that our end users were unlikely to use advanced search options if offered, so that was the final nail in the coffin and settled all internal debates regarding advanced search options.

ArchiveGrid will be moved into the WorldCat.org platform (as part of the FirstSearch base pack), which may allow for sorting by date.

 

“We are not our users,”

 

Merrilee

 

Merrilee Proffitt, Senior Program Officer

OCLC Programs and Research


From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Custer, Mark
Sent: Thursday, January 15, 2009 8:47 AM
To: [log in to unmask]
Subject: back-end systems for EAD, and other questions

 

Yesterday’s post about “normalized dates” has me thinking once again about how dates are used (or not used) in EAD records. As far as I can tell, RLG’s ArchiveGrid doesn’t permit searching by date. (I could be wrong on this, as I don’t have full access to it, but it does use Lucene to index its records; I suppose most of these records are just MARC records?) ProQuest’s Archive Finder does permit searching by date, but it doesn’t really allow you to do very much with the results (i.e., there’s no way to rank them by “relevancy”).

This leads me to a question: what sort of back-end systems are archives using for their EAD records? (Are there any surveys out there that have this information, or should we start one?)

At ECU, we're using an XML database only, but we aren't doing any advanced searching by date. The main reason is that, at this time, limiting a search to something like "1912" wouldn't narrow your results very much, and then you're really just back at the whole "browse by collection name" situation. However, you can do a keyword search for "1912", and the results returned will be ordered by the number of hits in each document. In my mind that's only a small difference in functionality, but it's perhaps more useful (on most occasions) than simply limiting your results to any and all collections whose date ranges contain the year "1912".
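To make that contrast concrete, here is a minimal Python sketch of the two behaviours: limiting by collection date range versus ordering by keyword hits. It is only an illustration, not a description of ECU's system. The three sample records are invented, the function names are hypothetical, and it assumes un-namespaced EAD 2002 markup in which each <unitdate> carries a "normal" attribute of the form "YYYY/YYYY".

```python
import re
import xml.etree.ElementTree as ET

# Three tiny, invented finding aids (EAD 2002 style, no namespace).
SAMPLES = {
    "coll-a": """<ead><archdesc><did>
        <unittitle>Smith Family Papers</unittitle>
        <unitdate type="inclusive" normal="1850/1990">1850-1990</unitdate>
      </did>
      <scopecontent><p>Deeds and ledgers, including a 1912 survey.</p></scopecontent>
    </archdesc></ead>""",
    "coll-b": """<ead><archdesc><did>
        <unittitle>Jones Letters</unittitle>
        <unitdate type="inclusive" normal="1910/1914">1910-1914</unitdate>
      </did>
      <scopecontent><p>Letters from 1912; a 1912 diary.</p></scopecontent>
    </archdesc></ead>""",
    "coll-c": """<ead><archdesc><did>
        <unittitle>Brown Photographs</unittitle>
        <unitdate type="inclusive" normal="1950/1960">1950-1960</unitdate>
      </did>
      <scopecontent><p>Studio portraits.</p></scopecontent>
    </archdesc></ead>""",
}


def year_range(root):
    """Return (start, end) years from the first unitdate/@normal, or None."""
    node = root.find(".//unitdate[@normal]")
    if node is None:
        return None
    years = [int(part[:4]) for part in node.get("normal").split("/")]
    return years[0], years[-1]


def filter_by_year(docs, year):
    """Date limit: keep every collection whose date range contains the year."""
    hits = []
    for doc_id, xml_text in docs.items():
        span = year_range(ET.fromstring(xml_text))
        if span and span[0] <= year <= span[1]:
            hits.append(doc_id)
    return sorted(hits)


def rank_by_keyword(docs, term):
    """Keyword search: order collections by how often the term occurs in the text."""
    pattern = re.compile(r"\b%s\b" % re.escape(term))
    counts = {}
    for doc_id, xml_text in docs.items():
        # itertext() yields element text only, so attribute values are ignored.
        text = " ".join(ET.fromstring(xml_text).itertext())
        n = len(pattern.findall(text))
        if n:
            counts[doc_id] = n
    # Most hits first; ties broken by identifier.
    return sorted(counts, key=lambda d: (-counts[d], d))


print(filter_by_year(SAMPLES, 1912))
print(rank_by_keyword(SAMPLES, "1912"))
```

In this toy corpus both approaches return the same two collections for "1912", but only the keyword search puts the collection that mentions 1912 most often ahead of the one whose 140-year date range merely happens to include it.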

This leads me to another set of questions: Is anyone out there using "bulk" dates (type="bulk" on <unitdate>) as part of their information retrieval process? Is anyone using dates beyond the collection range (those dates associated with a series, folder, even an item) in the information retrieval process? Has anyone attempted to test their corpus of EAD records with their current search operations vs. indexing and searching those records by means of different models of IR, such as Nutch, INDRI, Solr, or even just Google Custom Search?

I think it's great that we're encoding our documents so well, but I keep wondering if we're harnessing that information in the best possible ways yet (and perhaps the best solutions won't be tied to our encoding practices at all).

 

Mark Custer

Text & Markup Coordinator

ECU Digital Collections

http://personal.ecu.edu/custerm