Hi Mark,
The website for the RIAMCO consortium is not yet live, but to answer your
questions:
- We’re using Solr to index the EAD records.
- Advanced search will offer researchers the option of limiting by date, but
  this field will only search collection dates (not dates in the <dsc>).
- We’re not planning on using bulk dates in the searches.
In thinking about normalized dates, we started questioning why we were using
normalized dates in the components, since we knew that, with our current
search capabilities, we would not be searching those dates. We decided to keep
normalized dates in the components for that “someday” when we will be able to
offer researchers enhanced search and retrieval functions in the <dsc>.
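As a rough illustration of the distinction above (collection-level dates versus dates inside the <dsc>), here is a minimal Python sketch. It is not RIAMCO's Solr configuration: the sample record, the function names (`parse_normal`, `collection_years`, `component_years`, `overlaps`), and the assumption of un-namespaced, DTD-style EAD with ISO 8601 values in the `normal` attribute are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD record (no namespace, DTD-style EAD assumed).
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Sample family papers</unittitle>
      <unitdate type="inclusive" normal="1890/1950">1890-1950</unitdate>
      <unitdate type="bulk" normal="1910/1920">bulk 1910-1920</unitdate>
    </did>
    <dsc>
      <c01 level="series">
        <did>
          <unittitle>Correspondence</unittitle>
          <unitdate normal="1912">1912</unitdate>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>
""".strip()


def parse_normal(value):
    """Turn a normal value such as '1890/1950' or '1912' into (start_year, end_year)."""
    parts = value.split("/")
    return int(parts[0][:4]), int(parts[-1][:4])


def collection_years(root, include_bulk=False):
    """Year ranges from collection-level <unitdate> elements only."""
    ranges = []
    for unitdate in root.findall("./archdesc/did/unitdate"):
        if unitdate.get("type") == "bulk" and not include_bulk:
            continue  # bulk dates are skipped unless explicitly requested
        normal = unitdate.get("normal")
        if normal:
            ranges.append(parse_normal(normal))
    return ranges


def component_years(root):
    """Year ranges from every normalized <unitdate> inside the <dsc>."""
    return [
        parse_normal(unitdate.get("normal"))
        for unitdate in root.findall("./archdesc/dsc//unitdate")
        if unitdate.get("normal")
    ]


def overlaps(ranges, start, end):
    """True if any (start, end) range intersects the requested year span."""
    return any(s <= end and e >= start for s, e in ranges)


root = ET.fromstring(EAD_SAMPLE)
print(collection_years(root))   # collection-level dates, bulk excluded
print(component_years(root))    # component dates, kept for "someday"
print(overlaps(collection_years(root), 1912, 1912))
```

Keeping the two extraction functions separate mirrors the decision described above: only the collection-level ranges would feed a date limiter today, while the component-level values stay in the records for later use.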
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Jennifer J. Betts
RIAMCO Project Manager
John Hay Library, Box A
TEL: (401) 863-2148
CELL: (401) 480-1173
FAX: (401) 863-2093
RIAMCO wiki: https://wiki.brown.edu/confluence/display/library/RIAMCO
Yesterday’s post about “normalized dates” has me thinking once again about
how dates are used (or not used) in EAD records. As far as I can tell,
RLG’s ArchiveGrid doesn’t permit searching by date. I could be wrong on this,
as I don’t have full access to it, but it does use Lucene to index its
records (though I suppose most of those records are just MARC records?).
ProQuest’s Archive Finder does permit searching by date, but it doesn’t really
allow you to do very much with it (e.g., there’s no way to rank your results by
“relevancy”).
This leads me to a question: what sort of back-end systems are archives
using for their EAD records? (Are there any surveys out there that have this
information, or should we start one?)
At ECU, we're using only an XML database, and we aren't doing any advanced
searching by date. That's primarily because, at this time, limiting a search
to something like "1912" isn't going to narrow your results very much, and
then you're really just back at the whole "browse by collection name"
situation. However, you can do a keyword search for "1912", and the results
returned to you will be ordered by the number of hits in each document. In my
mind that's only a small difference in functionality, but it's perhaps more
useful (in most cases) than simply limiting your results to any and all
collection date ranges that contain the year "1912".
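To make that comparison concrete, here is a small Python sketch that contrasts the two behaviors: limiting by collection date range versus ranking by the number of keyword hits. It is only an illustration under stated assumptions. The four collections, their text, and the function names (`limit_by_year`, `rank_by_hits`) are invented, and it does not reproduce how ECU's XML database actually scores results.

```python
import re

# Hypothetical collections: a title, a collection-level year range, and some
# finding-aid text. None of this is real ECU data.
COLLECTIONS = [
    {"title": "Collection A", "years": (1890, 1950),
     "text": "Letters, 1912. Diary, 1912. Photographs, 1930."},
    {"title": "Collection B", "years": (1900, 1920),
     "text": "Account books, 1905-1908."},
    {"title": "Collection C", "years": (1912, 1912),
     "text": "Scrapbook, 1912."},
    {"title": "Collection D", "years": (1950, 1980),
     "text": "Reprints of 1912 newspaper articles."},
]


def limit_by_year(collections, year):
    """Return every collection whose date range contains the year, unranked."""
    return [c["title"] for c in collections
            if c["years"][0] <= year <= c["years"][1]]


def rank_by_hits(collections, keyword):
    """Return collections that mention the keyword, most hits first."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    scored = [(len(pattern.findall(c["text"])), c["title"]) for c in collections]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: -pair[0])  # stable: ties keep their input order
    return [title for _, title in hits]


print(limit_by_year(COLLECTIONS, 1912))  # ['Collection A', 'Collection B', 'Collection C']
print(rank_by_hits(COLLECTIONS, "1912"))  # ['Collection A', 'Collection C', 'Collection D']
```

In this toy data the date limiter returns Collection B, whose range spans 1912 but whose text never mentions it, and misses Collection D, which discusses 1912 from a later period. The hit-count ranking does the reverse.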
This leads me to another set of questions. Is anyone out there using
the "bulk" attribute as part of your information retrieval process? Is
anyone using dates beyond the collection range (those dates associated with a
series, folder, or even an item) in the information retrieval process? Has
anyone attempted to test their corpus of EAD records with their current search
operations vs. indexing and searching those records by means of different
IR tools, such as Nutch, INDRI, Solr, or even just Google Custom Search?
I think it's great that we're encoding our documents so well, but I keep
wondering if we're harnessing that information in the best possible ways yet
(and perhaps the best solutions won't be tied to our encoding practices at
all).
Mark Custer
Text & Markup Coordinator
ECU Digital Collections
http://personal.ecu.edu/custerm