Michele R Combs wrote:
>> In response to that thread, I hastily jotted down some thoughts for a blog post, located here:
>> http://joyner1302.wordpress.com/2009/01/15/normalized-dates-in-ead/
>>     
>
>
> From that post:
>
> "...has anyone attempted to test their corpus of EAD records with their current search operations vs. indexing and searching those records by means of different models of IR, such as Nutch, INDRI, Solr, or even just Google Custom Search???"
>
> This is a great question, related to the one I posed recently about whether anyone had tried to compare the various indexing and search options.  Would be a really interesting research topic.
>
> Michele
>   
I think many software packages offer developers the same indexing and 
search options. Arguably some work better than others for retrieval, 
but that also depends on the collection.

System evaluation of archives is what I am doing (or should be doing 
;-)). To evaluate different search algorithms, you need a test 
collection: a fixed set of EAD files to be indexed, together with 
search topics and relevance judgments (see the Cranfield experiments 
in the 1950s, and TREC later).
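
To make the idea concrete, here is a minimal sketch of how one system could be scored against such a test collection. It assumes each system returns a ranked list of document ids per topic with no duplicates, and that the set of relevant ids per topic is known. The names average_precision, mean_average_precision, runs and qrels are only illustrative, not taken from any particular toolkit.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against the known relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: {topic: ranked list of ids}; qrels: {topic: set of relevant ids}."""
    scores = [average_precision(runs.get(topic, []), rel)
              for topic, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Running the same topics and judgments through each system (Solr, Indri, and so on) and comparing the resulting means is the basic Cranfield-style comparison.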

Many of the software packages use the same search algorithms. Nutch and 
Solr both use Lucene (which employs the Vector Space Model and works 
well). Lemur/Indri uses so-called language modelling (which tends 
to perform better for users who have plenty of time to scan long hit 
lists exhaustively).
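
For readers who have not seen the two models side by side, the following is a rough textbook sketch of each scoring idea. It is not Lucene's or Indri's actual formula. Documents and queries are assumed to be pre-tokenized lists of terms, and the function names and the smoothing value mu=2000 are my own choices for illustration.

```python
import math
from collections import Counter


def tfidf_cosine(query, doc, docs):
    """Vector space model: cosine between TF-IDF vectors of query and doc."""
    n = len(docs)

    def idf(term):
        df = sum(1 for d in docs if term in d)
        return math.log((n + 1) / (df + 1)) + 1.0  # smoothed IDF variant

    def vector(tokens):
        return {t: tf * idf(t) for t, tf in Counter(tokens).items()}

    q, d = vector(query), vector(doc)
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm = (math.sqrt(sum(w * w for w in q.values()))
            * math.sqrt(sum(w * w for w in d.values())))
    return dot / norm if norm else 0.0


def query_likelihood(query, doc, docs, mu=2000.0):
    """Language model: log P(query | doc) with Dirichlet smoothing."""
    collection = Counter(t for d in docs for t in d)
    total = sum(collection.values())
    if total == 0:
        return 0.0
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection[term] / total
        if p_coll == 0:
            continue  # term never seen in the collection: ignored here
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score
```

Both functions return "higher is better" scores, so either can be used to rank the same set of documents for a query and the rankings compared.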

Regarding date normalization, I was wondering which standard is 
preferred. ISO 8601, i.e. YYYY-MM-DD?

See: http://www.w3.org/QA/Tips/iso-date
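
As a small illustration of what normalizing to YYYY-MM-DD could look like in practice, here is a sketch using only the Python standard library. The list of input formats is an assumption on my part (real finding aids will contain many more variants, plus ranges and approximate dates), and to_iso8601 is just an illustrative name.

```python
from datetime import datetime

# Assumed input styles; extend this list to match the dates in your own records.
KNOWN_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%m/%d/%Y")


def to_iso8601(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None
```

Returning None rather than guessing leaves uncertain dates such as "circa 1850" for a person to review.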

Cheers,
     Junte Zhang
     University of Amsterdam