Michele R Combs wrote:
>> In response to that thread, I hastily jotted down some thoughts for a blog post, located here:
>> http://joyner1302.wordpress.com/2009/01/15/normalized-dates-in-ead/
>
> From that post:
>
> "...has anyone attempted to test their corpus of EAD records with their current search operations vs. indexing and searching those records by means of different models of IR, such as Nutch, INDRI, Solr, or even just Google Custom Search???"
>
> This is a great question, related to the one I posed recently about whether anyone had tried to compare the various indexing and search options. Would be a really interesting research topic.
>
> Michele

I think many software packages offer developers the same indexing and search options. Arguably some work better than others for retrieval, but that also depends on the collection. System evaluation of archives is what I am doing (or should be doing ;-)).

To evaluate different search algorithms, you need a test collection: a fixed set of EAD files to index, together with a set of queries and relevance judgments (see the Cranfield experiments, which began in the late 1950s, and TREC later).

Many of the software packages use the same search algorithms. Nutch and Solr both use Lucene (which employs the Vector Space Model and works well). Lemur/Indri uses so-called Language Modelling (which tends to perform better for users who have a lot of time to scan long hit lists exhaustively).

Regarding date normalization, I was wondering which standard is preferred. ISO 8601, i.e. YYYY-MM-DD?
See: http://www.w3.org/QA/Tips/iso-date

Cheers,
Junte Zhang
University of Amsterdam
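
To make the evaluation setup above a bit more concrete, here is a minimal sketch of a Cranfield-style comparison in Python. It assumes you already have, for each system, a ranked list of EAD file identifiers per query, plus a set of files judged relevant for each query. The identifiers, query names, and system names below are all hypothetical, and mean average precision is just one common measure, not the only reasonable choice.

```python
from statistics import mean


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked hit list against the judged-relevant set."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at this rank, counted only where a relevant file appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(run, qrels):
    """Mean of the per-query average precision over all judged queries."""
    return mean(
        average_precision(run.get(query, []), relevant)
        for query, relevant in qrels.items()
    )


# Hypothetical relevance judgments: query -> set of relevant EAD file ids.
qrels = {
    "q1": {"ead_003", "ead_017"},
    "q2": {"ead_008"},
}

# Hypothetical ranked results from two systems indexing the same fixed set of files.
runs = {
    "system_a": {
        "q1": ["ead_003", "ead_017", "ead_042"],
        "q2": ["ead_021", "ead_008"],
    },
    "system_b": {
        "q1": ["ead_042", "ead_003", "ead_017"],
        "q2": ["ead_008", "ead_021"],
    },
}

for name, run in runs.items():
    print(name, round(mean_average_precision(run, qrels), 3))
```

The point of keeping the files, queries, and judgments fixed is that any difference in the scores can then be attributed to the search system rather than to the data.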