> The use of harvesting as opposed to distributed searching
> does eliminate a few unknowns in terms of indexing rules and
> server capabilities, but it doesn't relieve the harvester
> (OAI service provider) from understanding data retrieved from
> different sources and merging it in a suitable way.
Agreed. I think my reply to Peter Noerr makes it clear that it is a
matter of *how much*, not *if*.
> Remember also that some search
> engines hold tens of millions of records (Google of course
> holds billions); some hold highly volatile information such
> as library circulation data; many hold copyrighted material
> closely guarded by the owners. There are lots of cases where
> harvesting is simply not practical.
Granted. We are discussing large-scale and scalable metasearching in the
federation though, and it is important to ascertain -- in a service-,
protocol- and largely application-domain-independent way -- whether
harvesting accommodates the requirements of the federation more
gracefully than distributed computing does and -- in particular -- to
characterise which services *in* the federation are better suited to one
approach than to the other.
I agree with you that extremely dynamic data defeats harvesting because
it would require harvesting rates so high that they would essentially
reintroduce the network as a run-time observable of service provision
(with problems of latency and completion time). You will agree however,
that examples such as circulation data rely on strong inter-party
agreements which can only be expected within tightly coupled subsets of the
federation. Put another way, these services operate *within* the
federation but do not belong to the category of truly federated services.
A similar argument holds in the case of local interfaces to remote
services, which are a prototypical application of the distributed
computing approach (e.g. the Z39.50 interface to COPAC or an SRW
interface to Google), one that avoids interop problems, and one that
leverages costs of service provision which have already been absorbed
within the federation.
> I do think that harvesting and
> distributed searching will continue to supplement eachother.
> It's not an either-or kind of situation.
Yep. It would be good to start characterising services in this sense. My
initial observations pertained to large-scale federated meta-searching,
where I think distributed computing is easily misapplied.
Senior Research Fellow
Centre for Digital Library Research (CDLR)
Computer and Information Sciences Department
University of Strathclyde