Semantic variation is a fact of life in any metasearch activity. Even
harvesting does not reduce the semantic differences of the underlying
documents. An aggregated document set will be re-indexed, and either the
indices will expand to include the union of the disparate originals, or the
originals will all be fitted into the controlled vocabulary of the
aggregation. The former gives only a mechanical advantage and no search
advantage. In fact it is a disadvantage: in the original databases, which
are presumed to be semantically optimised in some way, the search can be
mapped to the specifics of the individual database semantics. The latter
fits disparate documents under 'awkward' terms and reduces the specificity
of those terms, as measured by the cohesiveness of the documents they index.
I would argue that the user gets better retrieval from a metasearch across a
number of smaller specialist databases than from the same documents
aggregated in one database. The variable is the accuracy of mapping the
user's search to the capabilities of the individual database. If the mapping
uses the semantics of the database (a term list, for example) then it is
improved for each such database and the overall result is improved. (I am
talking mostly about precision here; given real-life volumes, recall is
usually not the problem.)
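To make the mapping idea concrete, here is a minimal Python sketch. It assumes each database exposes a term list that maps a user's wording to that database's own indexing term. The database names, the term lists, and the functions map_query and plan_metasearch are all hypothetical, invented for illustration only; they do not belong to any real metasearch product or protocol.

```python
# Hypothetical term lists: for each database, user wording -> that database's own term.
TERM_LISTS = {
    "medical_db": {"heart attack": "myocardial infarction", "cancer": "neoplasms"},
    "news_db": {"heart attack": "heart attack", "cancer": "cancer"},
}


def map_query(user_term, term_list):
    """Return the database's own term, falling back to the user's wording if unmapped."""
    return term_list.get(user_term.lower(), user_term)


def plan_metasearch(user_term, term_lists):
    """Build one database-specific query per target rather than one shared query."""
    return {db: map_query(user_term, terms) for db, terms in term_lists.items()}


if __name__ == "__main__":
    for db, query in plan_metasearch("Heart attack", TERM_LISTS).items():
        print(db, "<-", query)
```

An aggregated database would have to choose a single term for both collections, which is the loss of specificity described above; here each source is searched in its own vocabulary.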
Peter Noerr
-----Original Message-----
From: Z39.50 Next-Generation Initiative [mailto:[log in to unmask]] On Behalf Of
Fabio Simeoni
Sent: Tuesday, April 20, 2004 10:28 AM
To: [log in to unmask]
Subject: FW: metasearch
>Currently it is not easy with SRU/W to broadcast the same query to many
>SRU/W servers because one has to take into account all the differences
>between different servers.
>
>> Definitely true - this is due to the fact that SRW is a client-server
protocol, whereas metasearch broadcasting as you are describing is more
a grid computing task.
One wonders if such problems of semantic interoperability between
server-side implementations are an invariant of the distributed
computing model underlying SRW/SRU, Z39.50 and indeed any other
implementation of that model, rather than of the protocol itself.
In particular, one wonders whether it is the very requirement that servers
align their interpretation of service provision (here searching) that
introduces assumptions about the autonomy of participating parties which
cannot be advanced in a large-scale federated environment.
In this sense, harvesting as implemented in OAI-PMH relieves servers
of any semantic alignment beyond metadata format and thus suffers
considerably less from these problems. With distributed computing, and
thus with SRW/SRU, each party which contributes its data must also
participate in service provision, and this amplifies requirements of
mutual consistency and the problems these requirements raise in a
loosely-coupled environment.
regards,
fabio simeoni
***************************************
Fabio Simeoni
Senior Research Fellow
Centre for Digital Library Research (CDLR)
Computer and Information Sciences Department
University of Strathclyde
Tel:0044-(0)141-5485855