Agreed that the goal of any protocol or standard is to create commonality;
however, no single standard stands any chance of being universally adopted at
the level of abstraction of services such as searching. (Look how long a
basic "mechanical" protocol such as TCP/IP took to be universally adopted.)
Thus any real-world metasearch engine has to be prepared for all the
non-standard (or different-standard) targets. So, from the point of view of
the engine developer, adoption of a standard yields only 'islands of
advantage'. As a real-world example: of the 2500+ targets we have produced
connections to, only about 400 are Z39.50, none are SRW/U, and none are OAI,
so there is little reduction in work on our side.
From your enhanced explanation and examples it seems you want the search
engine to suggest to the client what searches might be tried in a subsequent
search. In particular, you have your example servers sending back alternative
indexes and the fuzzy-match list of nearby terms. Given that the search was
directed to a particular index in the first place, I can't see any advantage
in suggesting others. After all, if I as the end user asked for
"author=Theo", then I don't see any benefit in the server telling me "I've
got lots of stuff about Theo and published by him". In the majority of cases
I would already have asked those questions, or a broader one covering all of
them ("keyword=Theo").
If the search engine is going to respond with these alternatives then it
must do more work to determine what they are, rather than just return "0
hits" or send back a list of hits. That takes development effort, and then it
takes machine time to run. This is a big problem with the big commercial
sites (the content aggregators).
Finally, it means the client metasearch engine must now know that this target
may return 'search suggestions'. So it has to be able to handle them - just
another form of data to accept. But this data requires a different user
interaction (or machine interaction, if we postulate intelligent clients
which know what the user really wants), and all this adds to the complexity
and hence the resource usage (machine and time). Of course these responses
have to be 'normalised' for user display across the servers which respond in
this fashion (each after its own interpretation) and those that don't;
otherwise we end up with users who switch this facility off or walk away
muttering "Where did my Google interface go?"
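Just to illustrate the kind of extra plumbing I mean, here is a minimal
sketch of that normalisation step; the per-target parsers and the common
'Suggestion' shape are purely hypothetical:

    # Minimal sketch only: target names, parser functions and the common
    # Suggestion shape are hypothetical, just to illustrate the extra plumbing.
    from dataclasses import dataclass

    @dataclass
    class Suggestion:
        target: str   # which source proposed it
        kind: str     # "index" or "term"
        value: str    # the proposed index or term
        weight: int   # hits, or a closeness score - not comparable across targets!

    def from_smart_sru_target(target, raw):
        # raw: already-parsed extension data, e.g. [("dc.subject", 340), ...]
        return [Suggestion(target, "index", name, hits) for name, hits in raw]

    def from_fuzzy_match_target(target, raw):
        # raw: nearby terms with edit distances, e.g. [("theon", 1), ...]
        return [Suggestion(target, "term", term, dist) for term, dist in raw]

    def normalise(all_results):
        """Fold per-target suggestion formats into one list for the user display.
        Targets that return nothing (the majority) simply contribute nothing."""
        merged = []
        for target, kind, raw in all_results:
            if kind == "sru-extension":
                merged.extend(from_smart_sru_target(target, raw))
            elif kind == "fuzzy":
                merged.extend(from_fuzzy_match_target(target, raw))
            # unknown formats are dropped rather than shown raw to the user
        return merged

    print(normalise([
        ("LibraryA", "sru-extension", [("dc.subject", 340)]),
        ("AggregatorB", "fuzzy", [("theon", 1)]),
        ("LegacyC", "unknown", None),
    ]))

Every new 'smart' response format is another branch in that normalise() step,
multiplied across the 2500+ targets.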
Peter
-----Original Message-----
From: Z39.50 Next-Generation Initiative [mailto:[log in to unmask]] On Behalf Of
Theo van Veen
Sent: Tuesday, April 20, 2004 10:13 AM
To: [log in to unmask]
Subject: Re: metasearch
The reason for using SRU/SRW is to create commonality and to deal with
differences in a controlled way.
At the search engine side there is no extra implementation effort involved
beyond putting extra response results in XML according to an agreed schema.
Some servers can provide better responses than others. The common-denominator
approach would result in a lot of "no hits" or diagnostics.
When we allow improved/extended responses in a controlled way, then smart
clients and smart servers will have a more interesting exchange of data
than dumb clients and dumb servers, but they remain mutually compatible.
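To make the 'agreed schema' point concrete, a rough sketch of how a client
might handle such extended responses; the "suggestions" element, its
namespace and its contents below are only illustrative, not an agreed schema:

    import xml.etree.ElementTree as ET

    SRU_NS = "http://www.loc.gov/zing/srw/"
    EXT_NS = "info:example/suggestions"   # hypothetical extension namespace

    # Hypothetical response from a 'smart' server: a normal searchRetrieveResponse
    # plus a suggestions block inside extraResponseData (illustrative schema only).
    sample = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
      <version>1.1</version>
      <numberOfRecords>0</numberOfRecords>
      <extraResponseData>
        <suggestions xmlns="info:example/suggestions">
          <index name="dc.creator" hits="12"/>
          <index name="dc.subject" hits="340"/>
          <term distance="1">theon</term>
        </suggestions>
      </extraResponseData>
    </searchRetrieveResponse>"""

    def read_suggestions(xml_text):
        """Collect the suggestion data this client understands; ignore the rest."""
        root = ET.fromstring(xml_text)
        extra = root.find("{%s}extraResponseData" % SRU_NS)
        if extra is None:        # a plain server sent nothing extra: no harm done
            return []
        found = []
        for index in extra.iter("{%s}index" % EXT_NS):
            found.append(("index", index.get("name"), int(index.get("hits"))))
        for term in extra.iter("{%s}term" % EXT_NS):
            found.append(("term", term.text, int(term.get("distance"))))
        return found

    print(read_suggestions(sample))

A client that knows nothing about the extension still gets a normal
searchRetrieveResponse and simply skips the extra block; a smarter client
gets the extra guidance.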
Theo
>>> [log in to unmask] 20-4-04 17:01:09 >>>
Comments from another lurker.
Metasearch engines generally do search over diverse sources, so any
commonality (from Z39.50 or SRU/W, etc.) is only a bonus; it cannot be
considered the norm. Consequently the engines will have to undertake all the
heavy lifting for sites which do their own thing anyway.
Marc is correct that there is still the issue of what to do with the
results. It is bad enough at the moment. Adding a set of results of unknown
type (from a set of, say, 5) would mean a lot more conversion logic in the
engine.
Supporting this functionality at the search engine (target) side will mean
quite major extensions to the processing for those systems to choose the
"best" action to take. Unfortunately this may also be client-dependent. So
things could be moving away from a decent result to something very much less
desirable.
One of the perennial big complaints from the content providers is that the
metasearch engines are "dumbing down" both their search capabilities and the
quality of the data returned. Unfortunately this could have a bad effect in
both of these areas. So I think the content providers would not be too
excited about it. As a metasearch engine provider, it will not really help
us, as we have to take account of the non-standard guys as well.
I also can't quite see how any of this could be considered as a computing
grid arrangement, but that seems like another thread.
I will be in NC, so there will be plenty of us to discuss this in person.
Peter Noerr
MuseGlobal
-----Original Message-----
From: Z39.50 Next-Generation Initiative [mailto:[log in to unmask]] On Behalf Of
Marc Cromme
Sent: Tuesday, April 20, 2004 8:04 AM
To: [log in to unmask]
Subject: Re: metasearch
Interesting proposal.
I'll comment in between your lines:
Theo van Veen wrote:
>Currently it is not easy with SRU/W to broadcast the same query to many
>SRU/W servers because one has to take into account all the differences
>between different servers.
>
Definitely true - this is due to the fact that SRW is a client-server
protocol, whereas metasearch broadcasting as you are describing it is more a
grid computing task.
Currently, a client has to have good knowledge of the multiple servers
asked, and one has to program the metasearch client logic according to the
capabilities of the SRW servers used.
The real solution might be an SRW-like grid computing protocol, not an
extension to a client-server protocol.
>Especially in metasearching I think it would be convenient when there was a
>possibility to send a query saying "give me what is closest to this query"
>and allow different servers to respond with a server's choice according to
>one or more predefined responses. The responses could, among others, be:
>1) searchRetrieveResponse
>2) scanResponse
>3) results of a fuzzy match
>4) number of hits for different access points
>5) etc.
>
>Without having to find out how to translate a query for different targets,
>such a "give me the best you can" request returns one or more response
>blocks, and the client can use the ones that it understands to generate
>guidance to the user to improve his search.
>
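For concreteness, a request of this kind might look roughly like the
following; the "x-bestMatch" and "x-responsePreference" parameter names are
invented purely for illustration, nothing like them is specified anywhere:

    # Pure guesswork: neither "x-bestMatch" nor "x-responsePreference" exists in
    # any spec; this only illustrates the shape such a request could take.
    from urllib.parse import urlencode

    def best_match_request(base_url, query):
        params = {
            "version": "1.1",
            "operation": "searchRetrieve",
            "query": query,
            "maximumRecords": "10",
            # hypothetical extension: "give me what is closest to this query"
            "x-bestMatch": "true",
            # hypothetical: response blocks the client is able to understand
            "x-responsePreference": "searchRetrieve scan fuzzyMatch indexHitCounts",
        }
        return base_url + "?" + urlencode(params)

    print(best_match_request("http://example.org/sru", 'dc.creator = "Theo"'))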
I think a give-me-the-best-you-can answer does not resolve the problem,
since the client still must merge result sets from multiple servers, and
does not even know if the server considered the hit set to be "the real
thing" or "just the best I can". The problem is still the logic inside the
client - how to know what and how to merge??
>It is not the same as the "x-scanOnSearchFail" parameter, because it can
>also apply to other situations. For example, when there are thousands of
>hits, a server could provide a response block in which it gives the number
>of hits for different indexes. The client can use this to propose new
>searches, even with indexes that it would not have offered otherwise.
>
>
>
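Again, just to make that concrete, a client receiving such per-index hit
counts could turn them into proposed follow-up searches along these lines;
the index names, counts and threshold are made up for illustration:

    # Entirely hypothetical: illustrates turning per-index hit counts returned
    # by a server into follow-up searches the client could propose to the user.
    def propose_searches(term, index_hit_counts, too_many=1000):
        """index_hit_counts: e.g. {"cql.anywhere": 52000, "dc.title": 740, ...}"""
        proposals = []
        for index, hits in sorted(index_hit_counts.items(), key=lambda kv: kv[1]):
            if 0 < hits < too_many:
                proposals.append(('%s = "%s"' % (index, term), hits))
        return proposals

    print(propose_searches("Theo", {
        "cql.anywhere": 52000,
        "dc.title": 740,
        "dc.creator": 12,
        "dc.subject": 0,
    }))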
This means spreading the indexes from server to client - IMHO it smells
like distributed hash tables or distributed inverted indexes in grid and
peer-to-peer networks. If you do not want to write a considerable amount
of logic for each client, it might be better to throw out the
client-server philosophy entirely and let the network merge and keep
track of indexes. But this will not be possible with a server-client
centric protocol like SRW/SRU.
>I remember having proposed something like this earlier and we will
>implement this as a private extension. However, in the context of the
>NISO metasearch meeting there may be more support for this concept.
>
>1) Who would support a proposal for extending SRU/SRW with such an
>operation?
>2) Should this be done via a new x-parameter or via a new operation?
>
>BTW Who of this group is attending the NISO metasearch meeting?
>
>Theo
>
>
>
That said, I would like to see more specifically how you plan to do your
extensions. Then I will probably understand your motivation and ideas
better.
I look forward to seeing more about your ideas.
Marc Cromme, Index Data