Subject: Re: metasearch
From: Peter Noerr <[log in to unmask]>
Reply-To: Z39.50 Next-Generation Initiative
Date: Tue, 20 Apr 2004 11:02:05 -0600
Content-Type: text/plain

Agreed that the goal of any protocol or standard is to create commonality;
however, no single standard stands any chance of being universally adopted
at the level of abstraction of services such as searching. (Look how long a
basic "mechanical" protocol such as TCP/IP took to be universally adopted.)
Thus any real-world metasearch engine has to be prepared for all the
non-standard (or different-standard) targets. So, from the point of view of
the engine developer, adopting a standard yields only 'islands of
advantage'. As a real-world example: of the 2500+ targets we have produced
connections to, only about 400 are Z39.50, none are SRW/U, and none are
OAI, so there is little reduction in work on our side.

From your enhanced explanation and examples, it seems you want the search
engine to suggest to the client what searches might be tried next. In
particular, you have your example servers sending back alternative indexes
and the fuzzy match list of nearby terms. Given that the search was
directed to a particular index in the first place, I can't see any advantage
in suggesting others. After all, if I as the end user asked for
"author=Theo", then I don't see any benefit in the server telling me "I've
got lots of stuff about Theo and published by him". In the majority of cases
I would have asked those questions myself, or a broader one to cover all of
them ("keyword=Theo").

If the search engine is going to respond with these alternatives, then it
must do more work to determine what they are, rather than just return "0
hits" or send back a list of hits. This takes development effort and then
machine time to run. That is a big problem for the big commercial sites
(the content aggregators).

Finally, it means the client metasearch engine must now know that this
target may return 'search suggestions', so it has to be able to handle them:
just another form of data to accept. But this data requires a different user
interaction (or machine interaction, if we postulate intelligent clients
which know what the user really wants). All this adds to the complexity and
hence to resource usage (machine and time). Of course, these responses have
to be 'normalised' for user display across servers which respond in this
fashion (each after its own interpretation) and those that don't; otherwise
we end up with users who switch this facility off or walk away muttering
"Where did my Google interface go?"
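
A minimal Python sketch of that normalisation step. Everything in it is
assumed for illustration: the dict-shaped responses, the field names
`index_counts` and `nearby_terms`, and the display strings are hypothetical
and are not part of Z39.50 or SRU/SRW.

```python
def normalise(raw):
    """Map one target's raw response (a dict) onto a common display shape."""
    suggestions = []
    # Hypothetical field: hit counts for other indexes, e.g. {"title": 340}.
    for index, count in sorted(raw.get("index_counts", {}).items()):
        suggestions.append(f"try {index} ({count} hits)")
    # Hypothetical field: fuzzy-match list of nearby terms.
    for term in raw.get("nearby_terms", []):
        suggestions.append(f"did you mean {term}?")
    # Targets that send no suggestions simply yield an empty list.
    return {"hits": list(raw.get("hits", [])), "suggestions": suggestions}


# Two made-up targets: one returns only hits, one returns only suggestions.
responses = {
    "plain": {"hits": ["rec1"]},
    "smart": {
        "hits": [],
        "index_counts": {"title": 340, "subject": 12},
        "nearby_terms": ["Thea"],
    },
}
display = {name: normalise(raw) for name, raw in responses.items()}
```

Even in this toy form, every new kind of suggestion a target invents needs
its own branch in the client, which is the extra complexity at issue.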

Peter

-----Original Message-----
From: Z39.50 Next-Generation Initiative [mailto:[log in to unmask]]On Behalf Of
Theo van Veen
Sent: Tuesday, April 20, 2004 10:13 AM
To: [log in to unmask]
Subject: Re: metasearch


The reason for using SRU/SRW is to create commonality and to deal with
differences in a controlled way.
At the search engine site there is no extra implementation effort
involved beyond putting extra response results in XML according to an
agreed schema.
Some servers can provide better responses than others. The common
denominator approach would result in a lot of "no hits" or diagnostics.
When we allow improved/extended responses in a controlled way, smart
clients and smart servers will have a more interesting exchange of data
than dumb clients and dumb servers, but they remain mutually compatible.

Theo


>>> [log in to unmask] 20-4-04 17:01:09 >>>
Comments from another lurker.

Metasearch engines generally do search over diverse sources, so any
commonality (from Z39.50 or SRU/W, etc.) is only a bonus; it cannot be
considered the norm. Consequently the engines will have to undertake all
the heavy lifting for sites which do their own thing anyway.

Marc is correct that there is still the issue of what to do with the
results. It is bad enough at the moment. Adding a set of results of unknown
type (from a set of, say, 5) would mean a lot more conversion logic in the
engine.

Supporting this functionality at the search engine (target) side will mean
quite major extensions to the processing for those systems to choose the
"best" action to take. Unfortunately this may also be client-dependent. So
things could be moving away from a decent result to something very much
less desirable.

One of the perennial big complaints from the content providers is that the
metasearch engines are "dumbing down" both their search capabilities and
the quality of the data returned. Unfortunately this could have a bad
effect in both of these areas. So I think the content providers would not
be too excited about it. As a metasearch engine provider, it will not
really help us, as we have to take account of the non-standard guys as
well.

I also can't quite see how any of this could be considered as a computing
grid arrangement, but that seems like another thread.

I will be in NC so there will be plenty of us to discuss this in person.

Peter Noerr
MuseGlobal

-----Original Message-----
From: Z39.50 Next-Generation Initiative [mailto:[log in to unmask]]On Behalf Of
Marc Cromme
Sent: Tuesday, April 20, 2004 8:04 AM
To: [log in to unmask]
Subject: Re: metasearch


Interesting proposal.
I'll comment in between your lines:

Theo van Veen wrote:

>Currently it is not easy with SRU/W to broadcast the same query to many
>SRU/W servers because one has to take into account all the differences
>between different servers.
>
Definitely true - this is due to the fact that SRW is a client-server
protocol, whereas metasearch broadcasting as you describe it is more of a
grid computing task.

Currently, a client has to have good knowledge of each of the servers it
queries, and one has to program the metasearch client logic according to
the capabilities of the SRW servers used.

The real solution might be an SRW-like grid computing protocol, not an
extension to a client-server protocol.

>Especially in metasearching I think it would be convenient if there were
>a possibility to send a query saying "give me what is closest to this
>query" and allow different servers to respond with a server's choice
>according to one or more predefined responses. The responses could,
>among others, be:
>1) searchRetrieveResponse
>2) scanResponse
>3) results of a fuzzy match
>4) number of hits for different access points
>5) etc.
>
>Without having to find out how to translate a query for different
>targets, such a "give me the best you can" request returns one or more
>response blocks and the client can use the ones that it understands to
>generate guidance to the user to improve his search.
>
I think a give-me-the-best-you-can answer does not resolve the problem,
since the client still must merge result sets from multiple servers, and
does not even know whether the server considered the hit set to be "the
real thing" or "just the best I can". The problem is still the logic inside
the client - how does it know what to merge, and how?
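
A rough Python sketch of that merge problem. It assumes, purely for
illustration, that each response block carries a `kind` label and an `exact`
flag saying whether the server answered the query as asked; neither is
defined by SRW/SRU, and the kind names are hypothetical, loosely based on
the list in the quoted proposal.

```python
from dataclasses import dataclass, field

# Hypothetical block kinds, loosely following the list in the quoted proposal.
KNOWN_KINDS = {"searchRetrieve", "scan", "fuzzyMatch", "hitCounts"}


@dataclass
class ResponseBlock:
    server: str    # which target sent the block
    kind: str      # one of KNOWN_KINDS, or something this client has never seen
    exact: bool    # assumed flag: True if the server answered the query as asked
    payload: list  # records, terms, or (index, count) pairs


@dataclass
class Merged:
    records: list = field(default_factory=list)      # hits answering the query as asked
    suggestions: list = field(default_factory=list)  # material for a follow-up search
    skipped: list = field(default_factory=list)      # (server, kind) pairs ignored


def merge_blocks(blocks):
    """Split blocks into real hits, guidance, and blocks the client cannot use."""
    merged = Merged()
    for block in blocks:
        if block.kind not in KNOWN_KINDS:
            merged.skipped.append((block.server, block.kind))
        elif block.kind == "searchRetrieve" and block.exact:
            merged.records.extend(block.payload)
        else:
            # Best-effort hits, scan terms, fuzzy matches and per-index counts
            # are kept apart so they never mix into the real result list.
            merged.suggestions.append((block.server, block.kind, block.payload))
    return merged


# Made-up responses from three targets.
blocks = [
    ResponseBlock("A", "searchRetrieve", True, ["rec1", "rec2"]),
    ResponseBlock("B", "searchRetrieve", False, ["rec9"]),
    ResponseBlock("B", "hitCounts", False, [("author", 12), ("title", 340)]),
    ResponseBlock("C", "thesaurus", False, ["Theo"]),
]
merged = merge_blocks(blocks)
```

Without something like the assumed `exact` flag, the client cannot tell the
first two blocks apart, which is exactly the objection raised here.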

>It is not the same as the "x-scanOnSearchFail" parameter, because it can
>also apply to other situations. For example, when there are thousands of
>hits a server could provide a response block in which it gives the number
>of hits for different indexes. The client can use this to propose new
>searches, even with indexes that it would not have offered otherwise.
>
This means spreading the indexes from server to client - IMHO it smells
like distributed hash tables or distributed inverted indexes in grid and
peer-to-peer networks. If you do not want to write a considerable amount of
logic for each client, it might be better to throw out the client-server
philosophy entirely and let the network merge and keep track of indexes.
But this will not be possible with a client-server centric protocol like
SRW/SRU.

>I remember having proposed something like this earlier and we will
>implement this as a private extension. However, in the context of the
>NISO metasearch meeting there may be more support for this concept.
>
>1) Who would support a proposal for extending SRU/SRW with such an
>operation?
>2) Should this be done via a new x-parameter or via a new operation?
>
>BTW Who of this group is attending the NISO metasearch meeting?
>
>Theo
>
That said, I would like to see more specifically how you plan to do your
extensions. Then I will probably understand your motivation and ideas
better.

I look forward to seeing more of your ideas.

Marc Cromme, Index Data
