ZNG  April 2004


Subject: Re: metasearch
From: Theo van Veen <[log in to unmask]>
Reply-To: Z39.50 Next-Generation Initiative
Date: Tue, 20 Apr 2004 17:42:56 +0200
Content-Type: text/plain

I think I did not make myself clear. It is a lot simpler.

- In this case there are no result sets to be merged.
- It is far easier to interpret the different XML responses that are
returned than to create different queries.
- I want to do more specific queries depending on the results of a
previous global query, which I think is better than "try this and try
that".
- It is not compulsory: each server can give the same response as it
already does now. Only servers that can do better provide more
information. The responses are processed by stylesheets.

I do not understand what you say about "spreading indexes over
clients". What I mean is that when I search for subject=theo, the
server can answer "no hits, but I have 10 hits for creator=theo". In
this case the client can suggest that the user search for "creator=theo".
When my server does this it hurts nobody. It is on request only;
servers do not have to implement it, and clients may ignore the parts
they do not understand and just accept the "no hits" as the response. The
response is processed by a stylesheet, and whatever is not defined in the
stylesheet is simply ignored.
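
A minimal sketch of this "ignore what you do not understand" behaviour, written in Python instead of a stylesheet. The `<response>` wrapper element, the `HANDLERS` table and the function names are assumptions made for illustration; they are not part of SRU/SRW, and a real response would be namespaced.

```python
import xml.etree.ElementTree as ET


def handle_number_of_records(elem):
    """Turn a <numberOfRecords> element into a short message."""
    return f"{int(elem.text)} hits"


# Hypothetical dispatch table: only the response blocks this client knows.
HANDLERS = {"numberOfRecords": handle_number_of_records}


def process_response(xml_text):
    """Return messages for known blocks and silently skip all others."""
    root = ET.fromstring(xml_text)
    messages = []
    for child in root:
        handler = HANDLERS.get(child.tag)
        if handler is not None:  # unknown blocks are simply ignored
            messages.append(handler(child))
    return messages


# Assumed sample: a server that adds a block this client does not know.
SAMPLE = """<response>
  <numberOfRecords>0</numberOfRecords>
  <fuzzyMatches><term>thea</term></fuzzyMatches>
</response>"""

print(process_response(SAMPLE))
```

A client written this way keeps working unchanged when a server starts returning extra blocks; support for a new block is added by registering one more handler.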

Let me give some examples of possible responses:

Suppose I have the following query:
http://host/sru?query=some_index=theo

Then server A may respond with:
...
<numberOfRecords>0</numberOfRecords>
...
Server B may respond with:
...
<numberOfRecords>0</numberOfRecords>
<fuzzyMatches>
   <term>thea</term>
   <term>then</term>
   <term>toe</term>
</fuzzyMatches>
...
Server C may respond with:
...
<numberOfRecords>10000</numberOfRecords>
<hitsPerIndex>   (for query=theo)
   <index>
        <name>any</name>
        <numberOfRecords>99900</numberOfRecords>
   </index>
   <index>
        <name>dc.subject</name>
        <numberOfRecords>5</numberOfRecords>
   </index>
   <index>
        <name>dc.creator</name>
        <numberOfRecords>50</numberOfRecords>
   </index>
   <index>
        <name>composer</name>
        <numberOfRecords>10</numberOfRecords>
   </index>
</hitsPerIndex>
...

Different combinations of response blocks would also be possible. The
client will just act on the data it receives.
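
As a rough sketch of how a client might act on such blocks, the Python below turns the two optional blocks from the examples into search suggestions. The element names `fuzzyMatches` and `hitsPerIndex` come from the examples above; the `<response>` wrapper, the function name `suggest_searches` and the choice to list the narrowest index first are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def suggest_searches(xml_text, term):
    """Build user-facing suggestions from whatever blocks are present."""
    root = ET.fromstring(xml_text)
    suggestions = []

    # Fuzzy matches: offer each alternative spelling.
    for t in root.findall("./fuzzyMatches/term"):
        suggestions.append(f"Did you mean '{t.text}'?")

    # Hits per index: collect (hits, index name) pairs with at least one hit.
    per_index = []
    for index in root.findall("./hitsPerIndex/index"):
        name = index.findtext("name")
        hits = index.findtext("numberOfRecords")
        if name is None or hits is None:
            continue  # skip incomplete entries instead of failing
        if int(hits) > 0:
            per_index.append((int(hits), name))

    # Assumed policy: propose the most specific (fewest hits) index first.
    for hits, name in sorted(per_index):
        suggestions.append(f"{name}={term} ({hits} hits)")
    return suggestions


# Assumed sample combining two blocks in one response.
SAMPLE = """<response>
  <numberOfRecords>0</numberOfRecords>
  <fuzzyMatches><term>thea</term></fuzzyMatches>
  <hitsPerIndex>
    <index><name>dc.creator</name><numberOfRecords>50</numberOfRecords></index>
    <index><name>dc.subject</name><numberOfRecords>5</numberOfRecords></index>
  </hitsPerIndex>
</response>"""

for line in suggest_searches(SAMPLE, "theo"):
    print(line)
```

A response from a server that returns only `<numberOfRecords>` yields an empty suggestion list, so the same client code handles all three servers.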

Theo
_____________________________________________________________________________________

>>> [log in to unmask] 20-4-04 16:03:34 >>>
Interesting proposal.
I'll comment in between your lines:

Theo van Veen wrote:

>Currently it is not easy with SRU/W to broadcast the same query to many
>SRU/W servers because one has to take into account all the differences
>between different servers.
>
Definitely true - this is due to the fact that SRW is a client-server
protocol, whereas metasearch broadcasting as you are describing it is
more of a grid computing task.

Currently, a client has to have good knowledge of the multiple servers
asked, and one has to program the metasearch client logic according to
the capabilities of the SRW servers used.

The real solution might be an SRW-like grid computing protocol, not an
extension to a client-server protocol.

>Especially in metasearching I think it would
>be convenient if there was a possibility to send a query saying "give
>me what is closest to this query" and allow different servers to respond
>with a server's choice according to one or more predefined responses. The
>responses could, among others, be:
>1) searchRetrieveResponse
>2) scanResponse
>3) results of a fuzzy match
>4) number of hits for different access points
>5) etc.
>
>Without having to find out how to translate a query for different
>targets, such a "give me the best you can" request returns one or more
>response blocks and the client can use the ones that it understands to
>generate guidance to the user to improve his search.
>
>
I think a give-me-the-best-you-can answer does not resolve the problem,
since the client still must merge result sets from multiple servers, and
does not even know if the server considered the hit set to be "the real
thing" or "just the best I can". The problem is still the logic inside
the client - how to know what and how to merge??

>It is not the same
>as the "x-scanOnSearchFail" parameter, because it can also apply to
>other situations. For example when there are thousands of hits a server
>could provide a response block in which it gives the number of hits for
>different indexes. The client can use this to propose new searches, even
>with indexes that it would not have offered otherwise.
>
>
>
This means spreading the indexes from server to client - IMHO it smells
like distributed hash tables or distributed inverted indexes in grid and
peer-to-peer networks. If you do not want to write a considerable amount
of logic for each client, it might be better to throw out the
client-server philosophy entirely and let the network merge and keep
track of indexes. But this will not be possible with a client-server
centric protocol like SRW/SRU.

>I remember having proposed something like this earlier and we will
>implement this as a private extension. However, in the context of the
>NISO metasearch meeting there may be more support for this concept.
>
>1) Who would support a proposal for extending SRU/SRW with such an
>operation?
>2) Should this be done via a new x-parameter or via a new operation?
>
>BTW Who of this group is attending the NISO metasearch meeting?
>
>Theo
>
>
>
That said, I would like to see more specifically how you plan to do your
extensions. Then I would probably understand your motivation and ideas
better.

I look forward to seeing more about your ideas.

Marc Cromme, Index Data
