> So, let me get this straight. To support federated search you
> sort of harvested the content from each of the indexes into a
> new database and provided search against this new database?
I don't think so.
What Ralph does (I think), and what we do, is to send the SRU search to
all databases, pull back the results and merge them into a single result
list.
In our case we "sort" by database - that means we don't need to pull all
the results back.
So let us say we search three databases A, B, C. We send a
searchRetrieve request to all three databases asking for no records to
be returned. A says it has 15 results, B says it has 10, and C says it
has 5. In our user interface we only display 10 records at a time, so we
start by displaying the first 10 from A (a second searchRetrieve, this
time asking for 10 records). If the user selects the next page, we pull
back the remaining 5 from A, and the first 5 from B, and so on.
There are obvious optimisations and improvements you can do on that
model.
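As one illustration, the paging arithmetic in the example above might look
something like this; a sketch only, assuming the hit counts have already
been fetched and that startRecord is 1-based as in SRU.

    # Map a UI page onto per-database slices, in database ("sort") order.
    def page_slices(counts, page, page_size=10):
        """counts: list of (database, hit_count) in display order.
        Returns a list of (database, startRecord, maximumRecords)."""
        start = page * page_size       # global offset of first record wanted
        remaining = page_size
        slices = []
        for db, total in counts:
            if start >= total:         # this database falls wholly before the page
                start -= total
                continue
            take = min(remaining, total - start)
            slices.append((db, start + 1, take))  # +1: SRU startRecord is 1-based
            remaining -= take
            start = 0
            if remaining == 0:
                break
        return slices

    # page 0 -> [("A", 1, 10)]
    # page 1 -> [("A", 11, 5), ("B", 1, 5)], as in the example above
    print(page_slices([("A", 15), ("B", 10), ("C", 5)], 1))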
Rob is adding another level of complexity, often known as a centroid.
In this case you pull back the list of terms from an index from each
database via scan. So in this case (simplistically) database A for
authors might return the list {Smith - 15 occurrences, Shakespeare - 10
occurrences, Morgan - 1 occurrence, Dovey - 10 occurrences, Sanderson - 15
occurrences} (i.e. those are the only authors which occur in the
database), B the list {Smith - 28 occurrences, Morgan - 10 occurrences,
Dovey - 5 occurrences} and C {Smith - 28 occurrences, Sanderson - 10
occurrences}.
So if the user searched for author=Morgan there is no point in sending a
request to database C, and probably not much point in sending one to A
either.
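A simplistic sketch of that routing decision, hard-coding the centroids from
the example above (in practice they would be built by harvesting an index
from each database via scan):

    centroids = {
        "A": {"Smith": 15, "Shakespeare": 10, "Morgan": 1,
              "Dovey": 10, "Sanderson": 15},
        "B": {"Smith": 28, "Morgan": 10, "Dovey": 5},
        "C": {"Smith": 28, "Sanderson": 10},
    }

    def candidate_databases(term, min_occurrences=1):
        """Return the databases worth searching for the term, best first,
        skipping any whose centroid shows too few (or no) occurrences."""
        hits = {db: terms.get(term, 0) for db, terms in centroids.items()}
        return sorted((db for db, n in hits.items() if n >= min_occurrences),
                      key=lambda db: -hits[db])

    print(candidate_databases("Morgan"))                     # ['B', 'A'] - C skipped
    print(candidate_databases("Morgan", min_occurrences=5))  # ['B'] only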
This approach reduces the number of databases you need to search for a
particular query; however, it isn't very good if you are trying to locate
particular items (e.g. if these were databases of rare/antiquarian
books).
Matthew