Partitioning:
Let me see if I can make the point a bit clearer.
A search yields two things: the records matching the query, and the
coordinates of each match within those records in the document space.
Each of these "hit" coordinates is somewhere within the document tree.
With a named path one can identify an ancestor of a hit in the tree
(using the Shakespeare XML again):
<PLAY>
...
<ACT>
...
<SCENE>
...
<SPEECH>
<SPEAKER>LADY MACBETH</SPEAKER>
<LINE>Out, damned spot! out, I say!--One: two: why,</LINE>
<LINE>then, 'tis time to do't.--Hell is murky!--Fie, my</LINE>
<LINE>lord, fie! a soldier, and afeard? What need we</LINE>
<LINE>fear who knows it, when none can call our power to</LINE>
<LINE>account?--Yet who would have thought the old man</LINE>
<LINE>to have had so much blood in him.</LINE>
</SPEECH>
...
</PLAY>
The hit "spot" is in a "LINE" (PLAY\ACT\SCENE\SPEECH\LINE). That line
is "Out, damned spot! out, I say!--One: two: why,". It's in a speech;
the speech is the above fragment. It's in a scene, whose content is the set
of all speeches in that scene, including this one where "out" and "spot"
were said in the same line. And so on up the tree.
At any point going up the tree from one of my hit coordinates I may also
ask for the content of any of the descendants of a named ancestor, to the
extent that it has any children. LINE, for instance, is a final leaf and
has none.
Take my hit "spot" from the query for "out" and "spot" in the same LINE.
Its ancestor SPEECH is the above speech. That speech's LINE descendants are
the LINEs above: not just the one line but all 6 of them. Its SPEAKER
descendant is the content of "SPEAKER", and that's LADY MACBETH.
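To make the walk up and back down concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The helper names (ancestor, descendants) and the parent map are my own illustration, not part of any existing engine; ElementTree keeps no parent pointers, so the sketch builds them itself.

```python
import re
import xml.etree.ElementTree as ET

XML = """<PLAY><ACT><SCENE><SPEECH>
<SPEAKER>LADY MACBETH</SPEAKER>
<LINE>Out, damned spot! out, I say!--One: two: why,</LINE>
<LINE>then, 'tis time to do't.--Hell is murky!--Fie, my</LINE>
<LINE>lord, fie! a soldier, and afeard? What need we</LINE>
<LINE>fear who knows it, when none can call our power to</LINE>
<LINE>account?--Yet who would have thought the old man</LINE>
<LINE>to have had so much blood in him.</LINE>
</SPEECH></SCENE></ACT></PLAY>"""

root = ET.fromstring(XML)
# ElementTree has no parent links, so build a child -> parent map once.
parent = {child: node for node in root.iter() for child in node}

def ancestor(node, tag):
    """Hypothetical helper: nearest proper ancestor of node named tag, or None."""
    node = parent.get(node)
    while node is not None and node.tag != tag:
        node = parent.get(node)
    return node

def descendants(node, tag):
    """Hypothetical helper: text content of every element named tag under node."""
    return [(e.text or "").strip() for e in node.iter(tag)]

def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))

# The hits: LINE elements containing both "out" and "spot".
hits = [e for e in root.iter("LINE") if {"out", "spot"} <= words(e.text or "")]

speech = ancestor(hits[0], "SPEECH")
speaker = descendants(speech, "SPEAKER")   # the SPEAKER of the hit's SPEECH
lines = descendants(speech, "LINE")        # all LINEs of that SPEECH, not just the hit
```

The same hit can be resolved against SCENE or ACT simply by naming a different ancestor; a leaf such as LINE yields an empty list for any descendant other than itself.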
We may also map these coordinates to storage offsets of the document as
a serial object on the disk, viz. the "document order". This mapping lets
us sort the list of coordinates in document order, which gives us next
and previous among the siblings.
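A rough sketch of that mapping, again in Python: character offsets into a serialized fragment stand in for disk offsets, and the helper names (containing, neighbours) are hypothetical. It assumes all LINE elements shown are siblings within one SPEECH.

```python
import re
from bisect import bisect_right

# A shortened serial form of the speech; offsets into this string play the
# role of storage offsets on disk (an assumption made for illustration).
DOC = ("<SPEECH><SPEAKER>LADY MACBETH</SPEAKER>"
       "<LINE>Out, damned spot! out, I say!--One: two: why,</LINE>"
       "<LINE>then, 'tis time to do't.--Hell is murky!--Fie, my</LINE>"
       "<LINE>lord, fie! a soldier, and afeard? What need we</LINE>"
       "</SPEECH>")

# (start, end) offsets of every LINE; regex matches come in document order.
LINE_SPANS = [(m.start(), m.end()) for m in re.finditer(r"<LINE>.*?</LINE>", DOC)]
STARTS = [start for start, _ in LINE_SPANS]

def containing(offset):
    """Index of the LINE whose span covers offset, or None if no LINE does."""
    i = bisect_right(STARTS, offset) - 1
    if i < 0:
        return None
    start, end = LINE_SPANS[i]
    return i if offset < end else None

def neighbours(i):
    """Serialized previous and next sibling LINE of the LINE at index i."""
    prev_ = DOC[slice(*LINE_SPANS[i - 1])] if i > 0 else None
    next_ = DOC[slice(*LINE_SPANS[i + 1])] if i + 1 < len(LINE_SPANS) else None
    return prev_, next_

hit_offset = DOC.index("Hell")      # coordinate of a hit in the serial document
i = containing(hit_offset)
previous_line, next_line = neighbours(i)
```

Because the spans are kept sorted by offset, finding the element around a hit is a binary search, and previous/next are just the adjacent entries.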
By selecting a path for an ancestor of a hit (node), a path for a descendant
of that ancestor, and an ordering of hits (previous/next), we have, I suggest,
an easy-to-express and sufficient model of search segmentation for the concept
of the unit of retrieval for a "result".
Since we are no longer restricted to OIDs for element retrieval, we
can encode this within the named element retrieval as a string.
In RSS search, if I'm interested in items (which I am) and not feeds as my
element of retrieval, then I ask for the ancestor ITEM (as ..\channel\item).
To follow the link to the story I want that ancestor's LINK and TITLE
descendant elements. For a brief summary I might also want the DESCRIPTION
element of the particular ITEM, etc.
If my search query matched, say, some text in the TITLE element of a CHANNEL
and not in any item, then I don't have an ITEM for that hit. But I do have,
should the user request it, a channel.
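The RSS case can be sketched the same way. The feed below is made up (example.org, invented titles), and search and retrieve are hypothetical helpers: a naive substring match and a lookup of named children of the chosen ancestor.

```python
import xml.etree.ElementTree as ET

# An invented feed, for illustration only.
RSS = """<rss><channel>
<title>Example News - Top Stories</title>
<link>http://example.org/</link>
<item><title>Budget vote delayed</title><link>http://example.org/1</link>
<description>Parliament postpones the vote.</description></item>
<item><title>Storm warning issued</title><link>http://example.org/2</link>
<description>Heavy rain expected.</description></item>
</channel></rss>"""

root = ET.fromstring(RSS)
parent = {child: node for node in root.iter() for child in node}

def ancestor(node, tag):
    """Nearest proper ancestor of node named tag, or None."""
    node = parent.get(node)
    while node is not None and node.tag != tag:
        node = parent.get(node)
    return node

def search(term):
    """Hypothetical search: elements whose own text contains term."""
    return [e for e in root.iter() if e.text and term.lower() in e.text.lower()]

def retrieve(hit, ancestor_tag, child_tags):
    """Named children of the hit's ancestor, or None if no such ancestor."""
    unit = ancestor(hit, ancestor_tag)
    if unit is None:
        return None
    return {tag: unit.findtext(tag) for tag in child_tags}

item_hit = search("budget")[0]          # matched inside an item's title
channel_hit = search("top stories")[0]  # matched only in the channel's title

story = retrieve(item_hit, "item", ["title", "link", "description"])
no_item = retrieve(channel_hit, "item", ["title", "link"])   # None: no ITEM for this hit
feed = retrieve(channel_hit, "channel", ["title", "link"])   # but there is a channel
```

The hit in the channel title has no ITEM ancestor, so asking for an item returns nothing, while asking for the channel still works.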
This makes sense, since I may ask questions like: what news stories does the
BBC currently have in its top feed, in contrast to what news stories are
currently talking about the BBC? The questions are different (which is why we
have contextual search), but the intended element of retrieval is different too.
It's so simple to express and yet delivers, I think, exactly what we want, or
rather need.
Counter-examples anyone?
--
Edward C. Zimmermann, Basis Systeme netzwerk, Munich
Office Leo (R&D):
Leopoldstrasse 53-55, D-80802 Munich,
Federal Republic of Germany
Telephone: Voice:= +49 (89) 385-47074 Corp.Fax:= +49 (89) 692-8150
Nomadic (SMS/MMS/Fax):= +49 (176) 100-360-55 Alt.Mobile:= +49 (179) 205-0539
http://www.nonmonotonic.net