Quoting Mark Hinnebusch:

> Edward,
>   The whole issue of proximity has always been confused with issues of
> representation and structure.  But, we tended to try to finesse the issue in
> a couple of ways:
>         (1) how the query is interpreted is a "local issue" and you get what
> the server says you meant.

Which can be fine when the chances are that the two are not that far off
from one another. But it needs to be consistent: the more arbitrary it
becomes, the less satisfying the whole mechanism becomes.

> If I understand your email, you are trying to grapple with what proximity 
> means when there is no usable implicit ordering nor is there an explicit 
> ordering.  I would argue that in this case, proximity is meaningless.  If 

It goes back to last year (and again in The Hague), when I argued that
proximity of an element is not really proximity, since any measure other
than distance=0 does not make sense.

> you want to use the byte position within the XML, then that is an implicit

Not the byte position within the XML but the byte position of the information
as it was stored as a serial object. It could be binary, PDF, XML, or some GRS
fastload; who knows. But there is an order. There may even be an extra-system
semantic for the order. In the XML markup of Shakespeare, for example, it is
the order of one act following the next, one speech following the next and
one line following the next. This association is not imposed but can be
specified by the searcher as part of the query expression.

> ordering and could be used, but seems to violate the spirit of the XML 
> standard.

It's a layer: additional information beyond just the XML. It may, in fact,
be undefined. It imposes nothing on the XML standard but enables a set of
query expressions to search collections that may have been marked up in XML
(or SGML or GRS or MARC or ...).


Take, for example, the query: find the lines where "love" and "king" occur
in the same line and, among those lines, the ones where the word "homage"
occurs within 100 bytes (as the text is stored on disk).

In my own "internal" query language I use the binary operator AND:path to
mean "in the same path or tag" and NEAR:nn to mean "within nn bytes of
storage", so as an RPN query:

love king  AND:line  homage NEAR:100

(NEAR without the :100 would mean "in the same unnamed node", which here
would just happen to be LINE.)

Requesting the SPEECH ancestor of the LINE hit elements (see below) gives:

`The Two Gentlemen of Verona'
** 'speech' Fragment: 
<LINE>What say'st thou? wilt thou be of our consort?</LINE>
<LINE>Say ay, and be the captain of us all:</LINE>
<LINE>We'll do thee homage and be ruled by thee,</LINE>
<LINE>Love thee as our commander and our king.</LINE>
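A minimal Python sketch of how such an RPN query might be evaluated, assuming each indexed word is recorded as a hit carrying its byte offset in the stored record and the id of its enclosing LINE. The names (`Hit`, `index_lines`, `and_in`, `near`, `evaluate`) and the simple tokenization are hypothetical, not the internal language's actual implementation; the path argument of AND:path is ignored because the toy record has only one kind of container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    offset: int  # byte offset of the token in the serialized record
    node: str    # id of the enclosing container (hypothetical, e.g. "LINE#3")


def index_lines(lines):
    """Map word -> set of hits for (node_id, text) pairs stored back to back,
    one newline between lines (a toy stand-in for the on-disk order)."""
    postings, base = {}, 0
    for node, text in lines:
        pos = 0
        for raw in text.split(" "):
            word = raw.strip(".,;:!?'").lower()
            if word:
                postings.setdefault(word, set()).add(Hit(base + pos, node))
            pos += len(raw.encode("utf-8")) + 1  # +1 for the separating space
        base += len(text.encode("utf-8")) + 1    # +1 for the newline
    return postings


def and_in(a, b):
    """AND:path: hits of either operand whose container holds hits of both."""
    shared = {h.node for h in a} & {h.node for h in b}
    return {h for h in a | b if h.node in shared}


def near(a, b, n):
    """NEAR:n: hits of either operand within n bytes of a hit of the other."""
    def close(x, y):
        return abs(x.offset - y.offset) <= n
    return ({x for x in a if any(close(x, y) for y in b)}
            | {y for y in b if any(close(x, y) for x in a)})


def evaluate(query, postings):
    """Evaluate an RPN query such as 'love king AND:line homage NEAR:100'."""
    stack = []
    for tok in query.split():
        op, _, arg = tok.partition(":")
        if op in ("AND", "NEAR"):
            b, a = stack.pop(), stack.pop()
            if op == "NEAR" and arg:
                stack.append(near(a, b, int(arg)))
            else:
                # AND:path, or NEAR without a distance: same (innermost) node
                stack.append(and_in(a, b))
        else:
            stack.append(postings.get(tok.lower(), set()))
    return stack.pop()
```

Indexing the four lines of the speech above as LINE#1 to LINE#4 with `index_lines` and evaluating "love king AND:line homage NEAR:100" leaves hits in LINE#3 ("homage") and LINE#4 ("love", "king"); with NEAR:10 nothing survives.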

NOTE: Since we have an order within a container (field), we can also talk,
to keep to my nomenclature, of BEFORE:path and AFTER:path.

The LINE fragment for damned spot AND:line:

** 'LINE' Fragment: 
Out, damned spot! out, I say!--One: two: why,

The query damned spot BEFORE:line finds the same line, but damned spot
AFTER:line finds none.
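A small sketch of BEFORE:path and AFTER:path under the same assumption that hits carry a byte offset and a container id; `Hit`, `before` and `after` are hypothetical names, and the offsets below were counted by hand for the Macbeth line (damned at byte 5, spot at byte 12).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    offset: int  # byte offset within the stored record
    node: str    # id of the enclosing container (hypothetical, e.g. "LINE#1")


def _ordered(a, b, precedes):
    """Hits from ordered pairs (x in a, y in b) that share a container."""
    pairs = [(x, y) for x in a for y in b
             if x.node == y.node and precedes(x.offset, y.offset)]
    return {x for x, _ in pairs} | {y for _, y in pairs}


def before(a, b):
    """BEFORE:path: a hit of a occurs before a hit of b in the same container."""
    return _ordered(a, b, lambda i, j: i < j)


def after(a, b):
    """AFTER:path: a hit of a occurs after a hit of b in the same container."""
    return _ordered(a, b, lambda i, j: i > j)


# "Out, damned spot! out, I say!--One: two: why," stored as a single LINE
damned = {Hit(5, "LINE#1")}
spot = {Hit(12, "LINE#1")}
print(len(before(damned, spot)), len(after(damned, spot)))  # 2 0
```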

Quoting "LeVan,Ralph":

> Then there's the issue of unit of retrieval.  I've never had a good
> answer for that one.  When they ask for line="out damned", did they want
> the line, the scene, the act or the play?  Typically, I make that

Right. Or a specific element (path) of that unit.

I think of my model as Ancestor/Descendant of hits.

If I look for "out" and "spot" in the same line, I may want the SPEECH.

We have for LINE the path "PLAY\ACT\SCENE\SPEECH\LINE".

I let people specify either PLAY\ACT\SCENE\SPEECH or just SPEECH
(or partial paths).

I get:

`The Tragedy of Macbeth'
** 'speech' Fragment: 
<LINE>Out, damned spot! out, I say!--One: two: why,</LINE>
<LINE>then, 'tis time to do't.--Hell is murky!--Fie, my</LINE>
<LINE>lord, fie! a soldier, and afeard? What need we</LINE>
<LINE>fear who knows it, when none can call our power to</LINE>
<LINE>account?--Yet who would have thought the old man</LINE>
<LINE>to have had so much blood in him.</LINE>
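Choosing the unit of retrieval as an ancestor of the hit could look like the sketch below, assuming a hit's coordinates are stored as the (tag, node id) pairs from the root down to the hit element and that a full or partial path is matched as a suffix of an ancestor's tag path. `ancestor` and the node ids are hypothetical.

```python
def ancestor(hit_path, spec):
    """Return the id of the nearest ancestor-or-self of a hit whose tag path
    ends with `spec`.

    hit_path: (tag, node_id) pairs from the root down to the hit element.
    spec: a full or partial path such as "PLAY\\ACT\\SCENE\\SPEECH" or "SPEECH".
    Tags are compared case-insensitively; returns None if nothing matches.
    """
    want = [t.upper() for t in spec.split("\\") if t]
    if not want:
        return None
    tags = [tag.upper() for tag, _ in hit_path]
    for depth in range(len(tags), 0, -1):  # innermost candidate first
        if tags[:depth][-len(want):] == want:
            return hit_path[depth - 1][1]
    return None


# Hypothetical coordinates of the LINE hit "Out, damned spot! ..."
HIT = [("PLAY", "macbeth"), ("ACT", "act-5"), ("SCENE", "scene-1"),
       ("SPEECH", "speech-1"), ("LINE", "line-1")]
```

With these coordinates, "SPEECH", "PLAY\ACT\SCENE\SPEECH" and the partial "SCENE\SPEECH" all select the same SPEECH node, while a non-contiguous path such as "ACT\SPEECH" selects nothing.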

We could now have specified the SPEAKER:
  SPEECH/SPEAKER (SPEECH as Ancestor of the hit and SPEAKER as a
descendant of the SPEECH).

`The Tragedy of Macbeth'
** 'speech/speaker' Fragment: 

The path can make a difference:


`The Tragedy of Macbeth'
** 'play/play\title' Fragment: 
The Tragedy of Macbeth

But looking at play/title, we see there are multiple titles, including those
of the acts and scenes:

`The Tragedy of Macbeth'
** 'play/title' Fragment: 
The Tragedy of Macbeth
** 'play/title' Fragment: 
Dramatis Personae
** 'play/title' Fragment: 
** 'play/title' Fragment: 
SCENE I.  A desert place.
** 'play/title' Fragment: 
SCENE II.  A camp near Forres.
** 'play/title' Fragment: 
SCENE III.  A heath near Forres.
** 'play/title' Fragment: 
SCENE IV.  Forres. The palace.
** 'play/title' Fragment: 
SCENE V.  Inverness. Macbeth's castle.
** 'play/title' Fragment: 
SCENE VI.  Before Macbeth's castle.
** 'play/title' Fragment: 
SCENE VII.  Macbeth's castle.
** 'play/title' Fragment: 
** 'play/title' Fragment: 
SCENE I.  Court of Macbeth's castle.
** 'play/title' Fragment: 

etc etc etc

It's pretty simple to express and quite powerful:

PLAY\ACT\SCENE\SPEECH/SPEAKER is the speaker of a speech;
PLAY\ACT\SCENE/SPEECH/SPEAKER gives the speakers of all the speeches in
the scene, etc.

I think you get the idea.
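A sketch of evaluating such ANCESTOR/DESCENDANT expressions relative to a hit, using Python's `xml.etree.ElementTree` on a hand-made, abbreviated fragment (not the full markup). It assumes "\" separates steps of the ancestor path and each "/" step selects descendants at any depth; `resolve` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve(root, hit, expr):
    """Evaluate ANCESTOR/DESC/... relative to a hit element.

    The part before the first "/" names an ancestor-or-self of the hit by a
    full or partial backslash path; each later step selects descendants (at
    any depth) with that tag. Tags are compared in upper case.
    """
    anc_spec, *steps = expr.split("/")
    want = [t.upper() for t in anc_spec.split("\\") if t]
    parents = {child: parent for parent in root.iter() for child in parent}
    chain = [hit]  # the hit, its parent, ..., the root
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    anchor = None
    for i, node in enumerate(chain):  # innermost candidate first
        tags = [n.tag.upper() for n in reversed(chain[i:])]  # root .. node
        if want and tags[-len(want):] == want:
            anchor = node
            break
    if anchor is None:
        return []
    found = [anchor]
    for step in steps:  # descendant steps, duplicates removed in order
        found = list(dict.fromkeys(
            d for n in found for d in n.iter(step.upper()) if d is not n))
    return found


# Hand-made, abbreviated fragment for illustration only
DOC = """<PLAY><TITLE>The Tragedy of Macbeth</TITLE>
<ACT><TITLE>ACT V</TITLE>
<SCENE><TITLE>SCENE I.</TITLE>
<SPEECH><SPEAKER>DOCTOR</SPEAKER><LINE>(omitted)</LINE></SPEECH>
<SPEECH><SPEAKER>LADY MACBETH</SPEAKER>
<LINE>Out, damned spot! out, I say!--One: two: why,</LINE></SPEECH>
</SCENE></ACT></PLAY>"""

root = ET.fromstring(DOC)
hit = next(e for e in root.iter("LINE") if "damned spot" in e.text)
print([e.text for e in resolve(root, hit, "SPEECH/SPEAKER")])  # ['LADY MACBETH']
```

On this fragment, "PLAY\ACT\SCENE/SPEECH/SPEAKER" yields both speakers of the scene and "PLAY/TITLE" yields the play, act and scene titles, mirroring the listings above.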

> decision statically and build a database where the play was decomposed
> into a reasonable unit of retrieval with navigation information added to
> support moving up and down.  If it wasn't clear what unit of retrieval
> was desired, I'll make versions of the database with records for each
> unit of retrieval.

With this model of addressing the elements of retrieval, we let the searcher
define their own unit of retrieval!

I don't have to re-index my collection of Shakespeare's works to ask, and get
answers to, questions like: Who said this and that? In what speech, in what
act? etc.

I can demonstrate the same on the 806,791-document Reuters test collection or

I can even apply this to information that can't be marked up in XML but
is represented as abstract trees with overlap.

The key is the concept of "hit" and knowing where the coordinates of the
hit are within the document/record tree.
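One way to make the "coordinates of a hit" concrete, assuming every element is recorded as a byte range over the stored record rather than as a node in a single tree. This is a hypothetical sketch (`Span`, `containers` and `unit` are illustrative names), not the implementation described here; it shows why ranges tolerate overlapping hierarchies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    name: str   # element name, e.g. "PAGE" or "LINE"
    start: int  # inclusive byte offset in the stored record
    end: int    # exclusive byte offset


def containers(spans, offset):
    """All spans covering a hit at `offset`, widest first. Because spans may
    overlap, the result need not be a chain of properly nested elements."""
    covering = [s for s in spans if s.start <= offset < s.end]
    return sorted(covering, key=lambda s: (s.start - s.end, s.start))


def unit(spans, offset, name):
    """The searcher-chosen unit of retrieval: covering spans with that name."""
    return [s for s in containers(spans, offset) if s.name == name]


# Hypothetical record: a verse LINE runs across a PAGE break, so the two
# hierarchies overlap and cannot both be expressed as one XML tree.
SPANS = [Span("PAGE", 0, 100), Span("PAGE", 100, 200),
         Span("LINE", 80, 120), Span("LINE", 120, 160)]
print([s.name for s in containers(SPANS, 105)])  # ['PAGE', 'LINE']
```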

RDF (and RSS) are real-world problems, and I'm already applying this to
many hundreds of feeds (continuously indexed) in

Edward C. Zimmermann, Basis Systeme netzwerk, Munich
Office Leo (R&D):
   Leopoldstrasse 53-55, D-80802 Munich,
   Federal Republic of Germany
Telephone:   Voice:=  +49 (89) 385-47074  Corp.Fax:= +49 (89)  692-8150
 Nomadic (SMS/MMS/Fax):= +49 (176) 100-360-55  Alt.Mobile:= +49 (179) 205-0539