Subject:

Re: Models of proximity and where I'd like to take ZING.

From:

Mark Hinnebusch <[log in to unmask]>

Reply-To:

SRU (Search and Retrieve Via URL) Implementors

Date:

Fri, 8 Dec 2006 09:51:22 -0500


----- Original Message -----
From: "Edward C. Zimmermann" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, December 07, 2006 5:21 PM
Subject: Re: Models of proximity and where I'd like to take ZING.


> Quoting Mark Hinnebusch <[log in to unmask]>:
>
>> Edward,
>>
>> The whole issue of proximity has always been confused with issues of
>> representation and structure. But, we tended to try to finesse the issue in
>
> Agree
>
>> a couple of ways:
>> (1) how the query is interpreted is a "local issue" and you get what
>> the server says you meant.
>
> Which can be fine when the chances are that the two are not that far off
> from one another. It needs to be consistent, and the more arbitrary it
> becomes the less satisfying the whole mechanism becomes.
>
>>
>> If I understand your email, you are trying to grapple with what proximity
>> means when there is no usable implicit ordering nor is there an explicit
>> ordering. I would argue that in this case, proximity is meaningless. If
>
> It goes back to last year--- and again in the Hague--- when I argued that
>
> proximity of an element is not really proximity since
> any measure other than distance=0 does not make sense.

Absent explicit ordering of the elements within the structure, I would
agree.

>
>> you want to use the byte position within the XML, then that is an implicit
>
> Not the byte position within XML but the byte position as the information
> was stored as a serial object in storage. It could be binary, PDF, XML, might
> have been some GRS fastload.. who knows.. but there is an order. There may
> even be an extrasystem semantic for the order. In the XML markup of
> Shakespeare, for example, it's the order of one act following the next, one
> speech following the next and one line following the next. This association
> is not demanded but can be specified by the searcher as part of the query
> expression.
>
>> ordering and could be used, but seems to violate the spirit of the XML
>> standard.
>
> It's a layer: additional information beyond just the XML. It may, in fact,
> be undefined. It is not placing anything upon the XML standard but enabling
> a set of query expressions to search collections that may have been marked up
> in XML (or SGML or GRS or MARC or ..).

So you are making the ordering explicit, which means adding semantic value
to the original XML representation that does not have any order. Even using
the order in which the bytes are stored is adding semantic value. If you do
this, then proximity makes sense again. But you have transformed the
problem space.

>
> Example:
>
> The lines where "love" and "king" are in the same line.
> Among those lines, the ones where the word "homage" is within 100 bytes
> (as it's stored on the disk)?

If the line is stored as a single element, we have no problem, right?
Except I would not make an explicit demand that the distance be as defined
on the disk; that is an implementation decision. What if I use multiple
bytes for some strange reason, or store a line in some weird tree structure
for obscure reasons? The distance should be interpreted as distance in the
original document, and the implementation would need to be able to calculate
that from the actual storage mechanism.
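
A minimal sketch of that last point, assuming (purely for illustration) that the implementation happens to store the text as UTF-16, so every character of the two lines from the example occupies two bytes on disk. The name `document_distance` and the choice of encoding are hypothetical; the only idea is that a "within 100" test is evaluated on offsets in the original document, which the implementation derives from its own storage offsets.

```python
# Hypothetical storage format: the document text encoded as UTF-16 (little-endian).
ENCODING = "utf-16-le"


def document_distance(stored: bytes, byte_a: int, byte_b: int) -> int:
    """Distance in document characters between two storage byte offsets."""
    lo, hi = sorted((byte_a, byte_b))
    # Decoding the stored span recovers the characters of the original document.
    return len(stored[lo:hi].decode(ENCODING))


text = ("We'll do thee homage and be ruled by thee,\n"
        "Love thee as our commander and our king.")
stored = text.encode(ENCODING)

# Storage offsets of the two hits (two bytes per character in this encoding).
homage_at = 2 * text.index("homage")
king_at = 2 * text.index("king")

storage_gap = king_at - homage_at
document_gap = document_distance(stored, homage_at, king_at)

# Compare the two gaps against the limit of 100 used in the example query.
print(storage_gap, document_gap, document_gap <= 100 < storage_gap)
```

With this storage choice the gap measured on disk is twice the gap in the document, so the same query could succeed or fail depending only on how the server happened to store the text.
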
>
> In my own "internal" language I use the binary operator AND:path to mean
> in the same path or tag. NEAR:nn to mean within nn bytes of storage so as
> an RPN query:
>
> love king AND:line homage NEAR:100
>
> (NEAR without the :100 would mean in the same unnamed node which would
> just happen to be LINE)
>
> (requesting the SPEECH ancestor of the line hit elements, see below)
>
> `The Two Gentlemen of Verona'
> ** 'speech' Fragment:
> <SPEAKER>Third Outlaw</SPEAKER>
> <LINE>What say'st thou? wilt thou be of our consort?</LINE>
> <LINE>Say ay, and be the captain of us all:</LINE>
> <LINE>We'll do thee homage and be ruled by thee,</LINE>
> <LINE>Love thee as our commander and our king.</LINE>
>
> NOTE: Since within a container (field) we have an order we can talk, to
> keep to my nomenclature, of BEFORE:path and AFTER:path

If you do, in fact, have the order. Isn't that the crux of the matter? In
the example, we would clearly be able to intuit an order. But what if the
data were:

<OBSERVATION>
        <LOCALE> location where the observations were taken </LOCALE>
        <DATUM> n </DATUM>
        <DATUM> n </DATUM>
        <DATUM> n </DATUM>
        <DATUM> n </DATUM>
        <DATUM> n </DATUM>
</OBSERVATION>

then, without knowing, ex cathedra, the meaning of the data and the implicit
order, you can only depend on the physical ordering, yet the XML standard
tells you that you can't. And I don't agree with Ralph that you can fault
the XML tools. They meet the requirements of the standard and that is all
you should expect of them. Otherwise, you can complain that they don't give
a good back-rub. The problem is in the standard or in the data represented
failing to provide explicit ordering as data. So, I think in this case it
goes back to the "server knows all" solution. If you have a server that
somehow "knows" the ordering, then it can offer proximity across the
elements. If it doesn't, then a well-behaved server should refuse to
imagine it out of thin air, or at least give a good back-rub in the process.
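
As a rough sketch of the "well-behaved server" idea, assuming a hypothetical flag that records whether the server knows the sibling order to be meaningful: proximity is computed only when that flag is set, and refused otherwise. The names `ElementList`, `order_is_meaningful` and `ProximityUnsupported` are made up for this illustration; the exception merely stands in for whatever diagnostic a real server would return.

```python
from dataclasses import dataclass
from typing import List


class ProximityUnsupported(Exception):
    """Stand-in for whatever diagnostic a real server would return."""


@dataclass
class ElementList:
    values: List[str]
    order_is_meaningful: bool  # True only if the server "knows" the ordering


def element_distance(elements: ElementList, i: int, j: int) -> int:
    """Distance between the i-th and j-th sibling, if the ordering is known."""
    if not elements.order_is_meaningful:
        # Refuse rather than invent an order from the physical layout.
        raise ProximityUnsupported("no known ordering among these elements")
    return abs(i - j)


# Lines of a speech: the order is part of the meaning.
lines = ElementList(
    ["We'll do thee homage and be ruled by thee,",
     "Love thee as our commander and our king."],
    order_is_meaningful=True,
)
# DATUM elements: nothing tells the server that their order matters.
data = ElementList(["n", "n", "n", "n", "n"], order_is_meaningful=False)

print(element_distance(lines, 0, 1))
try:
    element_distance(data, 0, 4)
except ProximityUnsupported as err:
    print("refused:", err)
```
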
>
> The line fragment of damned spot AND:line
>
> ** 'LINE' Fragment:
> Out, damned spot! out, I say!--One: two: why,
>
> or damned spot BEFORE:line but damned spot AFTER:line finding none.
>
>
> Quoting "LeVan,Ralph" <[log in to unmask]>:
>
>>
>> Then there's the issue of unit of retrieval. I've never had a good
>> answer for that one. When they ask for line="out damned", did they want
>> the line, the scene, the act or the play? Typically, I make that
>
> Right.. Or a specific element (path) of that unit.
>
> My model I've thought of as Ancestor/Descendant of hits.
>
> If I look for "out" and "spot" in the same line, I may want the SPEECH.
>
> We have for LINE the path "PLAY\ACT\SCENE\SPEECH\LINE".
>
> I let people specify either PLAY\ACT\SCENE\SPEECH or SPEECH.
> (or also partial paths)
>
> I get:
>
> `The Tragedy of Macbeth'
> ** 'speech' Fragment:
> <SPEAKER>LADY MACBETH</SPEAKER>
> <LINE>Out, damned spot! out, I say!--One: two: why,</LINE>
> <LINE>then, 'tis time to do't.--Hell is murky!--Fie, my</LINE>
> <LINE>lord, fie! a soldier, and afeard? What need we</LINE>
> <LINE>fear who knows it, when none can call our power to</LINE>
> <LINE>account?--Yet who would have thought the old man</LINE>
> <LINE>to have had so much blood in him.</LINE>
>
> We could now have specified the SPEAKER:
> SPEECH/SPEAKER (SPEECH as Ancestor of the hit and SPEAKER as a
> descendant of the SPEECH).
>
> `The Tragedy of Macbeth'
> ** 'speech/speaker' Fragment:
> LADY MACBETH
>
> The path can make a difference..
>
> play/play\title
>
> is
> `The Tragedy of Macbeth'
> ** 'play/play\title' Fragment:
> The Tragedy of Macbeth
>
> But looking at title we see there are multiple titles.. including of act etc.
> `The Tragedy of Macbeth'
> ** 'play/title' Fragment:
> The Tragedy of Macbeth
> ** 'play/title' Fragment:
> Dramatis Personae
> ** 'play/title' Fragment:
> ACT I
> ** 'play/title' Fragment:
> SCENE I. A desert place.
> ** 'play/title' Fragment:
> SCENE II. A camp near Forres.
> ** 'play/title' Fragment:
> SCENE III. A heath near Forres.
> ** 'play/title' Fragment:
> SCENE IV. Forres. The palace.
> ** 'play/title' Fragment:
> SCENE V. Inverness. Macbeth's castle.
> ** 'play/title' Fragment:
> SCENE VI. Before Macbeth's castle.
> ** 'play/title' Fragment:
> SCENE VII. Macbeth's castle.
> ** 'play/title' Fragment:
> ACT II
> ** 'play/title' Fragment:
> SCENE I. Court of Macbeth's castle.
> ** 'play/title' Fragment:
>
> etc etc etc
>
> It's pretty simple to express and quite powerful
>
> PLAY\ACT\SCENE\SPEECH/SPEAKER is the speaker of a speech..
> PLAY\ACT\SCENE/SPEECH/SPEAKER is the speakers of all the speeches that are
> in the scene.. etc.
>
> I think you get the idea.
>
>
>> decision statically and build a database where the play was decomposed
>> into a reasonable unit of retrieval with navigation information added to
>> support moving up and down. If it wasn't clear what unit of retrieval
>> was desired, I'll make versions of the database with records for each
>> unit of retrieval.
>
> With this model of addressing the elements of retrieval we let the searcher
> define their own unit of retrieval!
>
> I don't have to re-index my collection of Shakespeare's works to ask and
> get answers to questions like: Who said this and that? In what speech, what
> act.. etc.
>
> I can demonstrate the same on the 806791 Reuter's test collection or
> whatever..
>
> I can even apply this to information that can't be marked-up in XML but
> is represented in abstract trees with overlap.
>
> The key is the concept of "hit" and knowing where the coordinates of the
> hit are within the document/record tree.
>
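
The ancestor/descendant idea in the quoted text can be illustrated with a small sketch using Python's standard `xml.etree.ElementTree`. It is not the engine or the path syntax described above; the helper names `hits` and `ancestor` are hypothetical, and the tiny document contains only two lines of the Macbeth speech already quoted. The point is that once the position of a hit in the tree is known, the unit of retrieval can be chosen at query time.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document; only part of the speech quoted above is included.
PLAY = """<PLAY><TITLE>The Tragedy of Macbeth</TITLE><ACT><SCENE><SPEECH>
<SPEAKER>LADY MACBETH</SPEAKER>
<LINE>Out, damned spot! out, I say!--One: two: why,</LINE>
<LINE>then, 'tis time to do't.--Hell is murky!--Fie, my</LINE>
</SPEECH></SCENE></ACT></PLAY>"""

root = ET.fromstring(PLAY)
# ElementTree keeps no parent pointers, so record each element's parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def hits(tag, *words):
    """Elements named `tag` whose text contains every word (case-insensitive)."""
    return [e for e in root.iter(tag)
            if all(w.lower() in (e.text or "").lower() for w in words)]


def ancestor(element, tag):
    """Nearest enclosing element named `tag`, or None if there is none."""
    node = parent_of.get(element)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return node


for line in hits("LINE", "out", "spot"):
    speech = ancestor(line, "SPEECH")                # ancestor of the hit
    print(speech.findtext("SPEAKER"))                # descendant of that ancestor
    print(ancestor(line, "PLAY").findtext("TITLE"))  # a different unit, same hit
```

The same hit answers both "who said this?" and "in which play?" without building a separate index per unit of retrieval.
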
> RDF (and RSS) are real world problems--- and I'm already applying this to
> many 100s of feeds (continuously indexed) in http://www.ibu.de
>
>
>
> --
> --
> Edward C. Zimmermann, Basis Systeme netzwerk, Munich
> Office Leo (R&D):
> Leopoldstrasse 53-55, D-80802 Munich,
> Federal Republic of Germany
> Telephone: Voice:= +49 (89) 385-47074 Corp.Fax:= +49 (89) 692-8150
> Nomadic (SMS/MMS/Fax):= +49 (176) 100-360-55 Alt.Mobile:= +49 (179)
> 205-0539
> http://www.nonmonotonic.net
>
