ZNG December 2006

Subject:

Re: Models of proximity and where I'd like to take ZING.

From:

"Edward C. Zimmermann" <[log in to unmask]>

Reply-To:

SRU (Search and Retrieve Via URL) Implementors

Date:

Thu, 7 Dec 2006 23:21:16 +0100

Quoting Mark Hinnebusch <[log in to unmask]>:

> Edward,
> 
>   The whole issue of proximity has always been confused with issues of 
> representation and structure.  But, we tended to try to finesse the issue in

Agree

> a couple of ways:
>         (1) how the query is interpreted is a "local issue" and you get what
> the server says you meant.

Which can be fine when the chances are good that the two are not that far off
from one another. It needs to be consistent, and the more arbitrary it becomes
the less satisfying the whole mechanism becomes.

> 
> If I understand your email, you are trying to grapple with what proximity 
> means when there is no usable implicit ordering nor is there an explicit 
> ordering.  I would argue that in this case, proximity is meaningless.  If 

It goes back to last year, and again in The Hague, when I argued that
proximity of an element is not really proximity, since any measure other
than distance=0 does not make sense.

> you want to use the byte position within the XML, then that is an implicit

Not the byte position within the XML, but the byte position of the information
as it was stored as a serial object in storage. It could be binary, PDF, XML, might
have been some GRS fastload.. who knows.. but there is an order. There may
even be an extra-system semantic for the order. In the XML markup of Shakespeare,
for example, it's the order of one act following the next, one speech following
the next and one line following the next. This association is not demanded
but can be specified by the searcher as part of the query expression.
 
> ordering and could be used, but seems to violate the spirit of the XML 
> standard.

It's a layer: additional information beyond just the XML. It may, in fact,
be undefined. It is not placing anything upon the XML standard but enabling
a set of query expressions to search collections that may have been marked-up
in XML (or SGML or GRS or MARC or ..).

Example:

Which lines have "love" and "king" in the same line? And among those, which
have the word "homage" within 100 bytes (as stored on disk)?

In my own "internal" language I use the binary operator AND:path to mean
"in the same path or tag" and NEAR:nn to mean "within nn bytes of storage",
so as an RPN query:

love king  AND:line  homage NEAR:100

(NEAR without the :100 would mean in the same unnamed node which would
just happen to be LINE)

(requesting the SPEECH ancestor of the line hit elements, see below)

`The Two Gentlemen of Verona'
** 'speech' Fragment: 
<SPEAKER>Third Outlaw</SPEAKER>
<LINE>What say'st thou? wilt thou be of our consort?</LINE>
<LINE>Say ay, and be the captain of us all:</LINE>
<LINE>We'll do thee homage and be ruled by thee,</LINE>
<LINE>Love thee as our commander and our king.</LINE>
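As a rough illustration only (not the actual implementation), the two operators could be sketched in Python as follows. The `Element` and `Hit` records, the helper names, and the reading of AND:path as "both hits sit in the same tagged element" and of NEAR:nn as "left hits that have a right hit within nn bytes" are all assumptions made for this sketch. Offsets are taken from the sample fragment itself, which is pure ASCII, so character offsets equal byte offsets.

```python
import re
from dataclasses import dataclass

# Sample text copied from the fragment above (ASCII, so chars == bytes).
TEXT = (
    "<SPEAKER>Third Outlaw</SPEAKER>\n"
    "<LINE>What say'st thou? wilt thou be of our consort?</LINE>\n"
    "<LINE>Say ay, and be the captain of us all:</LINE>\n"
    "<LINE>We'll do thee homage and be ruled by thee,</LINE>\n"
    "<LINE>Love thee as our commander and our king.</LINE>\n"
)

@dataclass(frozen=True)
class Element:      # hypothetical: one tagged container in storage
    tag: str
    start: int      # offset of the first content byte
    end: int        # offset one past the last content byte

@dataclass(frozen=True)
class Hit:          # hypothetical: one term occurrence
    term: str
    offset: int

ELEMENTS = [Element("LINE", m.start(1), m.end(1))
            for m in re.finditer(r"<LINE>(.*?)</LINE>", TEXT)]

def find(term):
    """All whole-word, case-insensitive occurrences of term in TEXT."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return [Hit(term, m.start())
            for m in re.finditer(pattern, TEXT, re.IGNORECASE)]

def container(hit, tag):
    """The element named tag that contains the hit, or None."""
    for e in ELEMENTS:
        if e.tag == tag.upper() and e.start <= hit.offset < e.end:
            return e
    return None

def and_path(left, right, tag):
    """AND:tag, read here as: hits from both sides sharing one tag element."""
    kept = set()
    for a in left:
        for b in right:
            ca = container(a, tag)
            if ca is not None and ca == container(b, tag):
                kept.update((a, b))
    return sorted(kept, key=lambda h: h.offset)

def near(left, right, nn):
    """NEAR:nn, read here as: left hits with a right hit within nn bytes."""
    return [a for a in left
            if any(abs(a.offset - b.offset) <= nn for b in right)]

# RPN:  love king AND:line  homage NEAR:100
same_line = and_path(find("love"), find("king"), "line")
result = near(same_line, find("homage"), 100)
print([(h.term, h.offset) for h in result])
```

Nothing here depends on the data being XML: only the byte offsets of hits and of element boundaries are used, so any storage format that yields such coordinates would do.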

NOTE: Since we have an order within a container (field), we can also talk, to
keep to my nomenclature, of BEFORE:path and AFTER:path.

The line fragment returned for  damned spot AND:line  is:

** 'LINE' Fragment: 
Out, damned spot! out, I say!--One: two: why,

The same fragment comes back for  damned spot BEFORE:line,  while
damned spot AFTER:line  finds none.
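One possible reading of the ordered operators, sketched in Python under the assumption that BEFORE:path means "both terms are in the same container and the left term occurs at a lower offset than the right term", and AFTER:path the reverse. The function names are hypothetical and the content of a single LINE stands in for the container.

```python
import re

# Content of the LINE container from the fragment above.
LINE = "Out, damned spot! out, I say!--One: two: why,"

def offsets(term, text):
    """Offsets of all whole-word, case-insensitive occurrences of term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, re.IGNORECASE)]

def before(left, right, text):
    """Assumed BEFORE semantics: some left hit precedes some right hit."""
    return any(a < b for a in offsets(left, text) for b in offsets(right, text))

def after(left, right, text):
    """Assumed AFTER semantics: some left hit follows some right hit."""
    return any(a > b for a in offsets(left, text) for b in offsets(right, text))

print(before("damned", "spot", LINE))  # "damned" precedes "spot" in the line
print(after("damned", "spot", LINE))   # no "damned" follows a "spot"
```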


Quoting "LeVan,Ralph" <[log in to unmask]>:

>
> Then there's the issue of unit of retrieval.  I've never had a good
> answer for that one.  When they ask for line="out damned", did they want
> the line, the scene, the act or the play?  Typically, I make that

Right.. Or a specific element (path) of that unit.

I think of my model as Ancestor/Descendant of hits.

If I look for "out" and "spot" in the same line, I may want the SPEECH.

We have for LINE the path "PLAY\ACT\SCENE\SPEECH\LINE".

I let people specify either PLAY\ACT\SCENE\SPEECH or SPEECH.
(or also partial paths)

I get:

`The Tragedy of Macbeth'
** 'speech' Fragment: 
<SPEAKER>LADY MACBETH</SPEAKER>
<LINE>Out, damned spot! out, I say!--One: two: why,</LINE>
<LINE>then, 'tis time to do't.--Hell is murky!--Fie, my</LINE>
<LINE>lord, fie! a soldier, and afeard? What need we</LINE>
<LINE>fear who knows it, when none can call our power to</LINE>
<LINE>account?--Yet who would have thought the old man</LINE>
<LINE>to have had so much blood in him.</LINE>

We could now have specified the SPEAKER:
  SPEECH/SPEAKER (SPEECH as Ancestor of the hit and SPEAKER as a
descendant of the SPEECH).

`The Tragedy of Macbeth'
** 'speech/speaker' Fragment: 
LADY MACBETH
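The Ancestor/Descendant idea can be sketched in Python on a small tree. This is an illustration under stated assumptions, not the actual implementation: the `Node` class and the `ancestor`, `descendants` and `resolve` helpers are hypothetical names, the hit is taken to be a LINE node, and only single tag names separated by "/" are handled (full or partial backslash paths are left out).

```python
class Node:
    """Hypothetical element node; parent links let us climb from a hit."""
    def __init__(self, tag, text="", parent=None):
        self.tag, self.text, self.parent, self.children = tag, text, parent, []
        if parent is not None:
            parent.children.append(self)

def ancestor(node, tag):
    """Nearest ancestor-or-self of node named tag, or None."""
    while node is not None and node.tag != tag.upper():
        node = node.parent
    return node

def descendants(node, tag):
    """All descendants of node named tag, in document order."""
    found = []
    for child in node.children:
        if child.tag == tag.upper():
            found.append(child)
        found.extend(descendants(child, tag))
    return found

def resolve(hit, expr):
    """Resolve 'ANCESTOR' or 'ANCESTOR/DESCENDANT/...' relative to a hit."""
    first, *rest = expr.split("/")
    top = ancestor(hit, first)
    nodes = [] if top is None else [top]
    for tag in rest:
        nodes = [d for n in nodes for d in descendants(n, tag)]
    return nodes

# A cut-down tree for the Macbeth fragment above.
play = Node("PLAY")
act = Node("ACT", parent=play)
scene = Node("SCENE", parent=act)
speech = Node("SPEECH", parent=scene)
Node("SPEAKER", "LADY MACBETH", parent=speech)
hit_line = Node("LINE", "Out, damned spot! out, I say!--One: two: why,",
                parent=speech)
Node("LINE", "then, 'tis time to do't.--Hell is murky!--Fie, my",
     parent=speech)

print([n.text for n in resolve(hit_line, "speech/speaker")])
print([c.tag for c in resolve(hit_line, "SPEECH")[0].children])
```

The unit of retrieval is chosen at query time from the position of the hit in the tree, so the same index serves a request for the LINE, the SPEECH or the SPEAKER.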

The path can make a difference. The expression

play/play\title

yields
`The Tragedy of Macbeth'
** 'play/play\title' Fragment: 
The Tragedy of Macbeth

But asking for play/title we see that there are multiple titles, including
those of the acts, scenes etc.:
`The Tragedy of Macbeth'
** 'play/title' Fragment: 
The Tragedy of Macbeth
** 'play/title' Fragment: 
Dramatis Personae
** 'play/title' Fragment: 
ACT I
** 'play/title' Fragment: 
SCENE I.  A desert place.
** 'play/title' Fragment: 
SCENE II.  A camp near Forres.
** 'play/title' Fragment: 
SCENE III.  A heath near Forres.
** 'play/title' Fragment: 
SCENE IV.  Forres. The palace.
** 'play/title' Fragment: 
SCENE V.  Inverness. Macbeth's castle.
** 'play/title' Fragment: 
SCENE VI.  Before Macbeth's castle.
** 'play/title' Fragment: 
SCENE VII.  Macbeth's castle.
** 'play/title' Fragment: 
ACT II
** 'play/title' Fragment: 
SCENE I.  Court of Macbeth's castle.
** 'play/title' Fragment: 

etc etc etc

It's pretty simple to express and quite powerful:

PLAY\ACT\SCENE\SPEECH/SPEAKER is the speaker of a speech..
PLAY\ACT\SCENE/SPEECH/SPEAKER is the speakers of all the speeches that are
in the scene.. etc.

I think you get the idea.
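To make the difference between the two separators concrete, here is a small Python sketch. It assumes that "/" steps from an ancestor to its descendants and that a backslash path matches as a suffix of an element's full path, which is one way to read the "partial paths" mentioned above. The `RECORDS` list, its element paths (in particular PLAY\PERSONAE\TITLE) and the function names are hypothetical, and only two-part expressions are handled.

```python
# (full path, text) pairs in storage order; the paths are assumed.
RECORDS = [
    (r"PLAY\TITLE", "The Tragedy of Macbeth"),
    (r"PLAY\PERSONAE\TITLE", "Dramatis Personae"),
    (r"PLAY\ACT\TITLE", "ACT I"),
    (r"PLAY\ACT\SCENE\TITLE", "SCENE I.  A desert place."),
]

def path_matches(full_path, pattern):
    """True if the backslash pattern is a suffix of the full path."""
    full = full_path.upper().split("\\")
    want = pattern.upper().split("\\")
    return full[-len(want):] == want

def select(records, expr):
    """Evaluate 'ANCESTOR/PATTERN': elements below ANCESTOR matching PATTERN."""
    ancestor, pattern = expr.split("/", 1)
    found = []
    for path, text in records:
        parts = path.upper().split("\\")
        # The ancestor must appear strictly above the element itself.
        if ancestor.upper() in parts[:-1] and path_matches(path, pattern):
            found.append(text)
    return found

print(select(RECORDS, r"play/play\title"))  # only the title directly under PLAY
print(select(RECORDS, "play/title"))        # every title anywhere below PLAY
```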


> decision statically and build a database where the play was decomposed
> into a reasonable unit of retrieval with navigation information added to
> support moving up and down.  If it wasn't clear what unit of retrieval
> was desired, I'll make versions of the database with records for each
> unit of retrieval.

With this model of addressing the elements of retrieval we let the searcher
define their own unit of retrieval!

I don't have to re-index my collection of Shakespeare's works to ask and
get answers to questions like: Who said this and that? In what speech, what
act.. etc.

I can demonstrate the same on the 806791 Reuters test collection or
whatever..

I can even apply this to information that can't be marked-up in XML but
is represented in abstract trees with overlap.

The key is the concept of "hit" and knowing where the coordinates of the
hit are within the document/record tree.

RDF (and RSS) are real-world problems, and I'm already applying this to
many hundreds of feeds (continuously indexed) at http://www.ibu.de



-- 
Edward C. Zimmermann, Basis Systeme netzwerk, Munich
Office Leo (R&D):
   Leopoldstrasse 53-55, D-80802 Munich,
   Federal Republic of Germany
Telephone:   Voice:=  +49 (89) 385-47074  Corp.Fax:= +49 (89)  692-8150
 Nomadic (SMS/MMS/Fax):= +49 (176) 100-360-55  Alt.Mobile:= +49 (179) 205-0539
http://www.nonmonotonic.net
