ZNG  September 2002

Subject: Re: indexes and masking
From: Robert Sanderson <[log in to unmask]>
Reply-To: Z39.50 Next-Generation Initiative
Date: Tue, 10 Sep 2002 15:48:23 +0100
Content-Type: TEXT/PLAIN


> If this makes sense, then we might as well just do PQN and specify all the
> attributes.
> Why are you so eager to confuse them?

Because I don't want to end up in a situation where I can't predict the
structure of the contents of an index because of different naming
policies.

Hopefully everyone will support at least the Dublin Core index set, but if
they don't specify it (for whatever reason) then they're at least more
likely to call a title index 'foo.title'. However, when it comes to
naming an exact title index, it could be titleString, exactTitle, XTitle,
title, etc. The same goes for titleWord, titleKeyWord, titleWords and so
forth. This is why I'd like either one index, or a limited set of names
that distinguish structure, to be appended to the name given to the index.

Even with Explain, there's currently no way of knowing what is in the
index without human intervention, even to the point of keyword vs string.
This means that if I did a search for index="word" I wouldn't know whether
the zero hits were because there were no matches for the keyword 'word' or
because it was doing an exact match.
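
To make the ambiguity concrete, here is a minimal sketch in Python. The two
matching functions and the sample titles are hypothetical, invented purely
for illustration; they are not taken from any server or from Explain. The
same query term gets a hit from a keyword index and zero hits from an exact
string index, and the client cannot tell which behaviour produced the zero.

```python
# Hypothetical sample data: two title fields, punctuation already stripped.
titles = ["A Word About Indexes", "Masking Rules"]


def keyword_match(field: str, term: str) -> bool:
    """Keyword (word) index: the term must equal one word of the field."""
    return term.lower() in field.lower().split()


def string_match(field: str, term: str) -> bool:
    """String (exact) index: the term must equal the whole field."""
    return field.lower() == term.lower()


# The same search term against the same data, under the two index structures.
keyword_hits = [t for t in titles if keyword_match(t, "word")]
string_hits = [t for t in titles if string_match(t, "word")]

print(keyword_hits)  # one title contains the word
print(string_hits)   # no title is exactly "word"
```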

I would be okay with a 'very strongly suggested' list of structure
identifiers, but it should be in the spec itself, not an accompanying
document.


Secondly, there is the desire not to have multiple ways of doing the same
thing, while still allowing the possibility of 'first words in field'.
I think that FWiW should not be proximity on a word index, as this would
create Very Long and unwieldy searches for a relatively easy concept, and
would require field anchoring in proximity.
So the only other option is a string-based search.

This calls into question the rationale behind saying that the index is
string based. If we accept that we can use word masks on a string index,
then the concept of structure in indexes is already pretty half-hearted.
It should either be a string or a word index, but not both.

One solution, IMO, is to use my initial proposal of 'word boundary
character' rather than a word masking character.  Thus | would stand for
one or more white space characters or the beginning or end of the field,
not zero or more words.  This also clears up any confusion about the use
of *| -- this would mean zero or more characters, followed by at least one
of: beginning of field, end of field, white space character. Hence:
        (.*?)(^|$| |\n|\t)+
(assuming that punctuation has already been stripped out of the field)

So a first words in field search in a string index would be:

title="keyword search|*"

This makes it still a string operation, not a word operation, and hence we
can use it without getting string and word structures all intertwined.
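
As a rough illustration of the proposal, the following Python sketch
translates a mask into a regular expression and applies it to the whole
field. The helper names mask_to_regex and matches are hypothetical, and the
translation reflects one reading of the text above: '|' becomes "beginning
of field, end of field, or a run of white space", '*' becomes "zero or more
characters", and everything else is literal. It assumes punctuation has
already been stripped from the field, and it is not the grammar of any
published spec.

```python
import re

# Assumed reading of the proposed word boundary character '|':
# beginning of field, end of field, or one or more white space characters.
BOUNDARY = r"(?:^|\s+|$)"


def mask_to_regex(mask: str):
    """Translate a mask into a compiled regex (hypothetical helper)."""
    parts = []
    for ch in mask:
        if ch == "|":
            parts.append(BOUNDARY)
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def matches(mask: str, field: str) -> bool:
    """True if the mask matches the whole field, i.e. a string operation."""
    return mask_to_regex(mask).fullmatch(field) is not None


first_words = "keyword search|*"
print(matches(first_words, "keyword search in string indexes"))  # True
print(matches(first_words, "keyword search"))                    # True
print(matches(first_words, "keyword searching"))                 # False
print(matches(first_words, "a keyword search"))                  # False
```

Because the pattern is matched against the entire field rather than against
individual words, "keyword searching" is rejected: nothing follows "search"
that counts as a boundary.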

I think my 2 cents are now up to around $5.  The only resolution required
is to get a working first words in field search that is consistent with
the rest of the protocol.  I think the above solves the problems which
Ralph recognised (string/word confusion) and promotes interoperability.

Rob

--
      ,'/:.          Rob Sanderson ([log in to unmask])
    ,'-/::::.        http://www.o-r-g.org/~azaroth/
  ,'--/::(@)::.      Special Collections and Archives, extension 3142
,'---/::::::::::.    Twin Cathedrals:  telnet: liverpool.o-r-g.org 7777
____/:::::::::::::.              WWW:  http://liverpool.o-r-g.org:8000/
I L L U M I N A T I
