I'm not offended by this proposal; I'm just not sure we need it.
I want to build a client that supports Google-style searches, and I want it
to be a simple client. Say it gets the following search from a user: plans
"mission style" lamps. In Google-land, this means that "mission" and
"style" must be adjacent and the document must contain the words "plans" and
"lamps" somewhere. My simple query parser will look for things in
quotes, pass them along as an adjacency word list, and strip them out of the
original query. Everything left will be ANDed together, along with the
adjacency word list. The result is: (("mission style" and "plans") and
"lamps")
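A minimal Python sketch of the simple parser described above, assuming straight double quotes and whitespace-separated words. The function name simple_parse is hypothetical, and the output is just a string in the same notation as the example, not the API of any real CQL library.

```python
import re

def simple_parse(query: str) -> str:
    """Hypothetical 'simple client' parser: quoted phrases become adjacency
    word lists, and everything else is ANDed together with them."""
    # Collect every double-quoted phrase, in order of appearance.
    phrases = re.findall(r'"([^"]*)"', query)
    # Strip the quoted phrases out of the query; what remains are bare words.
    rest = re.sub(r'"[^"]*"', " ", query).split()
    # Phrases first, then the leftover words, each rendered as a quoted term.
    terms = [f'"{p}"' for p in phrases] + [f'"{w}"' for w in rest]
    if not terms:
        return ""
    # Build a left-nested AND chain: ((a and b) and c) ...
    expr = terms[0]
    for term in terms[1:]:
        expr = f"({expr} and {term})"
    return expr

print(simple_parse('plans "mission style" lamps'))
```

For the query in the example this prints (("mission style" and "plans") and "lamps").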
Now, you might say that it wouldn't be much more work to turn "mission
style" into ("mission" adj "style"). I have to agree.
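For comparison, a sketch of that extra step: turning a quoted phrase into an explicit adj chain. The name phrase_to_adj is hypothetical, and left-nesting phrases of three or more words is an assumption about how adj would chain, not something settled here.

```python
def phrase_to_adj(phrase: str) -> str:
    """Hypothetical: rewrite a quoted phrase as explicit adj operators."""
    words = phrase.split()
    if not words:
        raise ValueError("empty phrase")
    # Left-nested chain (assumption): (("a" adj "b") adj "c") ...
    expr = f'"{words[0]}"'
    for word in words[1:]:
        expr = f'({expr} adj "{word}")'
    return expr

print(phrase_to_adj("mission style"))  # ("mission" adj "style")
```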
So, if someone wanted to propose that we drop word lists entirely, I guess I
wouldn't fight it.
Ralph
> -----Original Message-----
> From: Mike Taylor [mailto:[log in to unmask]]
> Sent: Wednesday, September 25, 2002 8:48 AM
> To: [log in to unmask]
> Subject: CQL: Expressing Term Structure
>
>
> In my last message, I objected to the proposed array of qualifier
> names that Ray's considering:
>
> titleAllTheseWords subjectAllTheseWords authorAllTheseWords
> titleAnyOfTheseWords subjectAnyOfTheseWords authorAnyOfTheseWords
> titleAdjacentWords subjectAdjacentWords authorAdjacentWords
> titleRelevantWords subjectRelevantWords authorRelevantWords
> titleString subjectString authorString
> ... ... ...
>
> Now the problem here is that the qualifier, which we all sort of
> assumed was the equivalent of an access point, is now being overloaded
> with term-structure information.
>
> What I propose is that instead we overload the relation (normally "=")
> that separates the qualifier from the term. We could introduce five
> new "relations" expressing the kind of matching that we want done on
> the term -- for example:
>
> =* all of these words
> =? any of these words
> =. adjacent words
> =~ relevant words
> == exact string match
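To make the five proposed relations concrete, here is a hypothetical Python sketch of what each might mean when a term is matched against a single field value. It assumes naive lowercase whitespace tokenization, and it treats =~ as a crude word-overlap score, since relevance ranking is a server-side matter the proposal does not define. The names matches and relevance are illustrative only.

```python
def _words(text: str) -> list[str]:
    # Naive tokenization (assumption): lowercase, split on whitespace.
    return text.lower().split()

def matches(relation: str, term: str, field: str) -> bool:
    """Boolean matching for the proposed =*, =?, =. and == relations."""
    words, field_words = _words(term), _words(field)
    if relation == "=*":   # all of these words, in any order
        return all(w in field_words for w in words)
    if relation == "=?":   # any of these words
        return any(w in field_words for w in words)
    if relation == "=.":   # adjacent words: the term's words as a contiguous run
        n = len(words)
        return any(field_words[i:i + n] == words
                   for i in range(len(field_words) - n + 1))
    if relation == "==":   # exact string match on the whole field (case-sensitive here)
        return term == field
    raise ValueError(f"no boolean meaning defined for relation {relation!r}")

def relevance(term: str, field: str) -> float:
    """Crude stand-in for =~: fraction of the term's words found in the field."""
    words, field_words = _words(term), _words(field)
    if not words:
        return 0.0
    return sum(w in field_words for w in words) / len(words)
```

With a title of "The Elements of Style", title=*"elements style" matches but title=."elements style" does not, because "of" sits between the two words. That is exactly the distinction the separate relations are meant to express.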
>
> So instead of
> titleAllTheseWords="elements style"
> You would search for
> title=*"elements style"
>
> And instead of
> titleRelevantWords="grammar usage language punctuation"
> You would search for
> title=~"grammar usage language punctuation"
>
> Then we would need to figure out which of these is used most often,
> and let that one be what you get if you use plain old
> title="lord of the"
> and it seems to me that the obvious answer is "adjacent words", since
> that's what everyone in the world is used to from Google, AltaVista
> and suchlike.
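A sketch of how a client or server might split a clause such as title=*"elements style" into qualifier, relation and term, with plain "=" defaulting to adjacent words as suggested above. The regular expression, the RELATIONS table and the kind names are assumptions for illustration; ordered relations such as "<" are deliberately not handled.

```python
import re

# Hypothetical mapping from the proposed relation symbols to matching kinds.
RELATIONS = {
    "=*": "all",
    "=?": "any",
    "=.": "adjacent",
    "=~": "relevant",
    "==": "exact",
    "=": "adjacent",  # plain '=' defaults to adjacent words
}

# Two-character operators are listed before the bare '=' alternative.
CLAUSE = re.compile(r'^(\w+)(==|=\*|=\?|=\.|=~|=)"([^"]*)"$')

def parse_clause(clause: str) -> tuple[str, str, str]:
    """Split qualifier<relation>"term" into (qualifier, kind, term)."""
    m = CLAUSE.match(clause.strip())
    if m is None:
        raise ValueError(f"not a qualifier/relation/term clause: {clause!r}")
    qualifier, relation, term = m.groups()
    return qualifier, RELATIONS[relation], term

print(parse_clause('title=*"elements style"'))  # ('title', 'all', 'elements style')
print(parse_clause('title="lord of the"'))      # ('title', 'adjacent', 'lord of the')
```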
>
> The advantages to this approach are at least twofold. First, that of
> brevity (relatively, at least). And second, this is Saying What We
> Mean. Although I earlier couched this suggestion in terms of
> "overloading the relation", actually, I think this stuff _is_
> specifying the relation. I certainly don't see any conflict with the
> other relation operators in CQL -- stuff like "<" for less-than search
> on a numeric, date or other ordered field.
>
> Now that I've finally made a positive suggestion, I'll be interested
> to see how many pieces it gets retributively torn into :-)
>
>   _/|_   _______________________________________________________________
>  /o ) \/  Mike Taylor  <[log in to unmask]>  www.miketaylor.org.uk
>  )_v__/\  "I don't think we should all necessarily strive to move
>           the human spirit -- sometimes, just getting the punctuation
>           right is achievement enough" -- Adrian Bedford.