On Mon, 17 Nov 2008 11:13:28 -0500, Ray Denenberg, Library of Congress wrote
> Based on discussion the past several years over the topic of result size
precision, the OASIS Search Web Service Technical Committee proposes to define
in SRU 2.0 an optional response element <resultSizePrecision> whose definition
would be something like:
Confidence as a 0-100 integer does not make any sense. What does 30 mean?
30% confidence? Confidence in what? What does 10% or 90% mean? We can
debate what these numbers mean in statistics or in empirical data quality
but here it's screaming "wrong model".
Servers typically don't have any kind of "confidence" beyond "know" and
"don't know". Between these two poles it's more or less "what I think at
this moment, but I know it's probably not right, but could be..".
> "The server's confidence in the precision of the result count reported. A
non-negative integer from zero to 100. A value of zero means the server has no
idea what the size of the result set is. '100' means the server guarantees
that the value of result count is accurate. A value in between means the
result count is an estimate, where a higher value means that the server has
more confidence in the precision than a lower value."
> [Note: the committee debated an alternative, where there would be three
values: 'exact', 'estimate', 'no idea'. However, the committee felt that
might be inflexible, and there might eventually be implementors who would want
four levels, five, etc. With the zero to 100 approach, a convention could be
recommended to use zero, 50, and 100 for the three-level representation.]
"four levels, five etc." seems like private conversations to me..
We can, of course, think about a model and define a few different "fits"..
Beyond "exact", "no idea" are a few levels of "estimates".. Like "feeling
good estimate", "feeling not too good estimate", "best estimate"..
But we're still missing some important states.. like "volatile"..
Or "minimum" (at least this many) or maximum (probably no more than..)...
Nothing quantitative, much less "linear", here...
A real application:
We've been doing some work in distributed p2p search networks.. and here
the longer a search runs the larger the set can perhaps be but it can also
shrink as we dynamically adjust the granularity of information.. converging
upon some size in unknown time--- the search is given fuel and like a motorcar
can be re-fueled.. Now.. we have "an idea".. we don't know the limit but at
any given moment we have a certain size of the set we have at that moment..
which is, of course, a different set the next bat of an eye..
What I'm suggesting is instead of this pseudo analytical 0-100 stuff we
have nice qualitative words as a minimum public vocabulary such as
"exact", "unknown", "minimum" (it's at least this many), "maximum" (it's no
more than this) etc. and allow for "personal extensions" as any term other
than these (or whatever magic words we define).. together with a
controlled core list of modalities.. (such as shall be, is, was etc.)
Clients would only "need to" understand the 0 (don't know) and 1 (exact)..
but could grasp more..
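
To make the idea concrete, here is a rough sketch in Python of how a client
might read such a vocabulary. Everything in it is an assumption for
illustration only: the names ResultSize, CORE_TERMS and interpret are
hypothetical and not part of SRU or any proposal, and the modalities
(shall be, is, was) are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical core vocabulary; the actual "magic words" would be
# whatever the committee settles on.
CORE_TERMS = {"exact", "unknown", "minimum", "maximum"}


@dataclass(frozen=True)
class ResultSize:
    count: Optional[int]   # None when the server reports no number
    precision: str         # a core term, or any other term as an extension


def interpret(size: ResultSize) -> str:
    """Describe a reported result size the way a simple client might."""
    if size.count is None or size.precision == "unknown":
        return "size unknown"
    if size.precision == "exact":
        return f"exactly {size.count}"
    if size.precision == "minimum":
        return f"at least {size.count}"
    if size.precision == "maximum":
        return f"no more than {size.count}"
    # Any term outside CORE_TERMS is a "personal extension". A client that
    # does not recognize it can still fall back to treating the count as
    # an estimate of some kind.
    return f"roughly {size.count} ({size.precision})"


if __name__ == "__main__":
    for reported in (
        ResultSize(42, "exact"),
        ResultSize(10, "minimum"),
        ResultSize(None, "unknown"),
        ResultSize(500, "volatile"),
    ):
        print(interpret(reported))
```

A client that only knows "exact" and "unknown" would collapse every other
branch into the final fallback, which is the point: unknown terms degrade
to "some estimate" instead of breaking anything.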
> That's the server side, comments are welcome.
> At the other end is the client side. Should the client indicate that it does
or does not care about result size precision? It might want 10 records, any
10, and beyond that it doesn't care if there are 10 or 10 billion, and it does
not want the client to bother to even try to determine or estimate the result
size, as that may be an expensive process.
> The TC is inclined not to address this, the client end, unless someone can
cite a real requirement (not just "it seems useful"). So we are soliciting
feedback on this question from SRU implementors. Can someone assert that if
a request parameter were to be defined pertaining to result size precision,
you would implement it?
> Ray Denenberg
Edward C. Zimmermann, NONMONOTONIC LAB
Basis Systeme netzwerk, Munich Ges. des buergerl. Rechts
Office Leo (R&D):
Leopoldstrasse 53-55, D-80802 Munich,
Federal Republic of Germany