On Mon, 17 Nov 2008 11:13:28 -0500, Ray Denenberg, Library of Congress wrote
> Based on discussion the past several years over the topic of result size precision, the OASIS Search Web Service Technical Committee proposes to define in SRU 2.0 an optional response element <resultSizePrecision> whose definition would be something like:
>

Confidence as a 0-100 integer does not make any sense. What does 30 mean? 30% confidence? Confidence in what? What does 10% or 90% mean? We can debate what these numbers mean in statistics or in empirical data quality, but here it's screaming "wrong model". Servers typically don't have any kind of "confidence" beyond "know" and "don't know". Between these two poles it's more or less "what I think at this moment, but I know it's probably not right, but could be..".

> "The server's confidence in the precision of the result count reported. A non-negative integer from zero to 100. A value of zero means the server has no idea what the size of the result set is. '100' means the server guarantees that the value of result count is accurate. A value in between means the result count is an estimate, where a higher value means that the server has more confidence in the precision than a lower value."
>
> [Note: the committee debated an alternative, where there would be three values: 'exact', 'estimate', 'no idea'. However, the committee felt that might be inflexible, and there might eventually be implementors who would want four levels, five, etc. With the zero to 100 approach, a convention could be recommended to use zero, 50, and 100 for the three-level representation.]
>

"four levels, five etc." seems like private conversations to me. We can, of course, think about a model and define a few different "fits". Beyond "exact" and "no idea" there are a few levels of "estimates", like "feeling good estimate", "feeling not too good estimate", "best estimate". But we're still missing some important states, like "volatile"..
Or "minimum" (at least this many) or "maximum" (probably no more than..). Nothing quantitative, much less "linear", here.

A real application: we've been doing some work in distributed p2p search networks, and here the longer a search runs the larger the set can perhaps be, but it can also shrink as we dynamically adjust the granularity of information, converging upon some size in unknown time. The search is given fuel and, like a motorcar, can be re-fueled. Now we have "an idea": we don't know the limit, but at any given moment we have a certain size of the set we have at that moment, which is, of course, a different set in the next bat of an eye.

What I'm suggesting is that instead of this pseudo-analytical 0-100 stuff we have nice qualitative words as a minimum public vocabulary, such as "exact", "unknown", "minimum" (it's at least this many), "maximum" (it's no more than this), etc., and allow for "personal extensions" as any term other than these (or whatever magic words we define), together with a controlled core list of modalities (such as shall be, is, was, etc.). Clients would only "need to" understand the 0 (don't know) and 1 (exact), but could grasp more.

> That's the server side, comments are welcome.
>
> At the other end is the client side. Should the client indicate that it does or does not care about result size precision? It might want 10 records, any 10, and beyond that it doesn't care if there are 10 or 10 billion, and it does not want the client to bother to even try to determine or estimate the result size, as that may be an expensive process.
>
> The TC is inclined not to address this, the client end, unless someone can cite a real requirement (not just "it seems useful"). So we are soliciting feedback on this question from SRU implementors. Can someone assert that if a request parameter were to be defined pertaining to result size precision, you would implement it?
>
> Ray Denenberg
>
>

-- 
Edward C. Zimmermann, NONMONOTONIC LAB
Basis Systeme netzwerk, Munich Ges. des buergerl. Rechts
Office Leo (R&D): Leopoldstrasse 53-55, D-80802 Munich,
Federal Republic of Germany
http://www.nonmonotonic.net
Umsatz-St-ID: DE130492967
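
Appended illustration: a minimal Python sketch of how a client might read the qualitative vocabulary suggested in the message above. The four core terms ("exact", "unknown", "minimum", "maximum") come from the message; everything else (the names ResultSize, CORE_TERMS and describe, and the rule that an unrecognised term is read as an estimate of unspecified quality) is an assumption made for illustration and is not part of SRU or of any committee proposal.

```python
from dataclasses import dataclass
from typing import Optional

# Core public vocabulary taken from the message; the set name is hypothetical.
CORE_TERMS = {"exact", "unknown", "minimum", "maximum"}


@dataclass(frozen=True)
class ResultSize:
    """Hypothetical pairing of a reported count with a qualitative term."""
    count: Optional[int]  # None when the server reports no count at all
    precision: str        # a core term, or any other term as a private extension

    def is_core(self) -> bool:
        return self.precision in CORE_TERMS


def describe(size: ResultSize) -> str:
    """Client-side reading of a result size.

    Core terms get their stated meaning. Any other term is treated as a
    private extension and read as an estimate of unspecified quality
    (an assumption of this sketch).
    """
    if size.precision == "unknown" or size.count is None:
        return "size unknown"
    if size.precision == "exact":
        return f"exactly {size.count}"
    if size.precision == "minimum":
        return f"at least {size.count}"
    if size.precision == "maximum":
        return f"no more than {size.count}"
    return f"about {size.count} ({size.precision})"


if __name__ == "__main__":
    for size in (
        ResultSize(120, "exact"),
        ResultSize(None, "unknown"),
        ResultSize(50, "minimum"),
        ResultSize(300, "volatile"),  # private extension term
    ):
        print(describe(size))
```

A client that knows only "exact" and "unknown" could fold every other branch into the last one, which is the minimum understanding the message asks of clients.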