Excellent proposed work item for phase 2!

> -----Original Message-----
> From: Discussion of the Developing Date/Time Standards
> [mailto:[log in to unmask]] On Behalf Of Saašha Metsärantala
> Sent: Tuesday, June 28, 2011 4:21 PM
> To: [log in to unmask]
> Subject: [DATETIME] Reliability and probabilities
> 
> Hello!
> 
> In Annex A, there is a suggestion to implement reliability in the
> future.
> I suggest modifying its syntax and implementing it in level 2 now.
> 
> My suggestion focuses on both machine processing and human readability.
> It is also quite comprehensive and offers enough granularity for most
> needs. Furthermore, it would be easy to implement.
> 
> In the current EDTF specification, question marks occurring outside URIs
> are never followed by anything other than a tilde, a slash or a dash (which
> could possibly be extended to "T", "Z", and colon), and never by a digit.
> 
> In my suggestion, question marks could be followed by one or two digits.
> That keeps the BNF simple, with the right-hand side (rhs) of UASymbol being:
> 
> ( "?" ( "0" | oneThru9 digit? )? | "~" )
> 
> With this rule, "?~" would no longer be allowed.
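A minimal sketch of how the proposed UASymbol rule could be checked, assuming Python and a regular expression. The names UA_SYMBOL and is_ua_symbol are illustrative only and are not part of the proposal or of EDTF:

```python
import re

# Mirrors the proposed right-hand side for UASymbol:
#   ( "?" ( "0" | oneThru9 digit? )? | "~" )
# i.e. "~", or "?" optionally followed by "0" or by one or two digits
# whose first digit is 1-9.
UA_SYMBOL = re.compile(r"\?(?:0|[1-9][0-9]?)?|~")


def is_ua_symbol(text: str) -> bool:
    """Return True if the whole string matches the proposed UASymbol rule."""
    return UA_SYMBOL.fullmatch(text) is not None
```

Under this pattern "?", "?0", "?9", "?93" and "~" are accepted, while "?~", "?05" and "?123" are rejected.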
> 
> Let's now focus on the semantics of this suggestion.
> 
> The first or only digit would indicate the reliability, with
> a 10-percent granularity. The digit one would represent a reliability
> of 10% and the digit nine would mean 90%. If the reliability is around
> zero, we could remove the whole date. If the reliability is (nearly)
> 100 percent, we could remove the question mark. The semantics of the
> digit zero would be "unknown reliability".
> 
> When the first digit is non-zero, the optional second digit would
> indicate the statistical dispersion in a more or less Gauss-like pdf
> (probability density function, not PDF!).
> Roughly, that would be something quite similar to standard deviation.
> More precisely, I suggest that the digit one would mean plus / minus
> ten percentage points. Likewise, the digit four would mean plus / minus
> forty percentage points.
> 
> Of course, the resulting range would be clamped to zero to 100 percent.
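The semantics described above could be sketched as follows, assuming the second digit is a spread in percentage points and that the range is cut off at 0 and 100. The function name and the tuple it returns are hypothetical, chosen only for illustration:

```python
import re
from typing import Optional, Tuple

# (centre, low, high) in percent; low and high are None when no
# dispersion digit is given.
Reliability = Tuple[int, Optional[int], Optional[int]]


def reliability(digits: str) -> Optional[Reliability]:
    """Interpret the digits that follow "?" in the proposed syntax.

    Returns None for "" (a bare "?") and for "0" (unknown reliability).
    """
    if digits in ("", "0"):
        return None
    if re.fullmatch(r"[1-9][0-9]?", digits) is None:
        raise ValueError(f"not a valid reliability suffix: {digits!r}")
    centre = int(digits[0]) * 10
    if len(digits) == 1:
        return (centre, None, None)
    spread = int(digits[1]) * 10
    # Keep the range within 0..100 percent.
    return (centre, max(0, centre - spread), min(100, centre + spread))
```

For instance, reliability("21") gives (20, 10, 30), and reliability("93") gives (90, 60, 100) because the upper bound is cut off at 100.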
> 
> Here are some examples:
> 
> 2011?0 - maybe 2011, but with an unknown reliability
> 
> 2011?1 - with approximately 10 percent reliability, the date was 2011,
> that is 10(+undef/-undef)%
> 
> 2011?9 - with approximately 90 percent reliability, the date was 2011,
> that is 90(+undef/-undef)%
> 
> 2011?90 - with approximately 90 percent reliability, the date was 2011,
> that is 90(+0/-0)%
> 
> 2011?21 - with approximately 10 to 30 percent reliability, the date was
> 2011, that is 20(+10/-10)%
> 
> 2011?43 - with approximately 10 to 70 percent reliability, the date was
> 2011, that is 40(+30/-30)%
> 
> 2011?93 - with approximately 60 to 100 percent reliability, the date
> was 2011, that is 90(+10/-30)%, but the reliability is most likely
> around 90%.
> 
> 2011?82 - with approximately 60 to 100 percent reliability, the date
> was 2011, that is 80(+20/-20)%, but the reliability is most likely
> around 80%.
> 
> Both the syntax and the semantics are remarkably straightforward and
> easy to learn and implement. This would also probably cover most use
> cases.
> 
> A further improvement could of course be to introduce a third digit to
> take into account pdf asymmetries, but I suggest introducing the
> one-to-two-digit construction in level two and leaving the three-digit
> version in Annex A for possible future features.
> 
> Regards!
> 
> Saašha,