Excellent proposed work item for phase 2!

> -----Original Message-----
> From: Discussion of the Developing Date/Time Standards
> On Behalf Of Saašha Metsärantala
> Sent: Tuesday, June 28, 2011 4:21 PM
> Subject: [DATETIME] Reliability and probabilities
>
> Hello!
>
> In Annex A, there is a suggestion to implement reliability in the
> future. I hereby suggest modifying its syntax and implementing it in
> level 2 now.
>
> My suggestion focuses on both machine processing and human readability.
> It is also quite comprehensive and allows a granularity fine enough for
> most needs. Furthermore, its implementation would be really easy.
>
> According to the EDTF specification, question marks occurring outside
> URIs are never followed by anything other than a tilde, a slash or a
> dash (which could possibly be extended to "T", "Z", and colon) - never
> a digit.
>
> In my suggestion, question marks could be followed by one or two digits.
> That makes the BNF expression really easy, with a rhs for UASymbol as:
>
> ( "?" ( "0" | oneThru9 digit? )? | "~" )
>
> thus we would also skip "?~".
>
> Let's now focus on the semantics of this suggestion.
>
> The first or only digit would be an indication of the reliability, with
> a 10-percent granularity. The digit one would represent a reliability
> of 10% and the digit nine would mean 90%. If the reliability is around
> zero, we could remove the whole date. If the reliability is (nearly)
> 100 percent, we could remove the question mark. The semantics of the
> digit zero would be "unknown reliability".
>
> When the first digit is non-zero, the optional second digit would
> indicate the statistical dispersion in a more or less Gauss-like pdf
> (not PDF!). Roughly, that would be something quite similar to standard
> deviation. More precisely, I suggest that the digit one would mean
> plus / minus ten percent. Likewise, the digit four would mean
> plus / minus forty percent.
>
> Of course, we would remain within zero to 100 percent.
>
> Here, I give some examples:
>
> 2011?0 - maybe 2011, but with an unknown reliability
>
> 2011?1 - with approximately 10 percent reliability, the date was 2011,
> that is 10(+undef/-undef)%
>
> 2011?9 - with approximately 90 percent reliability, the date was 2011,
> that is 90(+undef/-undef)%
>
> 2011?90 - with approximately 90 percent reliability, the date was 2011,
> that is 90(+0/-0)%
>
> 2011?21 - with approximately 10 to 30 percent reliability, the date was
> 2011, that is 20(+10/-10)%
>
> 2011?43 - with approximately 10 to 70 percent reliability, the date was
> 2011, that is 40(+30/-30)%
>
> 2011?93 - with approximately 60 to 100 percent reliability, the date
> was 2011, that is 90(+10/-30)%, but the reliability is most likely
> around 90%.
>
> 2011?82 - with approximately 60 to 100 percent reliability, the date
> was 2011, that is 80(+20/-20)%, but the reliability is most likely
> around 80%.
>
> Both the syntax and the semantics are remarkably straightforward and
> easy to learn and implement. This would also probably cover most use
> cases.
>
> A further improvement could of course be to introduce a third digit to
> take pdf asymmetries into account, but I suggest introducing the
> one-to-two-digit construction in level two and leaving the three-digit
> version in Annex A for possible future features.
>
> Regards!
>
> Saašha,
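
For illustration, the proposed UASymbol grammar and its semantics can be sketched in Python. This is only a minimal sketch and not part of the proposal: the names UA_SYMBOL, Reliability and parse_ua_symbol are hypothetical, and treating a bare "?" the same as "?0" (no stated reliability) is an assumption, since the message does not define that case.

```python
import re
from typing import NamedTuple, Optional

# Mirrors the proposed right-hand side for UASymbol:
#   ( "?" ( "0" | oneThru9 digit? )? | "~" )
UA_SYMBOL = re.compile(r"\?(?:0|[1-9][0-9]?)?|~")


class Reliability(NamedTuple):
    centre: Optional[int]  # percent; None when the reliability is unknown
    low: Optional[int]     # lower bound in percent; None when dispersion is undefined
    high: Optional[int]    # upper bound in percent; None when dispersion is undefined


def parse_ua_symbol(symbol: str) -> Optional[Reliability]:
    """Interpret a symbol such as "?93"; return None for "~" (approximate).

    Raises ValueError when the symbol does not match the proposed grammar.
    """
    if not UA_SYMBOL.fullmatch(symbol):
        raise ValueError(f"not a valid UASymbol: {symbol!r}")
    if symbol == "~":
        return None
    digits = symbol[1:]
    # Assumption: a bare "?" carries no reliability figure, like "?0".
    if digits in ("", "0"):
        return Reliability(None, None, None)
    centre = int(digits[0]) * 10          # first digit: 10-percent steps
    if len(digits) == 1:
        return Reliability(centre, None, None)
    spread = int(digits[1]) * 10          # second digit: plus/minus spread
    # Stay within zero to 100 percent, as the proposal requires.
    return Reliability(centre, max(0, centre - spread), min(100, centre + spread))
```

With this sketch, parse_ua_symbol("?93") yields a centre of 90 with bounds 60 and 100, matching the 2011?93 example, while "?~" and "?05" are rejected because the grammar does not produce them.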