Hello!

In Annex A, there is a suggestion to implement reliability in the future. 
I suggest modifying its syntax and implementing it in level 2 now.

My suggestion focuses on both machine processing and human readability. It 
is also quite comprehensive and offers enough granularity for most 
needs. Furthermore, it would be easy to implement.

In the current EDTF specification, a question mark occurring outside a URI 
is never followed by anything other than a tilde, a slash or a dash (a set 
which could possibly be extended to "T", "Z", and colon) - never by a digit.

In my suggestion, a question mark could be followed by one or two digits. 
That keeps the BNF simple; the right-hand side for UASymbol becomes:

( "?" ( "0" | oneThru9 digit? )? | "~" )

Thus we would also drop "?~".
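
As a rough illustration, the proposed right-hand side could be checked with a 
regular expression. This is only a minimal Python sketch; the names UA_SYMBOL 
and is_ua_symbol are hypothetical and not part of the EDTF specification.

```python
import re

# Hypothetical regex mirroring the proposed right-hand side for UASymbol:
#   ( "?" ( "0" | oneThru9 digit? )? | "~" )
UA_SYMBOL = re.compile(r"(?:\?(?:0|[1-9][0-9]?)?|~)")


def is_ua_symbol(text):
    """Return True if the whole string is a valid proposed UASymbol."""
    return UA_SYMBOL.fullmatch(text) is not None
```

Under this pattern "?", "?0", "?9", "?93" and "~" are accepted, while "?~", 
"?00" and "?123" are rejected.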

Let's now focus on the semantics of this suggestion.

The first or only digit would indicate the reliability, with a 
10-percent granularity. The digit one would represent a reliability of 10% 
and the digit nine would mean 90%. If the reliability is around zero, 
we could remove the whole date. If the reliability is (nearly) 100 
percent, we could remove the question mark. The semantics of the digit 
zero would be "unknown reliability".

When the first digit is non-zero, the optional second digit would indicate 
the statistical dispersion of a more or less Gauss-like probability density 
function (pdf, not PDF!). Roughly, that would be something quite similar to 
a standard deviation. More precisely, I suggest that the digit one would 
mean plus / minus ten percent. Likewise, the digit four would mean 
plus / minus forty percent.

Of course, the resulting range would be clipped to stay within zero to 100 
percent.

Here, I give some examples:

2011?0 - maybe 2011, but with an unknown reliability

2011?1 - with approximately 10 percent reliability, the date was 2011, 
that is 10(+undef/-undef)%

2011?9 - with approximately 90 percent reliability, the date was 2011, 
that is 90(+undef/-undef)%

2011?90 - with approximately 90 percent reliability, the date was 2011, 
that is 90(+0/-0)%

2011?21 - with approximately 10 to 30 percent reliability, the date was 
2011, that is 20(+10/-10)%

2011?43 - with approximately 10 to 70 percent reliability, the date was 
2011, that is 40(+30/-30)%

2011?93 - with approximately 60 to 100 percent reliability, the date was 
2011, that is 90(+10/-30)%, but the reliability is most likely around 90%.

2011?82 - with approximately 60 to 100 percent reliability, the date was 
2011, that is 80(+20/-20)%, but the reliability is most likely around 80%.
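
The examples above can be reproduced with a few lines of code. The following 
Python sketch is only an illustration under stated assumptions: the function 
name interpret is hypothetical, the input is the digit string after the "?" 
and is assumed to be already validated, and a bare "?" is treated like "?0" 
(unknown reliability), which is an assumption rather than part of the 
suggestion.

```python
def interpret(suffix):
    """Map the digits after '?' to (centre, low, high) in percent.

    None stands for "undefined". The input is assumed to be a valid
    digit string ("", "0", or one non-zero digit plus an optional digit).
    """
    # Assumption: no digit at all is handled like "0" (unknown reliability).
    if suffix in ("", "0"):
        return (None, None, None)
    centre = int(suffix[0]) * 10
    # A single digit gives the reliability but leaves the dispersion undefined.
    if len(suffix) == 1:
        return (centre, None, None)
    spread = int(suffix[1]) * 10
    # Clip the range so that it stays within zero to 100 percent.
    return (centre, max(0, centre - spread), min(100, centre + spread))
```

For instance, interpret("93") gives (90, 60, 100) and interpret("21") gives 
(20, 10, 30), matching the examples for 2011?93 and 2011?21.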

Both the syntax and the semantics are remarkably straightforward and easy 
to learn and implement. This would also probably cover most use cases.

A further improvement could of course be to introduce a third digit to 
take pdf asymmetries into account, but I suggest introducing the 
one-to-two-digit construction in level 2 and leaving the three-digit 
version in Annex A as a possible future feature.

Regards!

SaaĊĦha,