On 29 November 2010 23:34, Denenberg, Ray wrote:
(second of two messages)

unknown/questionable/uncertain/approximate

These all have distinct meanings.

Different nuances of meaning, if you like. The question to me is, are they different enough to be reliably understood and used? If not (as I suspect) then it would be a bad idea to have them in a spec. The very fact that people question these distinctions should be a caution.

unknown
It means just that, "unknown". So for 199u, the 'u' may be replaced by any single digit.
[Note: we have not settled on 'u' as the "unknown" character. We just haven't found a better one yet.]

The only reason I can think of for this comes from a damaged record. In any case we can rephrase this as [1990 ... 1999] - one value chosen from the set of values ranging from 1990 to 1999, which has the great advantage of being far more flexible, covering other sources of similar uncertainty, as below.
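
As a rough sketch of that rephrasing (the helper name expand_unknown is hypothetical, and 'u' is only the provisional placeholder mentioned above), a value such as 199u can be expanded mechanically into the set it stands for:

```python
from itertools import product

def expand_unknown(value, placeholder="u"):
    """Return every string obtained by replacing each placeholder with a digit 0-9."""
    # Each position is either the fixed character or the ten possible digits.
    slots = ["0123456789" if ch == placeholder else ch for ch in value]
    return ["".join(combo) for combo in product(*slots)]

years = expand_unknown("199u")
print(years[0], "...", years[-1])  # 1990 ... 1999
```

The result is simply one value chosen from [1990 ... 1999], which is the form suggested above.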

uncertain
It means "known to be one of a set". So 2004-[01,02,03] means "January, February, or March of 2004; we don't know which, but we know it's one of those".

So what distinguishes something that is unknown from something that is uncertain? It doesn't seem to pass my test. There is a nuance of difference, but I suggest no operational or pragmatic difference, and therefore people would confuse them.
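
To illustrate the point operationally, here is a minimal sketch (the function name candidates is hypothetical, and the bracket and 'u' notations are assumed to be exactly as in the examples above) in which both forms reduce to the same kind of thing, a set of candidate values:

```python
import re

def candidates(value):
    """Expand a bracketed list or 'u' digits into the set of possible values."""
    m = re.fullmatch(r"(.*)\[([^\]]*)\](.*)", value)
    if m:
        # "uncertain": one of an explicitly listed set
        head, options, tail = m.groups()
        return {head + opt.strip() + tail for opt in options.split(",")}
    if "u" in value:
        # "unknown": any digit in place of the first 'u', then recurse for the rest
        i = value.index("u")
        return {c for d in "0123456789"
                for c in candidates(value[:i] + d + value[i + 1:])}
    return {value}

print(sorted(candidates("2004-[01,02,03]")))  # ['2004-01', '2004-02', '2004-03']
print(len(candidates("199u")))                # 10
```

Once expanded, a consumer of the data cannot tell which notation produced the set.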

questionable
A strict value, but this value may be wrong. 2004-06? means "it may be June 2004, but then, it might not."

OK, I haven't seen any questioning of this.

approximate
It means just that, "approximate".

There has been discussion about whether and how (1) precision might be assigned to "approximate", and (2) probability might be assigned to "questionable". I hope we agree that qualification is not necessary for "unknown" and "uncertain".

If indeed there is a need to assign precision and probability to approximate and questionable then I propose the following basic approach.

First let's agree, can we, that most users (in fact the vast majority) of approximate and questionable will not need to assign precision or probability; they will be happy simply to assert that the value is approximate or questionable. Therefore we want an approach that allows this assertion in the simplest possible way, without the burden of any complexity imposed by the qualification syntax. In other words, if you simply want to assert "June of 2004, approximately", it's '2004-06~', as currently in the draft spec.

However, in this supposed vast majority of cases, where unqualified, there seems to me no pragmatic or operational difference between the two. If one says that "questionable" is more vague than "approximate", then one is comparing something like precision, which is contrary (at least in spirit) to the premise that there is no need to assign any precision or probability.

I propose an extension mechanism, whereby whenever a '?' or '~' is encountered, it may be followed by an extension. The extension would be delimited in some fashion, and I am not proposing how at this point, but let's say for now we use ampersand. Then

2004-06~          means "June of 2004, approximately"
2004-06~&123abc&  means "June of 2004, approximately, with a precision of '123abc'"

So whenever ~ is encountered, if followed by & then a precision extension follows, terminated by the next & (and I have no idea what '123abc' means, someone will have to come up with a framework for these extensions)
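
A minimal sketch of how a parser might split off such an extension, assuming the ampersand delimiter used above purely for illustration (the function name split_qualifier is hypothetical, and the extension payload is returned uninterpreted because no framework for it exists yet):

```python
import re

# date part, then '?' or '~', then an optional &...& extension
_QUALIFIED = re.compile(r"^(?P<date>[^?~&]+)(?P<flag>[?~])(?:&(?P<ext>[^&]*)&)?$")

def split_qualifier(value):
    """Return (date, flag, extension); flag and extension are None when absent."""
    m = _QUALIFIED.match(value)
    if m is None:
        return value, None, None
    return m.group("date"), m.group("flag"), m.group("ext")

print(split_qualifier("2004-06~"))          # ('2004-06', '~', None)
print(split_qualifier("2004-06~&123abc&"))  # ('2004-06', '~', '123abc')
```

An unqualified or unextended value passes through without any extra syntax, which is the property argued for above.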

In this case, surely we are (operationally and pragmatically, as always) back with something like the "uncertain" case. Using the central approximate date and the precision, calculate the start and end points of the interval of possibility. Then give it as a value chosen from that interval.
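
The calculation could be sketched as follows. Since nobody has defined what a precision extension contains, this assumes, purely for illustration, that the precision is a whole number of months either side of the central month; the function name month_interval is likewise hypothetical.

```python
def month_interval(year, month, plus_minus_months):
    """Return ((year, month), (year, month)) bounds around a central month."""
    center = year * 12 + (month - 1)  # months counted from year 0
    lo = center - plus_minus_months
    hi = center + plus_minus_months
    return (lo // 12, lo % 12 + 1), (hi // 12, hi % 12 + 1)

start, end = month_interval(2004, 6, 2)
print(f"{start[0]}-{start[1]:02d}/{end[0]}-{end[1]:02d}")  # 2004-04/2004-08
```

The output is an ordinary interval, so "June 2004, approximately, give or take two months" becomes a value chosen from 2004-04/2004-08.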

Similarly for questionable: whenever a ? is encountered, if followed by & then a probability extension follows.

Perhaps this is too complex for separate standardized treatment. One approach would be to decide what your level of probability is for effective certainty, and to give the value as a value chosen from the appropriate interval.

I fear that further repetition of these arguments will be futile. However, nothing said so far persuades me away from the view that it would be useful to represent two kinds of uncertainty:
1. the one where there is no estimate of any degree of error, imprecision or uncertainty
2. the one where limits are given. This can be subdivided into two cases:
2(a) a value chosen from a time interval (whether or not using "/")
2(b) a value chosen from a set or range of values (given discretely or using a range notation like "...")
and that other representations add nothing useful.

Simon
--
Simon Grant
+44 7710031657
http://www.simongrant.org/home.html