Comments on "EDTF Specification DRAFT FOR REVIEW" of 5 November 2010
[Sorry to have missed the 10 January deadline for comments; the
list server bounced my mail.]
The document is explicitly intended "for review", but is less explicit
about who is intended to review it, or with what criteria in mind. The
following comments are offered on the assumption that public review is
allowed, expected, or hoped for, and that essentially all aspects of
the document are legitimate topics for review. If these assumptions
are not those of the group preparing the document, you will wish to
read the following comments with the necessary mental reservations.
Since you use the W3C's schema language XSD as a point of reference, I
may talk from time to time about aspects of XSD's date/time datatypes.
(Full disclosure: I co-chaired the W3C working group that developed
XSD 1.0 and have served since 2004 as the lead editor of XSD 1.1. In
the comments here I speak only for myself, and not for the W3C XML
Schema Working Group, for the World Wide Web Consortium [W3C], or for
any members of the W3C.)
1 The document is explicit, and short, and to the point. Some of the
proposals look rather ingenious. I thank you for the opportunity to review it.
2 Plans for the future
The document itself says nothing about what is to happen to the
proposal when completed.
The related document headed "Problem, Requirements, and Basic
Approach", at http://www.loc.gov/standards/datetime/requirements.html
(which I take to be a related document and not formally part of the
EDTF document), says it may be proposed to ISO for incorporation in
some future version of ISO 8601, and/or to W3C for definition as an
XSD datatype. These plans make sense to me (although I make no
prediction on the likelihood that either ISO or W3C will agree); it
would be helpful if this information were integrated into the main document.
Of course, there is no need for W3C action in order to have an XSD
datatype for the date/time expressions defined here; anyone can define
XSD datatypes and publish them without W3C involvement. My experience
is that writing regular-expression patterns which accept all and only
legitimate Gregorian dates can be a little tricky, but it's certainly
feasible. I'll be happy to help if I can.
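As a rough indication of what is involved, here is a Python sketch of such a pattern. It assumes a deliberately narrow lexical space (four-digit years 0000 to 9999, hyphenated yyyy-mm-dd only, no sign, no timezone) and the proleptic Gregorian leap-year rule, so it is not the full lexical space of xsd:date. Because it uses only plain groups, character classes, and alternation, the same pattern string may also be usable in an xs:pattern facet (XSD patterns are implicitly anchored), but that has not been checked against a schema processor.

```python
import re

# Four-digit leap years under the proleptic Gregorian rule:
# divisible by 4 but not by 100, or divisible by 400.
LEAP_YEAR = (
    r"([0-9]{2}(0[48]|[2468][048]|[13579][26])"  # years ending 04, 08, ..., 96
    r"|(0[048]|[2468][048]|[13579][26])00)"       # 0000, 0400, 0800, ..., 9600
)

# yyyy-mm-dd, accepting all and only real calendar days in years 0000-9999.
PATTERN = (
    r"[0-9]{4}-("
    r"(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])"  # 31-day months
    r"|(0[469]|11)-(0[1-9]|[12][0-9]|30)"         # 30-day months
    r"|02-(0[1-9]|1[0-9]|2[0-8])"                 # February, days 1 to 28
    r")"
    r"|" + LEAP_YEAR + r"-02-29"                  # 29 February, leap years only
)
GREGORIAN_DATE = re.compile(PATTERN)


def is_gregorian_date(s: str) -> bool:
    # fullmatch anchors at both ends, as an XSD pattern facet would.
    return GREGORIAN_DATE.fullmatch(s) is not None


for s in ["2001-02-03", "20010203", "2000-02-29", "1900-02-29", "2001-04-31"]:
    print(s, is_gregorian_date(s))
```

Most of the trickiness sits in the leap-year alternation; the month and day branches are routine.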
3 What problem is being solved here? What are the requirements?
The goals and requirements of the work are not clear (to this reader,
at least) from the document. The section 'Background' begins with the
sentence
No standard date/time format meets the needs of XML metadata schemas.
But the document does not seem to provide any list of what its authors
believe the needs of XML metadata schemas are, or why no existing
standard date/time format meets them.
The Web page "Problem, Requirements, and Basic Approach", at
http://www.loc.gov/standards/datetime/requirements.html, does include
the heading "Requirements", but the text beneath that heading just
lists a number of concrete proposals for functionality and syntax. It
does not identify user-level requirements rooted in a particular
application domain. Without a better sense of what needs must be met
by the EDTF format, it's difficult to say whether the current document
is meeting them or not.
At the risk of flogging a dead horse, perhaps a few concrete examples
may help clarify the issue I'm raising. The "Problem, Requirements, and
Basic Approach" page says, by way of elaborating on the proposition
that xsd:date, xsd:time, and xsd:dateTime are inadequate:
The string 2001-02-03, for example, is a valid xs:date value, but
20010203 (without hyphens) is not, even though it is a valid ISO
8601 date. This is a choice that W3C made when defining
xs:date - the hyphenated form was chosen and the non-hyphenated
All of this is true (or would be true if "is" were replaced by
"denotes", see comment 9 below), but it does not on its face present
an argument leading to the conclusion that the XSD datatypes are
inadequate. The string "10 January 2011" is also not a valid xsd:date,
nor is the string "possibly late in the reign of Diocletian?". The
number 3.141592 is also not a valid xsd:date. Nu?
To make a requirement for defining a hyphenless form of date, you need
(or so it seems to me) to identify something that can be accomplished
with such a definition, that is impossible otherwise. Something at the
domain level, I mean, something other than "representing dates in a
form without hyphens". And, given that the opening statement of the
problem refers to XML-based metadata vocabularies, the something
should probably be related, somehow, to existing or proposed metadata
vocabularies.
The "Problem, Requirements, and Basic Approach" page says further
Many dates are coded in database records without hyphens
(conformant with ISO 8601). When extracting a date from a database
record to insert into an XML record, some implementors feel it is
an unnecessary burden to have to insert hyphens.
This seems a less than compelling argument. On the scale of
format-conversion difficulties, trivial string manipulations like this
one hardly register compared to other challenges caused by mismatches
in how the information is modeled. (And in the SQL database management
systems I'm familiar with, it would not be correct to say that dates
are stored without, or with, hyphens; in all current implementations
of SQL, dates are, as far as I know, stored in compact binary forms and
translated to character strings only upon export or display.)
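For a sense of scale, the conversion described in the quoted passage amounts to a few lines. The Python sketch below (the function name hyphenate is hypothetical) inserts the hyphens into an eight-digit basic-format date and uses the standard library's date.fromisoformat to reject impossible dates. It assumes years 0001 to 9999, the range Python's date type supports.

```python
import re
from datetime import date


def hyphenate(basic: str) -> str:
    """Convert an ISO 8601 basic-format date 'yyyymmdd' to 'yyyy-mm-dd'."""
    if not re.fullmatch(r"[0-9]{8}", basic):
        raise ValueError(f"not an eight-digit yyyymmdd string: {basic!r}")
    extended = f"{basic[:4]}-{basic[4:6]}-{basic[6:]}"
    date.fromisoformat(extended)  # raises ValueError for e.g. 2001-02-30
    return extended


print(hyphenate("20010203"))  # 2001-02-03
```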
The eleven specific items listed under "Requirements" don't have any
overt reference to metadata vocabularies, though it seems clear that
metadata vocabularies will need to record publication dates (for
example) which are uncertain, questionable, or for which only one
end-point of a range is known. I think the document would be stronger
if these items were motivated by concrete examples rooted in the
application domain.
For almost all of the items in the features table, the question arises
"when and how is this form needed in XML-based metadata formats?"
A few examples may be worth calling out.
205 Year and ordinal day. Why is this needed? If the requirement is
to record a particular date, that date can be recorded in yyyy-mm-dd
form, no? When does the requirement arise to record it in ordinal form?
207 Week date. Same question; when and why is it a requirement to
record a date using this notation rather than the yyyy-mm-dd notation?
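To make the question concrete: the ordinal and week-date notations are alternative spellings of days that also have a yyyy-mm-dd form, and the conversion is mechanical. A Python sketch follows. The function names are hypothetical, week dates are assumed to be in the extended form yyyy-Www-d, and date.fromisocalendar requires Python 3.8 or later.

```python
import calendar
import re
from datetime import date, timedelta


def ordinal_to_calendar(s: str) -> str:
    """Ordinal date 'yyyy-ddd' -> calendar date 'yyyy-mm-dd'."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{3}", s):
        raise ValueError(f"not an ordinal date: {s!r}")
    year, day = int(s[:4]), int(s[5:])
    if not 1 <= day <= (366 if calendar.isleap(year) else 365):
        raise ValueError(f"day {day} out of range for {year}")
    return (date(year, 1, 1) + timedelta(days=day - 1)).isoformat()


def week_to_calendar(s: str) -> str:
    """ISO week date 'yyyy-Www-d' (d = 1 for Monday) -> 'yyyy-mm-dd'."""
    m = re.fullmatch(r"([0-9]{4})-W([0-9]{2})-([1-7])", s)
    if m is None:
        raise ValueError(f"not a week date: {s!r}")
    year, week, weekday = map(int, m.groups())
    return date.fromisocalendar(year, week, weekday).isoformat()


print(ordinal_to_calendar("2011-010"))  # 2011-01-10
print(week_to_calendar("2011-W02-1"))   # 2011-01-10
```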
4 Why a single datatype?
Is it a requirement that all of these formats be defined as lexical
forms for a single datatype?
Is an interval really the same as a duration the same as a date the
same as a time the same as a date-time pair?
5 Why an atomic datatype?
Implicitly, the document seems to take for granted that information in
all of these forms should be representable in what XSD refers to as a
simple type. But why?
At least some of the forms given (including ranges, intervals with
known start and end, intervals with known startpoint and duration,
intervals with known duration and endpoint, multiple dates,
questionable or uncertain dates, and dates with fragmentary
specification) seem to me more naturally and simply represented as
compound information, rather than as atomic information.
If the expected application domain is XML-based metadata standards,
why not use the usual methods for representing compound information?
The Text Encoding Initiative's 'date' element, for example, can
indicate uncertainty (at various levels), closed and semi-open
intervals, and so on. What is the domain-specific requirement for
handling all of this kind of information in a single atomic type
instead of factoring it into several and allowing them to be combined
in orthogonal ways?
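As an illustration only, here is what such factoring might look like in a programming language rather than in XML markup. The endpoint value, its qualifiers, and the interval structure are separate pieces that combine freely. The class and field names are hypothetical and are not taken from TEI or from the EDTF draft.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """A known day plus independent qualifiers (hypothetical names)."""
    day: date
    uncertain: bool = False    # roughly what a '?' suffix would convey
    approximate: bool = False  # roughly what a '~' suffix would convey


@dataclass(frozen=True)
class Interval:
    """Either endpoint may be unknown (None), giving a semi-open interval."""
    start: Optional[Endpoint] = None
    end: Optional[Endpoint] = None

    def is_closed(self) -> bool:
        return self.start is not None and self.end is not None


# "Some time after about 1850": approximate start, unknown end.
after_1850 = Interval(start=Endpoint(date(1850, 1, 1), approximate=True))
print(after_1850.is_closed())  # False
```

Because the qualifiers are ordinary fields, any endpoint can carry either qualifier, both, or neither, independently of the interval structure.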
6 What operations are expected?
The document seems to be aiming at defining something like a datatype, but
it says nothing about the operations which ought to be available on
the values of the type.
When sorting by values of this type, how should questionable,
approximate, and uncertain dates be handled? How does a time like
22:00:22 compare to a date like 2011W02?
A common operation on dates in some systems is to perform date
arithmetic: given two dates, calculate the duration of the interval
separating them; given a date, add or subtract a duration from it to
find a new date. How should partial, uncertain, questionable and
approximate dates behave under these and similar operations of
date/time arithmetic? And intervals?
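Purely to show why the answers matter, here is one possible semantics. It is an assumption, not anything the EDTF draft specifies: read a partial date as the span of days it might denote. Under that reading, ordering becomes only a partial order, and the difference between two dates becomes a range rather than a number. The Python names are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class DaySpan:
    """A partial date read as the span of days it might denote (an assumption)."""
    earliest: date
    latest: date


def from_partial(year: int, month: Optional[int] = None) -> DaySpan:
    if month is None:
        return DaySpan(date(year, 1, 1), date(year, 12, 31))
    last = calendar.monthrange(year, month)[1]  # number of days in the month
    return DaySpan(date(year, month, 1), date(year, month, last))


def definitely_before(a: DaySpan, b: DaySpan) -> bool:
    # Only a partial order: overlapping spans are neither before nor after.
    return a.latest < b.earliest


def days_between(a: DaySpan, b: DaySpan) -> Tuple[int, int]:
    """Date arithmetic yields a range (min, max) of elapsed days, not one number."""
    return (b.earliest - a.latest).days, (b.latest - a.earliest).days


y2010, y2011, jan2011 = from_partial(2010), from_partial(2011), from_partial(2011, 1)
print(definitely_before(y2010, jan2011))  # True
print(definitely_before(y2011, jan2011), definitely_before(jan2011, y2011))  # False False
print(days_between(y2010, jan2011))  # (1, 395)
```

Other readings (treating "2011" as 1 January 2011, say) would give different and incompatible answers, which is the point of the questions above.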
7 The year zero
Entry 110 reads in part
BC has no year zero, In the BC system the year before year 1 is 1
BC. Thus '-0999' means "1000 BC".
By "BC" and "the BC system" I think you must mean "the Gregorian
calendar". The absence of a year zero applies not just to the time
before the era, but also after.
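The numbering implied by entry 110, where '-0999' means 1000 BC, is the usual correspondence between ISO 8601 (astronomical) year numbers and BC/AD labels. Year 0000 is 1 BC, and in general a year number y <= 0 denotes (1 - y) BC. A small Python sketch of the mapping in both directions follows (function names hypothetical).

```python
def to_bc_ad(year: int) -> str:
    """ISO 8601 / astronomical year number -> traditional BC/AD label."""
    if year >= 1:
        return f"AD {year}"
    return f"{1 - year} BC"  # 0 -> 1 BC, -1 -> 2 BC, -999 -> 1000 BC


def from_bc_ad(n: int, era: str) -> int:
    """Traditional label -> astronomical year; the BC/AD scheme has no year 0."""
    if n < 1:
        raise ValueError("BC/AD years start at 1")
    if era == "AD":
        return n
    if era == "BC":
        return 1 - n
    raise ValueError(f"unknown era: {era!r}")


print(to_bc_ad(-999))          # 1000 BC
print(from_bc_ad(1000, "BC"))  # -999
```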
8 BC and BCE
A rhetorical point: the note on "Use of BC/AD vs. BCE/CE" seems to me
to miss the point at issue rather badly. Those made uncomfortable by
the abbreviation "BC" are made uncomfortable by the logical
entailments and theological implications of referring to Jesus as
"Christ" quite independently of what you or any authors do or don't
intend by way of religious significance. The phrase "Before Christ"
will be associated with "BC" whether you wish it to or not, and it has
religious significance whether you intend it or not. So the anodyne
claim "It is not an acronym and is not intended to have religious
significance" does not seem to address any of the actual or perceived
issues with the usage you adopt.
The note says
BC is used rather than BCE, because the latter, which means
"Before Common Era" seems more controversial, because of the
difficulty achieving consensus on what is meant by the "common
But what controversy attaches to identifying the era? Are the terms
"Common Era" and "Before the Common Era" used to refer to any calendar
other than the Gregorian calendar now used worldwide in commerce? As
far as I can tell, the era of the Gregorian calendar has a much
clearer and stronger claim to be referred to as "the common" one than
any other. (Web sites that use "BCE" and "CE" do report controversy
over their usage, but it's not from proponents of other calendars.)
If you need to clarify and defend your choice of terminology, I hope
you can do it in more plausible and persuasive terms. Perhaps something like
Use of BC rather than BCE. We use "BC" rather than "BCE" to refer to
dates before the year 0001, because it appears to be the most
widespread usage. No religious significance should be imputed
to this choice.
9 XSD terminology
(Borderline pedantic; read only if interested in terminological details.)
Strictly speaking, the sentence
The string 2001-02-03, for example, is a valid xs:date value, but
20010203 (without hyphens) is not
is incorrect. XSD datatypes possess a lexical space, a value space,
and a mapping from the one to the other. No string can EVER be a valid
xsd:date value, because the value space of xsd:date is (in XSD 1.0) a
set of segments on the conventional time line of earth's history, or
(in XSD 1.1) a set of septuples whose component parts are integers and
whatnot. You mean, I think, to say that the string "2001-02-03"
denotes an xsd:date value, and the string "20010203" does not, or
(equivalently) that the one string but not the other is a member of
the lexical space of xsd:date.
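The distinction can be sketched in code by treating the lexical mapping as a partial function from strings to values. In the Python sketch below, a datetime.date stands in for the value. That is a simplification: it is neither the XSD 1.0 time-line segment nor the XSD 1.1 septuple. The xsd:date lexical space is also cut down to plain yyyy-mm-dd, and the function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def xsd_date_value(s: str) -> Optional[date]:
    """Simplified xsd:date lexical mapping; None means s is not in the lexical space."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", s):
        return None
    try:
        return date.fromisoformat(s)
    except ValueError:  # e.g. 2001-02-30
        return None


def iso8601_calendar_date_value(s: str) -> Optional[date]:
    """Also accepts the ISO 8601 basic form yyyymmdd."""
    if re.fullmatch(r"[0-9]{8}", s):
        s = f"{s[:4]}-{s[4:6]}-{s[6:]}"
    return xsd_date_value(s)


# Two different strings, one value; only one string is in the xsd:date lexical space.
print(xsd_date_value("2001-02-03"), xsd_date_value("20010203"))  # 2001-02-03 None
print(iso8601_calendar_date_value("20010203") == iso8601_calendar_date_value("2001-02-03"))  # True
```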
10 Ambiguity, orthogonality, etc.
It's not clear whether suffixes like '?' and '~' are intended to be
orthogonal to the form used on their left (so that they can follow
both hyphenated and unhyphenated forms); perhaps this
is just careless reading on my part. (I have found it hard to focus on
details given that I don't understand the use cases or requirements.)
If they are not intended to be orthogonal, I suggest that they ought to be.
And ISO 8601 sets a heroic example of allowing a multiplicity of forms
while ensuring that any form conforming to the spec is unambiguous. It
looks at first glance as if allowing hyphen to separate the start and
end points of a range or interval (as shown in 317 and 319) is safe,
because the hyphen (or double hyphen, in 319) is followed by four
digits, not two or three. If you have done the work necessary to
guarantee that you are preserving 8601's lack of ambiguity, it might
be worth saying so in the prose; if you haven't done the work, it
would be useful to do it.
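One crude mechanical aid, sketched here in Python, is to write each production as a regular expression and flag any sample string that matches more than one. The productions below are hypothetical simplifications, not the EDTF grammar. The two six-digit basic forms are included only to show what a detected collision looks like. A sample-based check like this can find ambiguities but cannot prove their absence.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical, simplified productions (NOT the EDTF grammar).
PRODUCTIONS: Dict[str, str] = {
    "year-month": r"[0-9]{4}-(0[1-9]|1[0-2])",
    "ordinal date": r"[0-9]{4}-[0-9]{3}",
    "year range": r"[0-9]{4}-[0-9]{4}",
    "basic year-month": r"[0-9]{4}(0[1-9]|1[0-2])",
    "basic yymmdd": r"[0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])",
}


def matching_productions(s: str) -> List[str]:
    return [name for name, pat in PRODUCTIONS.items() if re.fullmatch(pat, s)]


def ambiguities(samples: Iterable[str]) -> Dict[str, List[str]]:
    """Map each sample matched by two or more productions to their names."""
    found = {}
    for s in samples:
        names = matching_productions(s)
        if len(names) > 1:
            found[s] = names
    return found


print(ambiguities(["2004-06", "2004-123", "2004-2006", "200406"]))
# {'200406': ['basic year-month', 'basic yymmdd']}
```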
It is natural, I suppose, to focus in reviewing a document of this
kind on the perceived problems rather than on those parts of the
document that are clear, or helpful, or non-problematic. I hope my
notes have not seemed too negative. Again, thank you for the public
invitation to review this document; I hope these comments are helpful.
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC