Hello!
> > "xs:string", [...] anyURI
> it is premature to try to work out these sort of details.
Rejecting anyURI does not mean that "xs:string" is what we really want. I
think it is important to be aware of the limitations and ambiguities of
xs:string, and that what is a feature in one context may be a limitation
in another. I will try to clarify some of the characteristics of
"xs:string".
In XML Schema 1.0, at
http://www.w3.org/TR/xmlschema-2/#string, the concept of xs:string is
defined as follows:
"The value space of string is the set of finite-length sequences of
characters (as defined in [XML 1.0 (Second Edition)]) that match the Char
production from [XML 1.0 (Second Edition)]."
And there is a link to this production:
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Char
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
This production is also the same in the newest version of the XML 1.0
recommendation
http://www.w3.org/TR/REC-xml/#NT-Char
XML Schema 1.1, at
http://www.w3.org/TR/xmlschema11-2/#string, refers instead to
http://www.w3.org/TR/xml11/#NT-Char, which shows another production:
Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
This leaves many possibilities open: which characters an xs:string may
contain depends on the version in use. The list of characters that are
"discouraged" is also different in XML Schema 1.0 and XML Schema 1.1.
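To make the difference between the two productions concrete, here is a minimal Python sketch. It only encodes the two Char productions quoted above as code point ranges; the names XML10_CHAR, XML11_CHAR, matches_char and is_valid_string are my own and do not come from any specification or library.

```python
# Char productions as quoted above, written as (low, high) code point ranges.
# All names here are illustrative assumptions, not part of any specification.
XML10_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
              (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def matches_char(code_point, ranges):
    """Return True if the code point falls in one of the given ranges."""
    return any(low <= code_point <= high for low, high in ranges)


def is_valid_string(text, ranges):
    """Return True if every character of text matches the Char production."""
    return all(matches_char(ord(ch), ranges) for ch in text)
```

For instance, is_valid_string("\x01", XML10_CHAR) is False while is_valid_string("\x01", XML11_CHAR) is True, so the same sequence of characters may or may not be in the value space depending on which production applies.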
Something important is that
http://www.w3.org/TR/xmlschema11-2/#string also reads:
"string, as a simple type [...] is often not suitable for representing text."
The reasons for that are many. I won't give all details here, but only a
couple of facts. Both in XML Schema 1.0 and in XML Schema 1.1, xs:string
may contain #xA and #xD among other characters. It may also contain
sequences of spaces: sequences of spaces within the lexical space are
kept intact in the value space. This interesting feature may not be what
we want: consecutive spaces are considered significant, and two xs:string
values are considered different even if the only difference is one
trailing space in one of them. My question here is whether this really is
what we want ...
Furthermore, we must be aware that EDTF will probably be used quite often
within XML, and that XML 1.0 and XML 1.1 (not to be confused with XML
Schema 1.0 and 1.1!) have different definitions of highly relevant
concepts. For example, they differ on what should be considered a
line-break (XML 1.1 also treats #x85 and #x2028 as line ends), and
line-breaks (I do not further define this fuzzy term) are highly relevant
when it comes to xs:string.
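As an illustration of the line-break difference, here is a rough Python sketch of end-of-line handling as I read section 2.11 of each XML recommendation. The function names normalize_eol_xml10 and normalize_eol_xml11 are hypothetical, and the sketch ignores everything else a parser does.

```python
import re


def normalize_eol_xml10(text):
    """XML 1.0 as I read it: #xD #xA and any lone #xD become a single #xA."""
    return re.sub("\r\n|\r", "\n", text)


def normalize_eol_xml11(text):
    """XML 1.1 as I read it: additionally #xD #x85, #x85 and #x2028 become #xA."""
    # Two-character sequences are listed first so they are consumed as a pair.
    return re.sub("\r\n|\r\x85|\x85|\u2028|\r", "\n", text)
```

With these, the input "a\u0085b" is left untouched by normalize_eol_xml10 but becomes "a\nb" under normalize_eol_xml11, so the same document may yield different xs:string values depending on the XML version.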
I wonder about using xs:token instead, as of
http://www.w3.org/TR/xmlschema-2/#token and
http://www.w3.org/TR/xmlschema11-2/#token where #x9, #xA and #xD are
replaced by #x20, all consecutive spaces are collapsed, and leading and
trailing spaces are removed when the value space is created from the
lexical space.
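The practical difference between the two types can be sketched in a few lines of Python. This is only my approximation of the "preserve" and "collapse" whitespace behaviours, assuming the rules described above; the function names preserve and collapse are mine and it is not a schema validator.

```python
import re


def preserve(lexical):
    """xs:string: whitespace in the lexical form is kept as it is."""
    return lexical


def collapse(lexical):
    """xs:token: replace #x9, #xA, #xD by #x20, collapse runs, trim both ends."""
    replaced = re.sub("[\t\n\r]", " ", lexical)
    return re.sub(" +", " ", replaced).strip(" ")
```

So preserve("2012-01 ") and preserve("2012-01") are two different values, whereas collapse("2012-01 \n") and collapse("2012-01") both give "2012-01".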
Regards!
Saašha,