
Hello!

> > "xs:string", [...] anyURI
> it is premature to try to work out these sort of details.
Rejecting anyURI does not mean that "xs:string" is what we really want. I 
think it is important to be aware of the limitations and ambiguities of 
xs:string, and that what is a feature in one context may be a limitation 
in another. I will try to clarify some of the characteristics of 
"xs:string".

XML Schema 1.0 defines xs:string at
http://www.w3.org/TR/xmlschema-2/#string as follows:

"The value space of string is the set of finite-length sequences of 
characters (as defined in [XML 1.0 (Second Edition)]) that match the Char 
production from [XML 1.0 (Second Edition)]."

And there is a link to this production:
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Char

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

This production is the same in the newest version of the XML 1.0 
recommendation:
http://www.w3.org/TR/REC-xml/#NT-Char

XML Schema 1.1, at
http://www.w3.org/TR/xmlschema11-2/#string refers to
http://www.w3.org/TR/xml11/#NT-Char which shows a different production:

Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

This second production is wider: beyond what XML 1.0 allows, it also 
admits the control characters #x1-#x8, #xB-#xC and #xE-#x1F. The list of 
characters that are "discouraged" is also different in XML Schema 1.0 and 
XML Schema 1.1.
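
To make the difference between the two productions concrete, here is a small Python sketch. The range tables are copied from the two Char productions quoted above; the function names and the sample values are only my own illustration and do not come from any specification or library.

```python
# Char production of XML 1.0, as (low, high) code point ranges.
XML10_RANGES = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
                (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# Char production of XML 1.1, as (low, high) code point ranges.
XML11_RANGES = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def matches_char(ranges, ch):
    """Return True if the single character ch falls in one of the ranges."""
    return any(low <= ord(ch) <= high for low, high in ranges)


def in_value_space(ranges, text):
    """Return True if every character of text matches the production."""
    return all(matches_char(ranges, ch) for ch in text)


# Code points accepted by the XML 1.1 production but not by the XML 1.0 one.
only_in_xml11 = [cp for cp in range(0x110000)
                 if matches_char(XML11_RANGES, chr(cp))
                 and not matches_char(XML10_RANGES, chr(cp))]

sample = "2004-06-11\x01"   # hypothetical value containing the control #x1
print(in_value_space(XML10_RANGES, sample))   # rejected by XML 1.0 Char
print(in_value_space(XML11_RANGES, sample))   # accepted by XML 1.1 Char
print([hex(cp) for cp in only_in_xml11])
```

So the same sequence of characters can be inside the value space under one production and outside it under the other.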

An important point is that
http://www.w3.org/TR/xmlschema11-2/#string also reads:

"string, as a simple type [...] is often not suitable for representing text."

There are many reasons for that. I won't go into all the details here, but 
only give a couple of facts. Both in XML Schema 1.0 and in XML Schema 1.1, 
xs:string may contain #xA and #xD among other characters. It may also 
contain sequences of spaces, and those sequences are carried over intact 
from the lexical space to the value space. This interesting feature may 
not be what we want: consecutive spaces are significant, and two xs:string 
values are considered different even if the only difference is one 
trailing space in one of them. My question here is whether this really is 
what we want ...

Furthermore, we must be aware that EDTF will probably be used quite often 
within XML, and that XML 1.0 and XML 1.1 (do NOT confuse these with XML 
SCHEMA 1.0 and 1.1!) define some highly relevant concepts differently. 
For example, they differ on what counts as a line-break (XML 1.1 also 
treats #x85 and #x2028 as such), and line-breaks (I do not further 
define this fuzzy term) are highly relevant when it comes to xs:string.
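
As a rough illustration of the line-break point, the following Python sketch mimics the end-of-line handling of the two XML versions as I read them: XML 1.0 turns #xD #xA and a lone #xD into #xA, while XML 1.1 also turns #xD #x85, #x85 and #x2028 into #xA. The function names and the sample value are my own, and the sketch ignores the fact that a real parser applies this step to literal characters only, not to character references.

```python
import re


def normalize_eol_xml10(text):
    """XML 1.0 style: #xD #xA and any lone #xD become a single #xA."""
    return re.sub("\r\n|\r", "\n", text)


def normalize_eol_xml11(text):
    """XML 1.1 style: also #xD #x85, #x85 and #x2028 become a single #xA."""
    # Two-character sequences come first so that they are consumed as a unit.
    return re.sub("\r\n|\r\x85|\x85|\u2028|\r", "\n", text)


raw = "2004-06-11\x852004-06-12"   # hypothetical value containing #x85
print(repr(normalize_eol_xml10(raw)))   # #x85 is left untouched
print(repr(normalize_eol_xml11(raw)))   # #x85 has become #xA
```

The same document can therefore hand a different xs:string value to the application depending on which XML version it declares.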

I wonder whether we should use xs:token instead, as defined at
http://www.w3.org/TR/xmlschema-2/#token and
http://www.w3.org/TR/xmlschema11-2/#token where #x9, #xA and #xD are 
replaced by #x20, all consecutive spaces are collapsed into one, and 
leading and trailing spaces are removed when the value space is created 
from the lexical space.
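
The difference between the two types can be sketched in a few lines of Python. This is only my approximation of the "preserve" behaviour of xs:string and the "collapse" behaviour of xs:token, not a validator; the function names and sample values are made up for the illustration.

```python
import re


def ws_preserve(lexical):
    """xs:string: the value is the lexical form, unchanged."""
    return lexical


def ws_replace(lexical):
    """Replace each #x9, #xA and #xD by a single #x20."""
    return re.sub("[\t\n\r]", " ", lexical)


def ws_collapse(lexical):
    """xs:token: replace, then collapse runs of #x20 and trim both ends."""
    return re.sub(" +", " ", ws_replace(lexical)).strip(" ")


a = "2004-06-11"      # hypothetical value
b = "2004-06-11 \n"   # same value with trailing whitespace

print(ws_preserve(a) == ws_preserve(b))   # False: different as xs:string
print(ws_collapse(a) == ws_collapse(b))   # True: equal as xs:token
```

With xs:token, the two values that differ only in trailing whitespace compare as equal, which is closer to what I would expect.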

Regards!

SaaĊĦha,