Quoting Edward Summers <[log in to unmask]>:
> On Jan 26, 2007, at 8:28 AM, Edward C. Zimmermann wrote:
> > - Secondly, XHTML is not XML.
>
> I don't want to take this too far out of context, but why do you say
> XHTML isn't XML?
I say it because XHTML is not the same thing as XML, and XHTML is not even
a well-tamed, conforming application of XML.
XHTML is also an application in its own right: it contains a lot of
application-level material, semantics and so on, some of which is defined
outside of XML.
>
> XHTML documents are XML conforming. As such, they are readily
While a well-formed XHTML document is well-formed XML, and as such XML
conforming, one can't define XHTML in XML; viz., I can write well-formed
XML built around XHTML that is not valid XHTML.
The work on XHTML was initially work on an SGML-based HTML. XML is not
SGML. Since XML was envisioned NOT as a replacement for SGML but as a
formalization of the simplified, tag- and entity-normalized SGML many of
us were using in our projects and applications, we assumed that there was
SGML behind the scenes to define things. Our intent was not to replace
SGML but to create an easy entry path. Fundamental was the notion that a
well-formed document should be parseable in and of itself, that is,
without its DTD. Without DTDs, however, there is no way to define
exclusions, so XML does not (and can't) have them.
Look at the anchor element, A:
<A HREF="http://www.nonmonotonic.net/">BSn's NONMONOTONIC Lab</A>
is fine, but while
<A HREF="http://www.nonmonotonic.net/"><A HREF="http://www.bsn.de/">BSn</A>'s NONMONOTONIC Lab</A>
might be fine XML, it's clearly NOT good XHTML.
Anchors in XHTML are not to be nested.
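To make the point concrete, here is a minimal Python sketch. It assumes the
standard library's xml.etree.ElementTree stands in for a "standard XML tool";
has_nested_anchor is a hypothetical helper of mine, not part of any library,
and the upper-case tag names simply follow the example above. The parser
accepts both documents, so the nesting rule has to be checked separately.

```python
import xml.etree.ElementTree as ET


def has_nested_anchor(markup: str) -> bool:
    """Return True if any <A> element has another <A> among its descendants."""
    root = ET.fromstring(markup)  # raises ParseError if not well-formed XML
    for anchor in root.iter("A"):
        # iter() yields the element itself first, so ignore that one
        if any(inner is not anchor for inner in anchor.iter("A")):
            return True
    return False


flat = '<A HREF="http://www.nonmonotonic.net/">BSn\'s NONMONOTONIC Lab</A>'
nested = ('<A HREF="http://www.nonmonotonic.net/">'
          '<A HREF="http://www.bsn.de/">BSn</A>\'s NONMONOTONIC Lab</A>')

# Both parse without error as XML; only the second breaks the XHTML rule.
print(has_nested_anchor(flat))
print(has_nested_anchor(nested))
```

The exclusion is something an application has to enforce on top of the XML
parse; well-formedness alone says nothing about it.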
There is a lot more, even in the way documents are parsed. Look at
<script> sections.
> viewed, edited, and
> validated with standard XML tools. [1]
Actually I can define bad XHTML (just as I can define bad RSS) that passes
as valid by the popular XML tools.
-- I can't stress often enough: the Web is a mess. I mention RSS since I'm
pulling in and parsing a lot of RSS at http://www.ibu.de and the vast
majority of ALL the RSS I'm looking at doesn't conform to the standard it
claims to conform to. Show me a blog or so-called CMS that produces RSS and
I'll show you invalid RSS that passes as valid in all the standard RSS
validation software.
Back to order...
Imagine a document structure
  Person
    \
     Name
      \   \
      Last First
What's the difference in search context between
<PERSON><NAME><LAST>Zimmermann</LAST><FIRST>Edward</FIRST></NAME></PERSON>
and
<PERSON><NAME><FIRST>Edward</FIRST><LAST>Zimmermann</LAST></NAME></PERSON>
beyond the order of mark-up? Is one right and the other wrong?
Don't
<LAST>Zimmermann</LAST><FIRST>Edward</FIRST>
and
<FIRST>Edward</FIRST><LAST>Zimmermann</LAST>
probably define the exact same data in an RDBMS? They are both just
the product of different report templates used to generate an XML record.
Since SRU/W is also about interfacing to an RDBMS without XML, how can
we talk about Zimmermann being before or (ignoring tags) adjacent to
Edward?
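A small Python sketch of what I mean, assuming the two records above and
using the standard library's xml.etree.ElementTree; name_fields is a
hypothetical helper. Read as field/value pairs, the way a database row would
hold them, the two serializations are the same record.

```python
import xml.etree.ElementTree as ET


def name_fields(record: str) -> dict:
    """Map each child tag of <NAME> to its text, ignoring sibling order."""
    root = ET.fromstring(record)
    return {child.tag: child.text for child in root.find("NAME")}


last_first = ("<PERSON><NAME><LAST>Zimmermann</LAST>"
              "<FIRST>Edward</FIRST></NAME></PERSON>")
first_last = ("<PERSON><NAME><FIRST>Edward</FIRST>"
              "<LAST>Zimmermann</LAST></NAME></PERSON>")

# Same field/value pairs, whatever template produced the markup.
print(name_fields(last_first) == name_fields(first_last))
```

Any notion of "before" or "adjacent" here belongs to the report template,
not to the data.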
If we want to talk about XHTML how about
<META NAME="DESCRIPTION" CONTENT="An example">
What's the difference between it and
<META CONTENT="An example" NAME="DESCRIPTION">
The standard says that order (and even process) does not matter.
In HTML/XHTML we all agree, I think, that it would be wrong to
attach a semantic meaning to attribute order.
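As a quick illustration in Python (a sketch only: I have closed the META
elements with "/>" so that an XML parser accepts them, and same_attributes
is a hypothetical helper), a parser hands back the attributes as a mapping,
in which the written order plays no part in comparison.

```python
import xml.etree.ElementTree as ET


def same_attributes(first: str, second: str) -> bool:
    """Compare the attribute mappings of two single-element documents."""
    return ET.fromstring(first).attrib == ET.fromstring(second).attrib


name_then_content = '<META NAME="DESCRIPTION" CONTENT="An example" />'
content_then_name = '<META CONTENT="An example" NAME="DESCRIPTION" />'

# Dictionary equality ignores the order in which the attributes were written.
print(same_attributes(name_then_content, content_then_name))
```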
The META elements don't enter into the application rendering side of
things so let me ask:
What's the difference between the Dublin Core
<meta name="DC.subject" xml:lang="de" content="Meeresfruechte" />
<meta name="DC.subject" xml:lang="en-GB" content="seafood" />
<meta name="DC.subject" xml:lang="fr" content="fruits de mer" />
(I ordered the languages lexically by their locales)
and
<meta name="DC.subject" xml:lang="en-GB" content="seafood" />
<meta name="DC.subject" xml:lang="de" content="Meeresfruechte" />
<meta name="DC.subject" xml:lang="fr" content="fruits de mer" />
(I ordered them by my preferences)
Does the order of the META siblings in XHTML mean anything?
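Here is a hedged Python sketch of how an application might read those META
siblings. It assumes they are wrapped in a <head> element so that the
fragment parses, and subjects is a hypothetical helper. Collected as a set of
(language, subject) pairs, both orderings say exactly the same thing.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def subjects(head: str) -> set:
    """Collect (language, subject) pairs from DC.subject meta elements."""
    root = ET.fromstring(head)
    return {(meta.get(XML_LANG), meta.get("content"))
            for meta in root.iter("meta")
            if meta.get("name") == "DC.subject"}


by_locale = """<head>
<meta name="DC.subject" xml:lang="de" content="Meeresfruechte" />
<meta name="DC.subject" xml:lang="en-GB" content="seafood" />
<meta name="DC.subject" xml:lang="fr" content="fruits de mer" />
</head>"""

by_preference = """<head>
<meta name="DC.subject" xml:lang="en-GB" content="seafood" />
<meta name="DC.subject" xml:lang="de" content="Meeresfruechte" />
<meta name="DC.subject" xml:lang="fr" content="fruits de mer" />
</head>"""

print(subjects(by_locale) == subjects(by_preference))
```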
I can understand the utility of, and the (occasional) wish for, "document
order". That's why I support search by byte offsets within the original
document (whatever format the input was, as long as it's a document).
Critical for me, however, is the ability to search (also ordered) for
"terms" in the same field (tag) instance (named, named with path or
unnamed). Searching for "out" and "spot" in the same "line" of one of
Shakespeare's plays is, after all, not the same question as finding records
where a line contains "out" and a line contains "spot".
The latter search (using each play as a record) would produce the set:
`The Life and Death of King John'
`The Tragedy of Julius Caesar'
`The Tragedy of Macbeth'
`The Tragedy of Coriolanus'
`As You Like It'
`The Merry Wives of Windsor'
`The Tragedy of Antony and Cleopatra'
(with 55 hits alone in the first record)
while the former produces just one play:
`The Tragedy of Macbeth'
and the line: "Out, damned spot! out, I say!--One: two: why"
as spoken by LADY MACBETH in ACT V SCENE I.
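The difference between the two questions can be sketched in a few lines of
Python. This is an illustration only, not how my engine does it: the record
layout (play title mapped to a list of lines), both function names, and the
"Hypothetical Play" record are invented for the example. Only the Macbeth
line is taken from the text above.

```python
import re


def words(line: str) -> set:
    """Lower-cased alphabetic tokens of one line."""
    return set(re.findall(r"[a-z]+", line.lower()))


def same_line_search(records: dict, a: str, b: str) -> list:
    """Records where a single line (one field instance) holds both terms."""
    return [title for title, lines in records.items()
            if any({a, b} <= words(line) for line in lines)]


def same_record_search(records: dict, a: str, b: str) -> list:
    """Records where some line holds a and some (possibly other) line holds b."""
    return [title for title, lines in records.items()
            if any(a in words(line) for line in lines)
            and any(b in words(line) for line in lines)]


records = {
    "The Tragedy of Macbeth": [
        "Out, damned spot! out, I say!--One: two: why",
    ],
    # Invented record: both terms occur, but never in the same line.
    "Hypothetical Play": [
        "a line with out in it",
        "another line with a spot",
    ],
}

print(same_line_search(records, "out", "spot"))
print(same_record_search(records, "out", "spot"))
```

The first search is scoped to a field instance, the second only to the
record, which is why the second returns the larger set.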
--
Edward C. Zimmermann, Basis Systeme netzwerk, Munich
Office Leo (R&D):
Leopoldstrasse 53-55, D-80802 Munich,
Federal Republic of Germany
http://www.nonmonotonic.net