On Wed, 16 Jul 1997 14:19:52 -0400 Elizabeth H. Dow said:
>  I vaguely remember that there is a wpsgml list, but I don't belong, and
>don't have its address.  I apologize, therefore, for having to bring this
>problem to this group, but you're all I could think of.
>
>  This one stops me cold.  After approximately a year of working with
>WP7.0 doing SGML markup, I find it suddenly won't save files as .sgm
>files without changing some of the characters:
>
>  In one file it changed all the / characters in the eadheader to the &
>form, but did not change them in their /> context. In another file, it saved
>all the < to &#60;
>
>  Any informed opinions or wild guesses out there as to what's causing this?

Word Perfect is just being very cautious.  The characters < and / are
each, in certain contexts, significant to SGML.  The < marks the
beginning of a start-tag, end-tag, comment, or various other markup; the
/ is used in a very specialized form of 'markup minimization' that I
don't want to try to explain.   Suffice to say that if these characters
occur in certain contexts, they can be understood by an SGML parser as
markup, rather than as data.  In other contexts, there is no ambiguity.
I could try to list all the contexts in which each character is data,
and all the contexts where it is markup, but it gets kind of
complicated, and I might get it wrong.

When a character would normally be interpreted as markup, but should in
fact be data, SGML allows the character to be 'escaped' -- replaced by a
'numeric character reference' -- the magic string '&#' followed by the
decimal number of the character's code, followed by a semicolon.
'/' can thus be replaced by '&#47;' and '<' by '&#60;' -- any SGML
processor will understand the result and interpret it properly.
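
For anyone who wants to see the mechanics, here is a minimal sketch in
Python of that replacement.  The function name to_char_refs is made up
for illustration, and the sketch assumes an ASCII-compatible document
character set, so that Python's ord() gives the same number SGML expects.
It is not how Word Perfect actually does it, only the general idea.

```python
def to_char_refs(text, chars="</"):
    """Replace every character listed in `chars` with a decimal
    numeric character reference ('&#' + code number + ';').

    Hypothetical helper: assumes an ASCII-compatible document
    character set, so ord() yields the right code number.
    """
    return "".join(f"&#{ord(c)};" if c in chars else c for c in text)


sample = "a < b and/or c"
escaped = to_char_refs(sample)
print(escaped)  # a &#60; b and&#47;or c
```

Like the cautious approach described in this message, the sketch
escapes every occurrence rather than trying to work out which contexts
really need it.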

Strictly speaking, it's only *necessary* to do this in those
contexts where the '<' and '/' would otherwise be misinterpreted.
The developers of Word Perfect appear to have decided (just as I
did a couple paragraphs back) that getting the precise list of
contexts right would be a bit tricky.  And so they replace *all*
occurrences of '/' and '<' with numeric character references, just
to be on the safe side.  This is irritating and confusing to those
of us who then edit the document with an ASCII editor, but it's
something one can get used to, and over time I've grown to be
rather fond of it, as a sign that Word Perfect is doing its best to
be cautious and reliable.  I work with other editors that *never*
replace '<' and '/' with numeric character references, even when
the result is a misunderstanding and an invalid document.  On the
whole, I prefer Word Perfect's cautious approach.

No data has been lost, and no harm is done if you process the result
with conforming SGML software.  So don't panic.
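
If you want to convince yourself of that without an SGML parser at
hand, a rough sketch like the following (Python again; decode_char_refs
is a name invented here, and it handles only the decimal,
semicolon-terminated form discussed above) turns the references back
into the characters they stand for.  A real parser does more than this,
of course: it also knows the result is data and not markup.

```python
import re


def decode_char_refs(text):
    """Turn decimal numeric character references back into characters.

    Hypothetical helper, not a real SGML parser: it only recognizes
    the '&#' + digits + ';' form and assumes an ASCII-compatible
    document character set.
    """
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)


saved = "1997&#47;07&#47;16 &#60;draft&#62;"
restored = decode_char_refs(saved)
print(restored)  # 1997/07/16 <draft>
```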


-C. M. Sperberg-McQueen
 ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago