Candy Zemon (hi Candy!!) wrote on 08/05/2005 06:38:01 AM:
> I actually do have some technical objections to the XML entity
> proposal from some of our development staff. I agree completely that
> the conceptual and practical tasks are as Geoff lists except for
> step 4 which I see as one option. Here are the objections I heard
> when I asked about using XML entities for Unicode characters in
> MARC-8 record output:
>
> 1. MARC-8 encoded text fully complies with the ISO 2022 standard
> which includes an escape sequence mechanism to reach non ASCII/ANSEL
> graphic sets (e.g. Hebrew, EACC). There is no way to add support
> for XML-based character entity references and still be compliant
> with ISO 2022.
With XML entity references, the character set in the MARC record is still
fully compliant with ISO 2022. That is the whole point of using entity
references. The string does not contain unrecognizable characters; it
contains sequences of ASCII characters (for example, &#x05D0;) that an
appropriate rendering engine can display as weird and wonderful non-ASCII
things. That is one of the major attractions of this approach - it does not
break the MARC record.
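
To make the "sequences of ASCII characters" point concrete, here is a minimal sketch in Python. The function name to_ascii_with_refs is hypothetical, and the sketch simplifies by treating every non-ASCII character as a candidate for a reference; a real converter would presumably map to MARC-8 graphic sets first and fall back on references only for characters MARC-8 cannot express.

```python
def to_ascii_with_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Hypothetical helper for illustration only. It relies on Python's
    built-in 'xmlcharrefreplace' error handler, which emits decimal
    references of the form &#NNNN;.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Four Hebrew letters, written with escapes so this source stays ASCII.
hebrew = "\u05e9\u05dc\u05d5\u05dd"
encoded = to_ascii_with_refs(hebrew)

print(encoded)  # &#1513;&#1500;&#1493;&#1501;

# Every character in the result is plain ASCII, so no new escape
# sequence or graphic set is involved.
print(all(ord(ch) < 128 for ch in encoded))  # True
```

The output contains nothing but ASCII bytes, so a MARC-8 system that knows nothing about references still sees an ordinary string.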
>
> 2. There would be no way to encode a literal string that looked like
> an XML-based character entity reference without some unspecified
> escape mechanism.
The reference itself is the escape mechanism. It is important to note that
this suggestion does not require or even presuppose that implementors of
MARC-8 systems might change their applications to support entity
references. The crucial thing is that MARC-8 systems should be able to
process records originating in Unicode systems without breaking. I think
we all agree that any solution that requires MARC-8 systems to do
development work is going to be untenable - if any developer is going to do
character set work, surely they are going to do it in Unicode, not in
MARC-8. The only thing to do with MARC-8 is to find a way of
grandfathering it.
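
One way to read "the reference itself is the escape mechanism" is that a literal ampersand can be written as a reference too, so text that merely looks like a reference survives a round trip. The sketch below assumes that reading. The names encode_field and decode_field are hypothetical, and wrapping the field in a throwaway <f> element is only a device for letting a standard XML parser do the decoding.

```python
import xml.etree.ElementTree as ET


def encode_field(text: str) -> str:
    """Hypothetical encoder: escape XML-special characters, then non-ASCII."""
    # Escape '&' first so literal text that looks like a reference is kept,
    # and '<' so the result can sit inside an XML element.
    text = text.replace("&", "&#x26;").replace("<", "&#x3C;")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def decode_field(field: str) -> str:
    """Hypothetical decoder: let the XML parser resolve the references."""
    return ET.fromstring("<f>" + field + "</f>").text or ""


literal = "see &#x5D0; in the manual"   # literal text, not a Hebrew letter
encoded = encode_field(literal)
print(encoded)                          # see &#x26;#x5D0; in the manual
print(decode_field(encoded) == literal) # True

real = "\u05d0"                         # an actual Hebrew alef
print(encode_field(real))               # &#1488;
print(decode_field(encode_field(real)) == real)  # True
```

The producing (Unicode) side does the escaping; nothing here asks the MARC-8 system to change.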
>
> 3. Software that converts from MARC-8 to Unicode would be
> significantly more complex (and slower due to the unconventional
> logic that would be required).
Only if there is no XML conversion in the path from MARC-8 to Unicode. If
there is XML conversion, then the XML parser does this (entity
dereferencing) at no development cost. Unless, of course, you do something
extravagant like writing your own XML parser.
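
As an illustration of the parser doing the dereferencing for free, here is a small sketch using Python's standard xml.etree.ElementTree. The element name subfield and the sample text are made up for the example; the point is only that no custom conversion logic is written.

```python
import xml.etree.ElementTree as ET

# An ASCII-only field value carrying the same Hebrew letter twice,
# once as a hexadecimal and once as a decimal character reference.
field = "Title with &#x5D0; and &#1488;"

# Wrapping the value in an element is all that is needed; the parser
# resolves the references while reading the document.
element = ET.fromstring("<subfield>" + field + "</subfield>")

print(element.text == "Title with \u05d0 and \u05d0")  # True
```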
>
> I also heard the suggestion that any solution to support the full
> Unicode character set in MARC-8 should be within the scope of ISO
> 2022. One way would be to register a new graphic set for the
> purpose of escaping to and from UTF-8 or UTF-16. Another would be
> to use the more generic SHIFT-IN and SHIFT-OUT control characters to
> signal when the encoding should change to and from UTF-8 or UTF-16.
Surely no one wants to do this? This requires software development in
MARC-8 AND in Unicode systems. I think we need to grandfather ISO 2022
along with MARC-8.
<snip>
Johan Zeeman
RLG