UNICODE-MARC Archives
UNICODE-MARC August 2005

Subject: Re: Topic 1, Representing Extended Unicode in MARC-8
From: Johan Zeeman <[log in to unmask]>
Reply-To: UNICODE-MARC Discussion List <[log in to unmask]>
Date: Fri, 5 Aug 2005 11:38:12 -0700


Candy Zemon (hi Candy!!) wrote on 08/05/2005 06:38:01 AM:

> I actually do have some technical objections to the XML entity
> proposal from some of our development staff. I agree completely that
> the conceptual and practical tasks are as Geoff lists except for
> step 4 which I see as one option. Here are the objections I heard
> when I asked about using XML entities for Unicode characters in
> MARC-8 record output:
>
> 1. MARC-8 encoded text fully complies with the ISO 2022 standard,
> which includes an escape sequence mechanism to reach non-ASCII/ANSEL
> graphic sets (e.g. Hebrew, EACC). There is no way to add support
> for XML-based character entity references and still be compliant
> with ISO 2022.

With XML entity references, the character set in the MARC record is still
fully compliant with ISO 2022. That is the whole point of using entity
references. The string does not contain unrecognizable characters; it
contains sequences of ASCII characters that an appropriate rendering engine
can display as weird and wonderful non-ASCII things. That is one of the
major attractions of this approach: it does not break the MARC record.
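To make the point concrete, here is a minimal Python sketch. It assumes, for simplicity, that the target repertoire is plain ASCII; a real MARC-8 encoder would first try ANSEL and the other MARC-8 graphic sets and fall back to a reference only for characters outside all of them. `to_ascii_with_refs` is a hypothetical name; the `xmlcharrefreplace` error handler is standard Python.

```python
def to_ascii_with_refs(text: str) -> str:
    """Replace every non-ASCII character with an XML numeric character reference.

    Simplifying assumption: the target repertoire is ASCII only, not full MARC-8.
    """
    # "xmlcharrefreplace" emits decimal references such as "&#233;".
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


encoded = to_ascii_with_refs("Caf\u00e9 \u263a")
print(encoded)            # Caf&#233; &#9786;
print(encoded.isascii())  # True
```

The stored value is nothing but ASCII, so a MARC-8 system can carry it through unchanged, while a reference-aware renderer can display the original characters.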

>
> 2. There would be no way to encode a literal string that looked like
> an XML-based character entity reference without some unspecified
> escape mechanism.

The reference itself is the escape mechanism. It is important to note that
this suggestion does not require, or even presuppose, that implementors of
MARC-8 systems change their applications to support entity references. The
crucial thing is that MARC-8 systems should be able to process records
originating in Unicode systems without breaking. I think we all agree that
any solution that requires MARC-8 systems to do development work is going
to be untenable: if any developer is going to do character set work, surely
they are going to do it in Unicode, not in MARC-8. The only thing to do
with MARC-8 is to find a way of grandfathering it.
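One reading of "the reference itself is the escape mechanism" is the usual XML convention: a literal ampersand is itself written as a reference (`&amp;`), so text that merely looks like a reference can never be confused with a real one. The sketch below assumes that convention and the same simplified ASCII-only repertoire; both function names are hypothetical, and the decoder deliberately handles only `&amp;` and numeric references.

```python
import re

# Matches "&amp;", decimal "&#NNN;" and hex "&#xHHHH;" references.
_REF = re.compile(r"&(?:amp|#([0-9]+)|#x([0-9A-Fa-f]+));")


def escape_for_marc8(text: str) -> str:
    # Escape literal ampersands first, so text that only looks like a
    # reference (e.g. the five characters "&#65;") stays literal.
    text = text.replace("&", "&amp;")
    # Then turn characters outside the simplified ASCII repertoire into references.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def unescape_from_marc8(text: str) -> str:
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:      # decimal reference
            return chr(int(match.group(1)))
        if match.group(2) is not None:      # hexadecimal reference
            return chr(int(match.group(2), 16))
        return "&"                          # "&amp;"
    # re.sub makes a single pass, so "&amp;#65;" decodes to "&#65;", not "A".
    return _REF.sub(replace, text)


original = "AT&T &#65; \u263a"
stored = escape_for_marc8(original)
print(stored)                                  # AT&amp;T &amp;#65; &#9786;
print(unescape_from_marc8(stored) == original) # True
```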
>
> 3. Software that converts from MARC-8 to Unicode would be
> significantly more complex (and slower due to the unconventional
> logic that would be required).

Only if there is no XML conversion in the path from MARC-8 to Unicode. If
there is, the XML parser does the entity dereferencing at no development
cost. Unless, of course, you do something extravagant like writing your own
XML parser.
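As an illustration of the parser doing the dereferencing, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the conversion to MARCXML copies the reference text into the XML verbatim; if a converter instead escaped every `&` as `&amp;`, the parser would hand back the literal reference text and a separate decoding step would be needed. The fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical MARCXML fragment: the MARC-8 field text "Caf&#233; &#x263A;"
# was copied into the subfield verbatim, without escaping the "&".
fragment = (
    '<datafield xmlns="http://www.loc.gov/MARC21/slim" tag="245" ind1="1" ind2="0">'
    '<subfield code="a">Caf&#233; &#x263A;</subfield>'
    "</datafield>"
)

root = ET.fromstring(fragment)
# The parser has already replaced both numeric character references with
# U+00E9 and U+263A; no custom decoding code is involved.
title = root.find("marc:subfield", MARCXML_NS).text
print(ascii(title))  # 'Caf\xe9 \u263a'
```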

>
> I also heard the suggestion that any solution to support the full
> Unicode character set in MARC-8 should be within the scope of ISO
> 2022.  One way would be to register a new graphic set for the
> purpose of escaping to and from UTF-8 or UTF-16.  Another would be
> to use the more generic SHIFT-IN and SHIFT-OUT control characters to
> signal when the encoding should change to and from UTF-8 or UTF-16.

Surely no one wants to do this? This would require software development in
MARC-8 AND in Unicode systems. I think we need to grandfather ISO 2022
along with MARC-8.
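For comparison, the SHIFT-OUT/SHIFT-IN alternative quoted above would need a decoder along these lines in every receiving system, which is the development burden objected to here. This is a hypothetical scheme for illustration only: SO (0x0E) opens a UTF-8 run, SI (0x0F) closes it, and the default set is simplified to ASCII rather than full MARC-8. In ISO 2022 proper, SO and SI invoke the G1 and G0 graphic sets; they do not switch to UTF-8.

```python
SO = 0x0E  # SHIFT-OUT: in this hypothetical scheme, start a UTF-8 run
SI = 0x0F  # SHIFT-IN: return to the default single-byte set (ASCII here)


def decode_shifted(data: bytes) -> str:
    """Decode bytes in which UTF-8 runs are bracketed by SO ... SI."""
    parts = []
    run = bytearray()
    in_utf8 = False
    for byte in data:
        if byte == SO and not in_utf8:
            parts.append(run.decode("ascii"))  # close the default-set run
            run.clear()
            in_utf8 = True
        elif byte == SI and in_utf8:
            parts.append(run.decode("utf-8"))  # close the UTF-8 run
            run.clear()
            in_utf8 = False
        else:
            run.append(byte)
    if in_utf8:
        raise ValueError("unterminated UTF-8 run (missing SHIFT-IN)")
    parts.append(run.decode("ascii"))
    return "".join(parts)


print(ascii(decode_shifted(b"Caf\x0e\xc3\xa9 \xe2\x98\xba\x0f!")))  # 'Caf\xe9 \u263a!'
```

Unlike the entity-reference approach, a MARC-8 system that does not implement this logic would see raw UTF-8 bytes, and a Unicode system would need matching encoder logic, so both sides take on new code.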

<snip>

Johan Zeeman
RLG
