At 12:24 PM 9/22/2005, Geoff Mottram wrote:
>In case anyone is interested, while generating character maps for
>converting between MARC-8 and Unicode, the following exceptions were
>noted. These are cases where two MARC-8 characters are mapped to the same
>Unicode character, meaning that information may be lost in the process.
>Some of these characters are documented in the LC code table as "duplicate
>simplified" and others as "variants". However, there are still many
>characters that are not documented as either. I'm sure this is old news
>but, if not, it may be of interest to someone.
This is old news. I mentioned one case (the "variants") in my posting
earlier today (follow-up re the "geta").
Yes, "information" is lost in the process of mapping two EACC characters to
one Unicode character, but it is information that the East Asian experts on
ISO's Ideographic Rapporteur Group considered to be typeface aspects. This
mapping process was also acceptable to LC's East Asian experts. (Anyone
wanting more details about the ideographic content of Unicode and ISO/IEC
10646 should read Chapter 11 of The Unicode Standard:
http://www.unicode.org/versions/Unicode4.0.0/ch11.pdf)
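For anyone who wants to find such many-to-one cases in their own mapping table, a rough sketch of the check follows. It assumes the table has already been loaded as (source code, Unicode code point) pairs. The function name find_collisions and the sample values are made up for illustration: the source codes are placeholders, and the targets are taken from a private-use range so they cannot be mistaken for real assignments in the LC code table.

```python
from collections import defaultdict

# Placeholder pairs: (source code as a hex string, Unicode code point).
# These values are invented for illustration; they are NOT real
# MARC-8/EACC assignments from the LC code table.
SAMPLE_MAP = [
    ("AA0001", 0xF0000),
    ("AA0002", 0xF0000),  # second source mapped to the same target
    ("AA0003", 0xF0001),
    ("AA0005", 0xF0002),
    ("AA0004", 0xF0002),  # another many-to-one case
]


def find_collisions(pairs):
    """Return {target: sorted list of sources} for every target code
    point that more than one source code maps to."""
    by_target = defaultdict(list)
    for source, target in pairs:
        by_target[target].append(source)
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


if __name__ == "__main__":
    for target, sources in sorted(find_collisions(SAMPLE_MAP).items()):
        # Each line lists one Unicode target and all sources that share it.
        print(f"U+{target:04X} <- {', '.join(sources)}")
```

Any target that appears in the result is a place where converting to Unicode and back cannot recover which source code was originally used.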
Both OCLC and RLG worked intensively with LC on the final round of EACC
mapping modification, to meet LC's desire to reduce the number of
characters mapped to Private Use Area (PUA) code points. All changes are
extensively documented in the Revision History at the top of the EACC code
table. I believe that OCLC and RLG independently checked the complete final
mapping for EACC.
-- Joan
p.s. When there's a long list of data, it is better to post it somewhere
and give its URL. Just a few examples in a posting are sufficient to
communicate the problem to most list subscribers. Anyone who wants to see
the whole list can get it via the URL.