I'm new to the list and the gopher search of past postings wasn't working,
so please bear with me if this question has come up before.
I've recently been incorporating the USMARC suggested usm-94 to Unicode (and
vice versa) mappings into our software. The USMARC documentation refers
readers to charts.unicode.org for the CJK Unicode/EACC equivalences. I have
absolutely no experience in the CJK arena, and have the following question:
Of the thousands of CJK characters that actually have EACC equivalents at
charts.unicode.org, roughly 2/3 or more (I don't have the actual numbers in
front of me right now, but this figure is in the ballpark) of the codes are
not unique; i.e., there are many Unicode CJK characters that map to the same
EACC character. Can anyone briefly explain why this is and whether there's
an algorithm for choosing the correct mapping? Perhaps the answer will
become obvious from reading the NISO CJK standard, which we currently don't
have (we need to get that one of these days). Or perhaps my parsing of the
Unihan database a few days ago was incorrect, leading to the duplicate
mappings, but I don't think so.
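To make the question concrete, here is a minimal sketch (not our actual
parsing code) of the kind of duplicate check I mean. It assumes the Unihan
data is tab-separated lines of the form "U+XXXX<TAB>kEACC<TAB>value"; the
function name find_shared_eacc and the sample values are made up purely for
illustration and are not real mappings.

```python
from collections import defaultdict


def find_shared_eacc(lines):
    """Return {eacc_code: [code points]} for EACC codes used more than once.

    Assumes each data line looks like "U+XXXX<TAB>kEACC<TAB>value";
    comment lines (starting with '#') and other fields are skipped.
    """
    by_eacc = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kEACC":
            continue
        code_point, _, eacc = fields
        by_eacc[eacc].append(code_point)
    # Keep only the EACC codes that more than one Unicode code point maps to.
    return {eacc: cps for eacc, cps in by_eacc.items() if len(cps) > 1}


# Made-up sample lines, NOT real Unihan data.
sample = [
    "# comment line",
    "U+0001\tkEACC\tAAAAAA",
    "U+0002\tkEACC\tAAAAAA",
    "U+0003\tkEACC\tBBBBBB",
    "U+0003\tkOther\tzzz",
]
shared = find_shared_eacc(sample)
print(shared)
```

If a check along these lines is sound, then the shared codes are really in
the data and not a product of my parsing.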
Thanks in advance for any information.
Mark Reichert