This is a clarification of some remarks by Jim Agenbroad posted on
Mon, 7 Apr 2003.
Jim wrote:
>It would seem from this that Unicode also has some uncertainty about
>double-width diacritics when used in combination with other
>diacritics (combining characters).
I am not sure what "this" refers to. However, there is no
uncertainty. The interaction of the double diacritics with other
nonspacing marks (a.k.a. diacritics) is clearly specified on page 179
of "The Unicode Standard, Version 3.0."
<SNIP: comments on use of the double tilde and double breve>
>A broader, but
>related question would be will MARC 21 change to support separate
>Unicode values for letter+diacritic combinations when they exist in
>Unicode. Not all combinations found in MARC data have separate
>codes but the more common ones for European languages are there.
>Allowing them would mean a MARC 21 record could contain a mix of two
>techniques which isn't very elegant IMHO but I think some would
>still favor the change as it would simplify display if not indexing.
It is wrong to imply that accented letters must be encoded as
precomposed (composite) characters if they are to be displayed
correctly. Modern rendering systems such as OpenType and AAT
position accents correctly on base letters even if the source data
is decomposed.
Systems need to be able to accept both precomposed and decomposed
forms, converting if needed to the system's internal form. Although
there is currently an agreement to use decomposed forms in MARC 21
records encoded in UTF-8, there is no guarantee that all incoming
records will adhere to this.
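
As a rough illustration of accepting both forms, here is a minimal
Python sketch that uses the standard library's unicodedata module.
The function name to_internal_form is hypothetical, and the choice of
NFD (decomposed) as the internal form is an assumption made only to
match the MARC 21 UTF-8 agreement mentioned above; a system could
just as well normalize to NFC internally.

```python
import unicodedata


def to_internal_form(text: str, form: str = "NFD") -> str:
    """Normalize incoming text to one internal form.

    Hypothetical helper: NFD is assumed as the internal form here.
    Pass form="NFC" for a system that stores precomposed characters.
    """
    return unicodedata.normalize(form, text)


# The same accented letter arriving in two different encodings.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# After normalization the two inputs are identical, so indexing and
# comparison no longer depend on which form the record arrived in.
same_after_nfd = to_internal_form(precomposed) == to_internal_form(decomposed)
same_after_nfc = (
    to_internal_form(precomposed, "NFC") == to_internal_form(decomposed, "NFC")
)
```

Normalizing once at the point of input means the rest of the system
only ever sees one form, whichever form the incoming record used.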
The problem of displaying the combining half marks comes about
precisely because of the sophistication of modern rendering
software. Such software is built to handle WHOLE accents in a
proportional font environment, not HALVES OF WHOLE ACCENTS (which
were used in a monospaced environment and were created long ago
because of printer inadequacies).
-- Joan Aliprand
Senior Analyst, RLG