At 06:23 PM 11/4/2002 +0100, Yves Pratter wrote:

> >Allowing researchers to use Pinyin at the keyboard rather than
> >forcing them through an alternate keyboard is still considered a "service"
> >by some.
>If you want to provide such a facility, OK, you could provide an optional field
>that says the original data were written in Pinyin (Pinyin is a language,
>not a charset?).

That's exactly what I want, an optional field. But Pinyin is neither a
language nor a character set, it's a transliteration standard. Here's a
Pinyin field for a Chinese book title:
    Ho Ching-ming ts'ung k'ao / Pai Jun-te chu
In the vernacular, instead of those Latin characters you would see Chinese
characters. The Chinese has been rendered more or less phonetically to put
it into the Latin character set. That's not a two-way street, however.

>But all MODS data should be in Unicode.

It is. Pinyin, or other transliterations, are written in Latin characters,
which are represented in Unicode. The entire MODS record can be in Unicode
and use only letters A-Z, a-z. Let's not confuse "Unicode" with "scripts".
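
To make the Unicode-versus-script distinction concrete, here is a minimal
Python sketch. The helper name scripts_used is made up for illustration, and
taking the first word of each Unicode character name is only a rough stand-in
for the real Unicode Script property:

```python
import unicodedata

def scripts_used(text):
    """Rough guess at the scripts in `text`.

    Assumption: the first word of a character's Unicode name ("LATIN",
    "CJK", "CYRILLIC", ...) is a good enough proxy for its script.
    Non-letters such as spaces and hyphens are ignored.
    """
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

# Both strings are ordinary Unicode strings; only the script differs.
romanized = "Ho Ching-ming"
vernacular = "中文"  # the word "Chinese (language)", used here only as sample text

print(scripts_used(romanized))
print(scripts_used(vernacular))
```

Both values are Unicode text; the first just happens to stay within the
Latin script.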

> >The fact is that people using MARC *do* have both kinds of fields in their
> >records, so by not including them we run the risk of making MODS less
> >useful and therefore less used.
>Currently, MARC software doesn't yet support MODS.
>So when software engineers provide import/export modules for MODS,
>they will provide automatic (if possible) transliteration from/to Unicode.

With Western European languages it is (often) possible to convert from a
character set like ISO 8859-1 to the Unicode equivalent. But it is not
possible to convert from a *transliteration* of Chinese or Russian back to
the vernacular characters of the original language. So my concern is not with
languages that use a Latin-based script but with ones that do not.
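
A rough Python sketch of the difference, under stated assumptions: the
three-entry table TOY_ROMANIZATION and the function candidates are invented
for illustration (tone marks dropped), not taken from any real conversion
tool or from MARC/MODS software.

```python
# Character set conversion: every ISO 8859-1 byte corresponds to exactly one
# Unicode code point, so the conversion is mechanical and reversible.
raw = "café".encode("latin-1")
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw  # round trip loses nothing

# Transliteration: several distinct Chinese characters share one romanized
# syllable. Hypothetical toy table, tone marks omitted.
TOY_ROMANIZATION = {"马": "ma", "妈": "ma", "码": "ma"}

def candidates(romanized):
    """Return every character in the toy table that romanizes to `romanized`."""
    return sorted(ch for ch, r in TOY_ROMANIZATION.items() if r == romanized)

# Going back from "ma" yields several candidates, not one answer.
print(candidates("ma"))
```

Even with this tiny table the reverse lookup is ambiguous, which is why the
vernacular field cannot simply be regenerated from the romanized one.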

Karen Coyle