In reply to Foster and Karen:
I read some articles in French and English to understand the problem of transliteration (romanization) of languages such as Chinese, Arabic, and Hebrew.
Transliteration is a systematic way to represent the sounds of words in one language using the writing system of another language.
Transliteration is not translation: 北京 is romanized as bei3 jing1 or Beijing in pinyin, but its French translation is Pékin.
As Foster noted, transliteration is very useful to "help people learn the language's pronunciation". In China, many people cannot read or write vernacular Chinese because it is too difficult, but they could easily learn pinyin.
>>But all MODS data should be in unicode.
>It is. Pinyin, or other transliterations, are written in latin characters,
>which are represented in Unicode. The entire MODS record can be in Unicode
>and use only letters A-Z,a-z. Let's not confuse "Unicode" with "scripts".
I understand that the problem is a bit more complex:
The Chinese word 拼音 is romanized as Pinyin. That uses "only" roman characters, so there is no problem with Unicode.
But in fact the tone marks are missing.
The exact romanization should be PĪN YĪN, using Unicode characters that carry the macron ("bar"), which represents the high level tone.
With "only" ASCII characters, the tones are represented by a number, so here the romanized version is Pin1 Yin1.
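
To make the three pinyin forms concrete, here is a minimal Python sketch that turns numbered ASCII pinyin into the Unicode form with tone marks, and into the toneless form. The function names (mark_syllable, to_tone_marks, strip_tones) are made up for this illustration. I assume the usual placement convention: the mark goes on "a" or "e" if present, on the "o" of "ou", and otherwise on the last vowel. It is not a complete pinyin parser.

```python
import re
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
VOWELS = "aeiou\u00fc"  # the last one is u with diaeresis


def mark_syllable(syllable):
    """Turn one numbered syllable such as 'bei3' into its tone-marked form."""
    match = re.fullmatch(r"([A-Za-z\u00fc\u00dc]+)([1-5])", syllable)
    if not match:
        return syllable  # not a numbered syllable: leave it untouched
    letters, tone = match.groups()
    if tone == "5":
        return letters  # neutral tone carries no mark
    lower = letters.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
        if not positions:
            return letters
        pos = positions[-1]
    marked = letters[:pos + 1] + TONE_MARKS[tone] + letters[pos + 1:]
    # NFC merges the base letter and the combining mark into one character.
    return unicodedata.normalize("NFC", marked)


def to_tone_marks(text):
    """Convert space-separated numbered pinyin to pinyin with tone marks."""
    return " ".join(mark_syllable(s) for s in text.split())


def strip_tones(text):
    """Drop tone numbers, giving plain ASCII pinyin without tones."""
    return re.sub(r"(?<=[A-Za-z\u00fc\u00dc])[1-5]", "", text)


print(to_tone_marks("Pin1 Yin1"))  # Pīn Yīn
print(strip_tones("Pin1 Yin1"))    # Pin Yin
```

Going from numbers to marks is mechanical, but going from toneless pinyin back to either of the other forms is not possible, which is one reason a record should say which form it holds.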
So the problem is: how do we specify which transliteration is used?
Pinyin (Unicode with tone marks, ASCII with tone numbers, ASCII without tones), bopomofo, Wade-Giles ...
Should we always use the Unicode version in MODS?
<title lang="en">Good Morning, New York</title>
<title lang="zh" type="alternative">早上好,纽约</title>
<title lang="zh" type="transliteration">zao shang hao, New York</title>
Foster's proposal to use attributes on MODS elements is a good approach.
But knowing the subtleties of transliteration, I think the attribute should look like this:
<title lang="zh">北京</title>
<title lang="zh" transliteration="pinyin-ascii">bei3 jing1</title>
<title lang="zh" transliteration="pinyin">běijīng</title>
or
<title lang="zh" transliteration="běijīng">北京</title>
or
<title lang="zh">
  北京
  <transliteration type="pinyin">běijīng</transliteration>
</title>
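
To show how an application might use the first variant, here is a small Python sketch that picks a title by transliteration scheme. The "transliteration" attribute is only the proposal discussed in this message, not an existing MODS attribute. The <record> wrapper and the pick_title function are hypothetical, and a real MODS record would also need namespace handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical record using the proposed "transliteration" attribute.
RECORD = """
<record>
  <title lang="zh">北京</title>
  <title lang="zh" transliteration="pinyin-ascii">bei3 jing1</title>
  <title lang="zh" transliteration="pinyin">běijīng</title>
</record>
""".strip()


def pick_title(root, scheme=None):
    """Return the first title in the given scheme.

    scheme=None selects the vernacular title, i.e. the one that has
    no transliteration attribute. Returns None if nothing matches.
    """
    for title in root.iter("title"):
        if title.get("transliteration") == scheme:
            return title.text
    return None


root = ET.fromstring(RECORD)
print(pick_title(root))                  # 北京
print(pick_title(root, "pinyin-ascii"))  # bei3 jing1
print(pick_title(root, "wade-giles"))    # None
```

With one element per form, a display that cannot render Chinese characters or tone marks can fall back to the ASCII form without guessing.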
>Providing more options allows users to make their own choices (in this case
>making both vernacular and transliteration data elements available, and either
>or both can be used).
I understand that transliteration is not a gimmick: it could be very useful for authority records (personal names, geographical names).
<name type="personal" authority="lcsh" id="78087649" transliteration="pinyin">Mao Zedong</name>
<name type="personal" authority="lcsh" see="78087649" transliteration="wade-giles">Mao Tse-tung</name>
<name type="personal" authority="lcsh" see="78087649">毛澤東</name>
...
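
As a rough illustration of how the id/see link could be used, this Python sketch groups the name forms that point to the same authority entry. The id, see, and transliteration attributes are the ones proposed in this message. The <names> wrapper, the group_variants function, and the "vernacular" label for untagged forms are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the name elements from the example above.
NAMES = """
<names>
  <name type="personal" authority="lcsh" id="78087649" transliteration="pinyin">Mao Zedong</name>
  <name type="personal" authority="lcsh" see="78087649" transliteration="wade-giles">Mao Tse-tung</name>
  <name type="personal" authority="lcsh" see="78087649">毛澤東</name>
</names>
""".strip()


def group_variants(root):
    """Map each authority id to its name forms, keyed by scheme."""
    groups = {}
    for name in root.iter("name"):
        # The main entry carries "id"; variant forms point to it with "see".
        key = name.get("id") or name.get("see")
        if key is None:
            continue
        scheme = name.get("transliteration", "vernacular")
        groups.setdefault(key, {})[scheme] = name.text
    return groups


variants = group_variants(ET.fromstring(NAMES))
print(variants["78087649"]["wade-giles"])  # Mao Tse-tung
print(variants["78087649"]["vernacular"])  # 毛澤東
```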
see
Yves
PS: in my "unicode" examples, I put the tone mark on the vowel, and that may not be right.
I use the HTML format with Unicode characters so that I can display Chinese characters.
To see the tone marks correctly, I use a large font size.
So if your mail client uses only ASCII characters, my email will be more difficult to understand. In that case, I could send you a PDF.