N18R: Potential future candidates for new codes (larger languages)
Dear Rebecca
Thank you for convening and chairing the extremely useful meeting of
the ISO 639 JAC.
Since the meeting I have rechecked my submission ISO 639 JAC N18, and
realised that N18 repeated many language names already in one or both
parts of ISO 639. I have prepared a more concise version (N18R) which
avoids this repetition.
I would therefore be grateful if you could distribute N18R, instead of
N18, to the JAC list and the JAC website: distribution of revised (R)
documents is fairly normal in ISO as long as the relationship to the
"non-R" version is clear (its history is noted below).
Best regards
John Clews
------------------------------------------------------------
ISO 639 JAC N18R
Title: Potential future candidates for new codes (larger languages)
Date: 2000-02-07 (note added to cover page: 2000-02-21)
Source: John Clews (UK)
Status: Personal contribution
Action: For the consideration of the ISO 639 JAC
Distribution: ISO 639 JAC
Note: Following the Washington DC meeting (2000-02-17/18)
of the ISO 639 JAC, please note that the Foundation for
Endangered Languages plans to submit an application for new
codes for some of the languages below to be added to ISO
639-2, based on the following list.
This list is submitted to members of the ISO 639 JAC for
advance information, prior to that submission. N18R is a more
concise and more accurate version of ISO 639 JAC N18, which
was distributed on paper at the ISO 639 JAC meeting
(2000-02-17/18). In error, N18 duplicated information in ISO
639 JAC N11 (ISO DIS 639-1:1999); these duplicates are now
deleted in N18R, and N18R supersedes N18.
------------------------------------------------------------
Potential future candidates for new codes (larger languages)
------------------------------------------------------------
There is a case for some languages listed below to be added to
ISO 639-2, as long as the languages concerned meet the relevant
review criteria. There are also reasons not to add codes for all
language entities below, but the list below is intended to aid the
review process.
For the most part, only languages NOT listed in ISO 639 appear below.
Entries in this table are based on a comparison of ISO 639 with SIL
codes (SIL codes are 3-letter codes using CAPITAL letters below).
As some 3-letter SIL codes are incompatible with the 3-letter codes
in ISO 639-2, SIL have also considered looking into the feasibility
of aligning some of the 3-letter SIL codes with the 3-letter codes
in ISO 639-2.
Entries common to both SIL and ISO 639 have been removed from this
list, except where some clarification may be of use.
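The comparison described above can be sketched roughly as follows. This is a minimal illustration only, not the procedure actually used to prepare this list: the function name and the two small sample tables are hypothetical, the sample holds only a few names and SIL codes quoted in this document, and matching is by exact name, which is a simplifying assumption (names that differ in spelling, or that refer to different languages, still need manual review).

```python
# Illustrative sample only: a few names and SIL codes quoted in this list.
SIL_SAMPLE = {
    "KHG": "KHAMS",
    "KMC": "DONG, SOUTHERN",
    "MHV": "ARAKANESE",
    "BTK": "BATAK",
}

# A few names this document describes as already provided for in ISO 639-2.
ISO_NAMES_SAMPLE = {"Batak", "Karen", "Shan", "Bislama"}


def candidates(sil_entries, iso_names):
    """Return SIL entries whose name has no exact, case-insensitive match in iso_names."""
    known = {name.casefold() for name in iso_names}
    return {
        code: name
        for code, name in sil_entries.items()
        if name.casefold() not in known
    }


remaining = candidates(SIL_SAMPLE, ISO_NAMES_SAMPLE)
for code, name in sorted(remaining.items()):
    print(code, name)
```

Exact name matching is deliberately naive: an entry such as BATAK is dropped here only because the names coincide, which is the kind of case the notes in this list flag for clarification.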
This list runs broadly from East through West, from China through
Europe. The addition of further major languages of the Americas is
not proposed here, as ISO 639 covers major languages of the Americas
fairly well already.
------------------------------------------------------------
East Asia
------------------------------------------------------------
1,487,000 China KHAMS KHG
1,480,750 China DONG, SOUTHERN KMC
>>>> ISO 639: no codes for Khams and Dong (non-Han languages)
>>>> NB: it will be useful to consult the official lists of
around 55 national minorities, to check which, if any,
non-Han languages with official status are omitted from
ISO 639.
>>>> What scripts are used for KHAMS and DONG? Latin script?
------------------------------------------------------------
Southeast Asia and Oceania
------------------------------------------------------------
1,190,000 Viet Nam TAY THO
ISO 639-2 provides only for Tai (other); not for Tay (Tai Tho)
2,083,000 Myanmar ARAKANESE MHV
ISO 639-2 provides for Karen and Shan; nothing for Arakanese
------------------------------------------------------------
3,000,000 Indonesia BANJAR BJN
Also known as BANJAR MALAY
2,700,000 Indonesia BETAWI BEW
Also known as JAKARTA MALAY
>>>> After checking with Southeast Asian librarians at the
British Library, it is apparent that these are significantly
different from Malay. It is not clear whether the situation with
Malay languages is similar to that with Sami languages.
There may be a case for providing a code for "Malay languages
(other)" as well as particular Malay languages.
2,000,000 Indonesia BATAK TOBA BBC
1,200,000 Indonesia BATAK DAIRI BTD
>>>> Note: there are various languages called BATAK in Sumatra,
Indonesia: BATAK ALAS-KLUET, BATAK ANGKOLA, BATAK DAIRI
(which has 1,200,000 speakers), BATAK KARO, BATAK MANDAILING,
BATAK SIMALUNGUN and BATAK TOBA (which has 2,000,000
speakers).
>>>> NB: note also the different language BATAK in the
Philippines, with the SIL code BTK, which is assumed to be
the "Batak" language encoded in ISO 639-2.
1,500,000 Indonesia LAMPUNG LJP
1,000,000 Indonesia REJANG REJ
ISO 639: No codes for LAMPUNG or REJANG. These too are spoken in
Sumatra, Indonesia.
>>>> Dialects assumed? Or different languages?
1,000,000 Philippines MAGINDANAON MDH
>>>> In passing, ISO 639 provides for most other large languages
of the Philippines.
50,000 Papua New Guinea TOK PISIN PDG
>>>> ISO 639: prefer to add special code for Tok Pisin? This has
national status in Papua New Guinea. Currently only "cpe"
(Creoles & Pidgins, English) is available. However, Bislama
(which can also be described as an English-based creole
language) does have a separate code.
>>>> In passing, ISO 639-2 provides codes for most other larger
languages of Oceania.
------------------------------------------------------------
South Asia
------------------------------------------------------------
India
ISO 639 does not list several of the following under these names:
13,000,000 India HARYANVI BGC
6,000,000 India KANAUJI BJJ
3,500,000 India PARSI PRP
2,730,120 India LAMBADI LMN
2,246,105 India KHANDESI KHN
2,095,280 India DOGRI-KANGRI DOJ
2,081,756 India GARHWALI GBM
2,013,000 India KUMAUNI KFY
1,921,000 India BAGRI BGQ
1,861,965 India SADRI SCK
1,856,000 India TULU TCY
1,600,000 India BHILI BHB
1,544,000 India WAGDI WBR
1,473,000 India MUNDARI MUW
1,295,000 India NIMADI NOE
1,050,000 India MALVI MUP
1,026,000 India HO HOC
3,000 India BROKSKAT BKK
(Brokskat is an Indo-Aryan (Dardic) language)
>>>> Some dialects assumed in above list?
------------------------------------------------------------
For Indian languages, Peter Claus (California State University,
Hayward) also suggests:
- Kodagu (Coorgi), which has a relatively small (but established)
literature, with a number of scholars working on it;
- Badaga, which has oral texts transliterated by scholars; and
- Toda, Kota, and Kuruba languages, along the border of Karnataka
and Tamil Nadu.
------------------------------------------------------------
5,100,000 Bangladesh SYLHETTI SYL
>>>> Widely used in the United Kingdom Bangladeshi community.
Sylheti Nagri script was used in the past in Bengal.
------------------------------------------------------------
15,015,000 Pakistan SARAIKI (Siraiki) SKR
2,210,000 Pakistan BRAHUI BRH
1,875,000 Pakistan HINDKO, NORTHERN HNO
625,000 Pakistan HINDKO, SOUTHERN HIN
>>>> Some dialects assumed?
------------------------------------------------------------
Dr. Elena Bashir, University of Michigan, also suggests the following
languages which are in SIL:
333,640 Pakistan BALTI BFT
(Tibeto-Burman)
320,000 Pakistan SHINA SCL
222,800 Pakistan KHOWAR KHW
220,000 Pakistan KOHISTANI, INDUS MVY
200,000 Pakistan SHINA, KOHISTANI PLK
108,000 Afghanistan PASHAYI, SOUTHWEST PSH
- Afghanistan PASHAYI, NORTHEAST AEE
- Afghanistan PASHAYI, NORTHWEST GLH
- Afghanistan PASHAYI, SOUTHEAST DRA
60,000 Pakistan TORWALI TRW
5,000 Pakistan DAMELI DML
2,900 Pakistan KALASHA KLS
(Indo-Aryan (Dardic))
29,000 Pakistan WAKHI WBL
5,000 Pakistan YIDGHA YDG
500 Pakistan DOMAAKI DMK
(Indo-Aryan)
55,000 Pakistan BURUSHASKI BSK
(Isolate)
9,500 Afghanistan GAWAR-BATI GWT
5,000 Afghanistan GRANGALI NLI
(Indo-Aryan)
2,000 Afghanistan WOTAPURI-KATARQALAI WSV
1,000 Afghanistan SHUMASHTI SMS
- Afghanistan TIRAHI TRA
(Indo-Aryan (Dardic))
4,000 Tajikistan YAZGULYAM YAH
(Iranian)
4,280,000 Iran LURI LRI
3,265,000 Iran MAZANDERANI MZN
3,265,000 Iran GILAKI GLK
1,500,000 Iran QASHQAI QSQ
Dr. Elena Bashir, University of Michigan, also suggests the following
languages which are apparently not in SIL:
Gojri Indo-Aryan
Kanyawali Indo-Aryan (Dardic)
Palula Indo-Aryan (Dardic)
Sawi Indo-Aryan (Dardic)
Ishkashmi Iranian
Zebaki Iranian
------------------------------------------------------------
Northern Africa (including the Horn of Africa)
------------------------------------------------------------
3,500,000 Morocco TAMAZIGHT, CENTRAL ATLAS TZM
ISO 639 codes Tamashek; check differences from Tamazight and other
languages with similar names (see below and Ethnologue entries)
3,500,000 Morocco TACHELHIT SHI
2,000,000 Morocco TARIFIT RIF
2,511,000 Mauritania HASSANIYYA MEY
1,400,000 Algeria CHAOUIA SHY
1,148,000 Sudan BEDAWI BEI
1,236,637 Ethiopia GAMO-GOFA-DAWRO GMO
1,231,673 Ethiopia WOLAYTTA WBC
------------------------------------------------------------
West Africa (including North-West Africa)
------------------------------------------------------------
600,000 Mali DOGON DOG
500,000 Mali SENOUFO, MAMARA MYK
361,700 Mali BOMU BMQ
100,000 Mali BOSO, SOROGAMA BZE
270,000 Mali TAMASHEQ, KIDAL TAQ
ISO 639 codes Tamashek; check differences from Tamazight (see above)
+ 1,168,500 Mali FULFULDE, MAASINA FUL
+ 7,611,000 Nigeria FULFULDE, NIGERIAN FUV
+ 450,000 Niger FULFULDE, CENTRAL-EAST NIGER FUQ
>>>> ISO 639 codes are "ful" & "ff" - Fulah (Fulfulde/Fulani assumed)
>>>> Relationship of Fulfulde languages etc. needs clarification.
640,000 Niger TAMAJAQ, TAWALLAMMAT TTQ
>>>> ISO 639 codes Tamashek; check differences from Tamajaq (see above)
2,151,000 Niger ZARMA DJE
2,520,000 Burkina Faso JULA DYU
1,500,000 Nigeria IBIBIO IBB
1,000,000 Nigeria EDO EDO
1,000,000 Nigeria EBIRA IGB
1,000,000 Nigeria ANAANG ANW
2,921,300 Senegal PULAAR FUC
313,000 Senegal JOLA-FOGNY DYO
2,900,000 Guinea FUUTA JALON FUF
2,130,000 Cote d'Ivoire BAOULE BCI
1,020,000 Cote d'Ivoire DAN DAF
------------------------------------------------------------
Eastern and Central Africa
------------------------------------------------------------
2,458,000 Kenya KALENJIN KLN
1,582,000 Kenya GUSII GUZ
1,305,000 Kenya MERU MER
1,300,000 Tanzania GOGO GOG
1,260,000 Tanzania MAKONDE KDE
1,200,000 Tanzania HAYA HAY
1,050,000 Tanzania NYAKYUSA-NGONDE NYY
1,391,442 Uganda CHIGA CHG
1,370,845 Uganda SOGA SOG
1,217,000 Uganda TESO TEO
------------------------------------------------------------
Central and Southern Africa
------------------------------------------------------------
4,200,000 Congo Dem Rep KITUBA KTU
1,156,800 Congo MUNUKUTUBA MKW
1,004,000 Congo Dem Rep CHOKWE CJK
1,000,000 Congo Dem Rep SONGE SOP
>>>> In passing, no relationship to Tsonga, already in ISO 639
2,850,000 Mozambique LOMWE NGL
2,500,000 Mozambique MAKHUWA VMW
1,160,000 Mozambique MAKHUWA-MEETTO MAK
1,100,000 Mozambique SENA SEH
John Clews
7 February 2000 (updated/corrected 21 February 2000).
--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432; fax: +44 1423 889061;
Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
Technologies: European Localization Requirements
Committee Member of TS/1: Terminology (UK national member body of
ISO/TC37: Terminology)
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets