ISOJAC Archives


ISOJAC  February 2000



N18R: Potential future candidates for new codes (larger languages)


John Clews <[log in to unmask]>


[log in to unmask]


Mon, 21 Feb 2000 22:31:02 GMT







Dear Rebecca

Thank you for convening and chairing the extremely useful meeting of
the ISO 639 JAC.

Since the meeting I have rechecked my submission ISO 639 JAC N18, and
realised that N18 repeated many language names already in one or both
parts of ISO 639. I have prepared a more concise version (N18R) which
avoids this repetition.

I would therefore be grateful if you could distribute N18R instead,
both to the JAC list and on the JAC website. Distributing revised (R)
documents is fairly normal in ISO, as long as the relationship to the
"non-R" version is clear (its history is noted below).

Best regards

John Clews


ISO 639 JAC N18R

Title:  Potential future candidates for new codes (larger languages)
Date:   2000-02-07 (note added to cover page: 2000-02-21)
Source: John Clews (UK)
Status: Personal contribution
Action: For the consideration of the ISO 639 JAC
Distribution: ISO 639 JAC

Note:   Following the Washington DC meeting (2000-02-17/18)
        of the ISO 639 JAC, please note that the Foundation for
        Endangered Languages plans to submit an application, based
        on the following list, for new codes to be added to ISO
        639-2 for some of the languages below.

        This list is submitted to members of the ISO 639 JAC for
        advance information, prior to that submission. N18R is a more
        concise and more accurate version of ISO 639 JAC N18, which
        was distributed on paper at the ISO 639 JAC meeting
        (2000-02-17/18). N18 mistakenly duplicated information in ISO
        639 JAC N11 (ISO DIS 639-1:1999); the duplicated entries have
        been deleted in N18R, which supersedes N18.

Potential future candidates for new codes (larger languages)

There is a case for adding some of the languages listed below to
ISO 639-2, provided that they meet the relevant review criteria.
There are also reasons not to add codes for every language entity
below, but the list is intended to aid the review process.

Most entries are languages NOT listed in ISO 639.

Entries in this table are based on a comparison of ISO 639 with SIL
codes (SIL codes are the 3-letter codes in CAPITAL letters below).
As some 3-letter SIL codes are incompatible with the 3-letter codes
in ISO 639-2, SIL have also considered looking into the feasibility
of aligning some of the 3-letter SIL codes with the 3-letter codes
in ISO 639-2.

Entries common to both SIL and ISO 639 have been removed from this
list, except where some clarification may be of use.

This list runs broadly from East to West, from China through to
Europe. The addition of further major languages of the Americas is
not proposed here, as ISO 639 already covers the major languages of
the Americas fairly well.

East Asia

       1,487,000      China           KHAMS                  KHG
       1,480,750      China           DONG, SOUTHERN         KMC

>>>>    ISO 639: no codes for Khams and Dong (non-Han languages)

>>>>    NB: it will be useful to consult the official lists of
        around 55 national minorities, to check which, if any,
        non-Han languages with official status are omitted from
        ISO 639.

>>>>    What scripts are used for KHAMS and DONG? Latin script?

Southeast Asia and Oceania

       1,190,000      Viet Nam        TAY               THO

ISO 639-2 provides only for Tai (other), not for Tay (Tai Tho).

       2,083,000      Myanmar         ARAKANESE         MHV

ISO 639-2 provides for Karen and Shan, but nothing for Arakanese.


       3,000,000      Indonesia       BANJAR                BJN
                                      Also known as
                                      BANJAR MALAY

       2,700,000      Indonesia       BETAWI                BEW
                                      Also known as
                                      JAKARTA MALAY

>>>>    After checking with Southeast Asian librarians at the
        British Library, it is apparent that these are significantly
        different from Malay. It is not clear whether the situation
        with Malay languages is similar to that with the Sami
        languages. There may be a case for providing a code for
        "Malay languages (other)" as well as codes for particular
        Malay languages.

       2,000,000      Indonesia       BATAK TOBA      BBC
       1,200,000      Indonesia       BATAK DAIRI     BTD

>>>>    Note: there are various languages called BATAK in Sumatra,
        including BATAK DAIRI (which has 1,200,000 speakers), BATAK
        KARO, BATAK MANDAILING, BATAK SIMALUNGUN and BATAK TOBA (which
        has 2,000,000 speakers).

>>>>    NB: note also the different language BATAK in the
        Philippines, with the SIL code BTK, which is assumed to be
        the "Batak" language encoded in ISO 639-2.

       1,500,000      Indonesia       LAMPUNG               LJP
       1,000,000      Indonesia       REJANG                REJ

ISO 639: No codes for LAMPUNG or REJANG. These too are spoken in
Sumatra, Indonesia.

>>>>    Dialects assumed? Or different languages?

       1,000,000      Philippines     MAGINDANAON     MDH

>>>>    In passing, ISO 639 provides for most other large languages
        of the Philippines.

          50,000      Papua New Guinea        TOK PISIN       PDG

>>>>    ISO 639: would it be preferable to add a specific code for
        Tok Pisin? It has national status in Papua New Guinea.
        Currently only "cpe" (Creoles and pidgins, English-based) is
        available. However, Bislama (which can also be described as
        an English-based creole language) does have a separate code.

>>>>    In passing, ISO 639-2 provides codes for most other larger
        languages of Oceania.

South Asia


ISO 639 does not list several of the following languages under these names:

      13,000,000      India           HARYANVI        BGC
       6,000,000      India           KANAUJI         BJJ
       3,500,000      India           PARSI           PRP
       2,730,120      India           LAMBADI         LMN
       2,246,105      India           KHANDESI        KHN
       2,095,280      India           DOGRI-KANGRI    DOJ
       2,081,756      India           GARHWALI        GBM
       2,013,000      India           KUMAUNI         KFY
       1,921,000      India           BAGRI           BGQ
       1,861,965      India           SADRI           SCK
       1,856,000      India           TULU            TCY
       1,600,000      India           BHILI           BHB
       1,544,000      India           WAGDI           WBR
       1,473,000      India           MUNDARI         MUW
       1,295,000      India           NIMADI          NOE
       1,050,000      India           MALVI           MUP
       1,026,000      India           HO              HOC
           3,000      India           BROKSKAT        BKK
                      (Brokskat is an Indo-Aryan (Dardic) language)

>>>>    Some dialects assumed in above list?

For Indian languages, Peter Claus (California State University,
Hayward) also suggests

 - Kodagu (Coorgi) which has a relatively small (but established)
   literature with a number of scholars working on it.

 - Badaga, which has oral texts transliterated by scholars, and

 - Toda, Kota, and Kuruba languages, spoken along the border of
   Karnataka and Tamil Nadu.

       5,100,000      Bangladesh      SYLHETTI        SYL

>>>>    Widely used in the Bangladeshi community in the United Kingdom.
        Sylheti Nagri script was used in the past in Bengal.

      15,015,000      Pakistan        SARAIKI (Siraiki)      SKR
       2,210,000      Pakistan        BRAHUI                 BRH
       1,875,000      Pakistan        HINDKO, NORTHERN       HNO
         625,000      Pakistan        HINDKO, SOUTHERN       HIN

>>>>    Some dialects assumed?

Dr. Elena Bashir, University of Michigan, also suggests the following
languages which are in SIL:

         333,640     Pakistan        BALTI                   BFT

         320,000     Pakistan        SHINA                   SCL
         222,800     Pakistan        KHOWAR                  KHW
         220,000     Pakistan        KOHISTANI, INDUS        MVY
         200,000     Pakistan        SHINA, KOHISTANI        PLK
         108,000     Afghanistan     PASHAYI, SOUTHWEST      PSH
                -    Afghanistan     PASHAYI, NORTHEAST      AEE
                -    Afghanistan     PASHAYI, NORTHWEST      GLH
                -    Afghanistan     PASHAYI, SOUTHEAST      DRA
           60,000    Pakistan        TORWALI                 TRW
            5,000    Pakistan        DAMELI                  DML
            2,900    Pakistan        KALASHA                 KLS
                                     (Indo-Aryan (Dardic))

           29,000    Pakistan        WAKHI                   WBL
            5,000    Pakistan        YIDGHA                  YDG

              500    Pakistan        DOMAAKI                 DMK

           55,000    Pakistan        BURUSHASKI              BSK

            9,500    Afghanistan     GAWAR-BATI              GWT
            5,000    Afghanistan     GRANGALI                NLI

            2,000    Afghanistan     WOTAPURI-KATARQALAI     WSV
            1,000    Afghanistan     SHUMASHTI               SMS
                -    Afghanistan     TIRAHI                  TRA
                                     (Indo-Aryan (Dardic))

            4,000    Tajikistan      YAZGULYAM               YAH

        4,280,000    Iran            LURI                    LRI
        3,265,000    Iran            MAZANDERANI             MZN
        3,265,000    Iran            GILAKI                  GLK
        1,500,000    Iran            QASHQAI                 QSQ

Dr. Elena Bashir, University of Michigan, also suggests the following
languages which are apparently not in SIL:

                                Gojri        Indo-Aryan
                                Kanyawali    Indo-Aryan (Dardic)
                                Palula       Indo-Aryan (Dardic)
                                Sawi         Indo-Aryan (Dardic)

                                Ishkashmi    Iranian
                                Zebaki       Iranian

Northern Africa (including the Horn of Africa)

       3,500,000      Morocco    TAMAZIGHT, CENTRAL ATLAS   TZM

ISO 639 has a code for Tamashek; check how it differs from Tamazight
and other languages with similar names (see below and the Ethnologue
entries).

       3,500,000      Morocco    TACHELHIT       SHI
       2,000,000      Morocco    TARIFIT         RIF
       2,511,000      Mauritania HASSANIYYA      MEY

       1,400,000      Algeria    CHAOUIA         SHY
       1,148,000      Sudan      BEDAWI          BEI

       1,236,637      Ethiopia   GAMO-GOFA-DAWRO GMO
       1,231,673      Ethiopia   WOLAYTTA        WBC

West Africa (including North-West Africa)

         600,000      Mali            DOGON                  DOG
         500,000      Mali            SENOUFO, MAMARA        MYK
         361,700      Mali            BOMU                   BMQ
         100,000      Mali            BOSO, SOROGAMA         BZE

         270,000      Mali            TAMASHEQ, KIDAL        TAQ

ISO 639 has a code for Tamashek; check how it differs from Tamazight (see above).

+      1,168,500      Mali            FULFULDE, MAASINA      FUL
+      7,611,000      Nigeria         FULFULDE, NIGERIAN     FUV
+        450,000      Niger           FULFULDE,
                                         CENTRAL-EAST NIGER  FUQ

>>>>    ISO 639 codes are "ful" and "ff" for Fulah (assumed to cover Fulfulde/Fulani).
>>>>    Relationship of Fulfulde languages etc. needs clarification.

         640,000      Niger           TAMAJAQ, TAWALLAMMAT   TTQ

>>>>    ISO 639 has a code for Tamashek; check how it differs from Tamajaq (see above).

       2,151,000      Niger           ZARMA   DJE

       2,520,000      Burkina Faso    JULA    DYU

       1,500,000      Nigeria         IBIBIO  IBB
       1,000,000      Nigeria         EDO     EDO
       1,000,000      Nigeria         EBIRA   IGB
       1,000,000      Nigeria         ANAANG  ANW

       2,921,300      Senegal         PULAAR           FUC
         313,000      Senegal         JOLA-FOGNY       DYO

       2,900,000      Guinea          FUUTA JALON      FUF

       2,130,000      Cote d'Ivoire   BAOULE           BCI
       1,020,000      Cote d'Ivoire   DAN              DAF

Eastern and Central Africa

       2,458,000      Kenya           KALENJIN         KLN
       1,582,000      Kenya           GUSII            GUZ
       1,305,000      Kenya           MERU             MER

       1,300,000      Tanzania        GOGO             GOG
       1,260,000      Tanzania        MAKONDE          KDE
       1,200,000      Tanzania        HAYA             HAY
       1,050,000      Tanzania        NYAKYUSA-NGONDE  NYY

       1,391,442      Uganda          CHIGA            CHG
       1,370,845      Uganda          SOGA             SOG
       1,217,000      Uganda          TESO             TEO

Central and Southern Africa

       4,200,000      Congo Dem Rep   KITUBA           KTU
       1,156,800      Congo           MUNUKUTUBA       MKW

       1,004,000      Congo Dem Rep   CHOKWE           CJK
       1,000,000      Congo Dem Rep   SONGE            SOP

>>>>    In passing: SONGE has no relationship to Tsonga, which is already in ISO 639.

       2,850,000      Mozambique      LOMWE             NGL
       2,500,000      Mozambique      MAKHUWA           VMW
       1,160,000      Mozambique      MAKHUWA-MEETTO    MAK
       1,100,000      Mozambique      SENA              SEH

John Clews

7 February 2000 (updated/corrected 21 February 2000).

John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432; fax: +44 1423 889061;
Email: [log in to unmask]

Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
 Technologies: European Localization Requirements
Committee Member of TS/1: Terminology (UK national member body of
 ISO/TC37: Terminology)
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets
