ISO 639/JAC N3R
ISO 639 Joint Advisory Committee
Working principles for ISO 639 maintenance
(8 March 2000)
The following documents working principles for the maintenance of language codes by the ISO 639 Joint Advisory Committee both in ISO 639-1 (Alpha-2 code) and ISO 639-2 (Alpha-3 code). It repeats some information that is in ISO 639-2:1998 in section 4 (Language codes) and the normative Annex A. In addition, it gives further details as to how language code changes that are submitted are considered and how the two parts of ISO 639 are related.
1. Definition of new language codes
- A registration form for requesting new language codes is available on the Web. The completed form is submitted to the appropriate ISO 639 Registration Authority for consideration.
- The Registration Authority will review applications, obtain additional information and/or justification from the submitter, and suggest the assignment of a code when the relevant criteria are met.
1.2. Criteria for ISO 639-2
- Number of documents. The request for a new language code shall include evidence that one agency holds 50 different documents in the language or that five agencies hold a total of 50 different documents among them in the language. Documents include all forms of material and are not limited to text.
- Collective codes. If the criteria above are not met, the language may be assigned a new or existing collective language code. The words "languages" or "other" as part of a language name indicate that a language code is a collective one.
- Scripts. A single language code is normally provided for a language even though the language is written in more than one script. A standard for script codes is under development by ISO/TC46/SC2, ISO DIS 15924: Codes for the representation of names of scripts.
- Dialects. A dialect of a language is usually represented by the same language code as that used for the language. If the language is assigned to a collective language code, the dialect is assigned to the same collective language code. The difference between dialects and languages will be decided on a case-by-case basis.
- Orthography. A language using more than one orthography is not given multiple language codes.
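The document-count criterion above can be sketched as a small check. This is only an illustration under stated assumptions: the function name, the shape of the `holdings` mapping, and the reading of "five agencies" as an upper bound on how many holdings may be pooled are not part of the working principles, and overlap between agencies' collections is not modelled.

```python
from typing import Mapping

MIN_DOCUMENTS = 50  # threshold stated in the criteria
MAX_AGENCIES = 5    # assumption: at most five agencies' holdings are pooled


def meets_document_criterion(holdings: Mapping[str, int]) -> bool:
    """Return True if the holdings satisfy the document-count criterion.

    `holdings` maps a (hypothetical) agency name to the number of different
    documents it holds in the language. Counts are assumed to refer to
    distinct documents across agencies.
    """
    counts = sorted(holdings.values(), reverse=True)
    if not counts:
        return False
    # Case 1: a single agency holds at least 50 documents.
    if counts[0] >= MIN_DOCUMENTS:
        return True
    # Case 2: the largest holdings of up to five agencies total at least 50.
    return sum(counts[:MAX_AGENCIES]) >= MIN_DOCUMENTS
```

A request that fails this check would, under the collective-code principle above, be a candidate for a collective language code rather than an individual one.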
1.3. Criteria for ISO 639-1
- Relation to ISO 639-2. Since ISO 639-1 is to remain a subset of ISO 639-2, a language proposed for ISO 639-1 must first satisfy the requirements for ISO 639-2 and must also satisfy the following.
- a significant body of existing documents (specialized texts, such as college or university textbooks, technical documentation manuals, specialized journals, subject-field related books, etc.) written in specialized languages
- a number of existing terminologies in various subject fields (e.g. technical dictionaries, specialized glossaries, vocabularies, etc. in printed or electronic form)
- Recommendation. A recommendation and support of a specialized authority (such as a standards organization, governmental body, linguistic institution, or cultural organization)
- Other considerations
- the number of speakers of the language community
- the recognized status of the language in one or more countries
- the support of the request by one or more official bodies
- Collective codes. ISO 639-1 does not use collective codes. If a collective code is necessary, the alpha-3 code shall be used.
2. Choice of new language codes
- Language codes consist of the following 26 letters of the Latin alphabet in lower case with no diacritical marks or modified characters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z.
- ISO 639-2 uses three alphabetic characters, and ISO 639-1 uses two alphabetic characters.
- Codes need not be abbreviations for the language as they are intended to serve as an arbitrary device to identify a given language or group of languages. Mnemonicity of codes is striven for, but this may not always be possible or appropriate.
- An effort is made to derive a language code from a language's name for itself, when possible. For historical reasons, some codes may be based on the name of a language in English.
- There are 23 language names in ISO 639-2 that have variant codes, one for bibliographic applications, the other for terminological applications. This was because of established usage in national and international bibliographic databases which employed codes based on English language forms of names.
- New language codes shall be based on the vernacular form of name unless
- another language code is requested by the country or countries using the language or the sponsor submitting the request;
- the vernacular form cannot be determined; or
- a suitable code is not available.
In the latter two cases, an English form of name may be used to derive the language code.
- A language code already in ISO 639-2/T which is based on the English form of the name shall not be changed even if the vernacular form is determined and/or added to ISO 639-1. This is to ensure continuity and stability and to prevent the proliferation of multiple or alternative codes.
- A prefix is not regarded as part of the language name for purposes of assigning a code (e.g. "Swahili" is the language name, although "KiSwahili" is often used).
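The syntactic rules at the start of this section (26 unaccented lower-case Latin letters, two characters for ISO 639-1, three for ISO 639-2) can be sketched as a format check. The function name is hypothetical, and the check only tests the form of a string. It says nothing about whether the code has actually been assigned.

```python
import re
from typing import Optional

# Exactly two or three of the letters a-z, lower case, no diacritics.
ALPHA2_PATTERN = re.compile(r"[a-z]{2}")
ALPHA3_PATTERN = re.compile(r"[a-z]{3}")


def classify_code(code: str) -> Optional[str]:
    """Return the part of ISO 639 whose code format `code` matches, or None."""
    if ALPHA2_PATTERN.fullmatch(code):
        return "ISO 639-1"
    if ALPHA3_PATTERN.fullmatch(code):
        return "ISO 639-2"
    return None  # wrong length, upper case, digits, or modified characters
```

Using `fullmatch` rather than `match` rejects strings that merely start with a well-formed code.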
3. Changes of existing language codes
- To ensure continuity and stability in support of online retrieval from large databases built over many years, codes shall not be changed.
- Where codes have been changed or discontinued in the past, the old codes shall not be reassigned.
- Language codes shall not be changed if the conventional name of a language is changed. However, language names associated with codes may be changed.
- In the future, variant forms of a language name may be included in the entry, separated by semicolons. No effort will be made by the Registration Authorities to collect variants that were not previously included.
- The MARC Code List for Languages maintains variant names of languages and may be used as a reference source.
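The stability rules in this section can be illustrated with a small in-memory registry. This is a sketch only: the class, its method names, and the storage layout are hypothetical and do not describe how the Registration Authorities actually maintain their lists. It shows three points: a discontinued code is never reassigned, a name can change while its code stays fixed, and variant names are joined with semicolons.

```python
from typing import Dict, List, Set


class CodeRegistry:
    """Toy registry enforcing the code-stability principles."""

    def __init__(self) -> None:
        self._names: Dict[str, List[str]] = {}  # active code -> [name, variants...]
        self._retired: Set[str] = set()         # discontinued codes, never reused

    def assign(self, code: str, name: str) -> None:
        # A code that is active or was ever discontinued cannot be assigned again.
        if code in self._names or code in self._retired:
            raise ValueError(f"code {code!r} is already assigned or retired")
        self._names[code] = [name]

    def rename(self, code: str, new_name: str) -> None:
        # The language name may change; the code itself stays the same.
        self._names[code][0] = new_name

    def add_variant(self, code: str, variant: str) -> None:
        if variant not in self._names[code]:
            self._names[code].append(variant)

    def retire(self, code: str) -> None:
        del self._names[code]
        self._retired.add(code)

    def entry(self, code: str) -> str:
        # Variant forms of the name are separated by semicolons.
        return "; ".join(self._names[code])
```

Keeping retired codes in a separate set is one simple way to guarantee that old records in large databases never silently change meaning.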
4. Relationship between ISO 639-1 and ISO 639-2
- In the development of ISO 639-2 there was a principle that a code in the alpha-3 list would include the 2 characters from the alpha-2 code where possible. An exception was the alternative codes, where longstanding and widespread existing usage of bibliographic codes did not permit this.
- New codes introduced in ISO 639-1 that are already included in ISO 639-2 should follow this principle. If the vernacular form had not been used in ISO 639-2/T, the ISO JAC will attempt to establish an alpha-2 code with two letters in common with the alpha-3 code when possible.
- ISO 639-1 shall be a subset of ISO 639-2.
- New codes will no longer be added to ISO 639-1 after the publication of a revised standard.
- A language code already in ISO 639-2 at the point of freezing ISO 639-1 shall not later be added to ISO 639-1. This is to ensure consistency in usage over time, since users are directed in Internet applications to employ the alpha-3 code when an alpha-2 code for that language is not available.
- New language codes may be considered for inclusion in both parts or in ISO 639-2 only. If the request is to add a code to ISO 639-1, the language must also be added to ISO 639-2 and satisfy the stated criteria.
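The relationships described in this section can be sketched in code. The function names and the sample mapping below are illustrative assumptions. In particular, "two letters in common" is read here as "both letters of the alpha-2 code occur in the alpha-3 code", which is only one possible interpretation.

```python
from collections import Counter
from typing import Iterable, Mapping


def shares_alpha2_letters(alpha2: str, alpha3: str) -> bool:
    """True if every letter of the alpha-2 code also occurs in the alpha-3 code."""
    # Counter subtraction keeps only letters of alpha2 not covered by alpha3.
    return not (Counter(alpha2) - Counter(alpha3))


def is_subset(alpha2_for: Mapping[str, str], alpha3_codes: Iterable[str]) -> bool:
    """True if every language with an alpha-2 code also has an alpha-3 code.

    `alpha2_for` maps an alpha-3 code to its alpha-2 code (sample data only).
    """
    known = set(alpha3_codes)
    return all(alpha3 in known for alpha3 in alpha2_for)


def preferred_code(alpha3: str, alpha2_for: Mapping[str, str]) -> str:
    """Return the alpha-2 code when one exists, otherwise the alpha-3 code."""
    return alpha2_for.get(alpha3, alpha3)
```

For example, `shares_alpha2_letters("de", "deu")` holds for the terminological code for German, while `shares_alpha2_letters("de", "ger")` does not, which reflects the exception for bibliographic codes noted above. The fallback in `preferred_code` mirrors the guidance that applications use the alpha-3 code when no alpha-2 code is available.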
See also Rules of procedure for conducting business (ISO 639/JAC N2R).
Library of Congress