ISOJAC  March 2000


N3R: removing ambiguities


John Clews



Mon, 13 Mar 2000 01:49:08 GMT





Dear Rebecca

Thanks for putting ISO 639/JAC N3R on the list. It is a significant
improvement, and section 3 in particular is very strong and clear.

However, some simplification of style and removal of ambiguities are
still needed to avoid the JAC having problems later, and I embed my
comments below, either indenting my text by 8 spaces, or else using
>>>>    in the margin to flag my comments.

Unindented text is that of N3R.

In particular, there is a need to remove contradictions between
criteria (between ISO 639-1 and ISO 639-2), and to remove some things
which conflict with what the JAC decided (e.g. on different

I hope that you will find this useful in getting rid of ambiguities,
and ensuring that the path of the ISO 639 Joint Advisory Committee
runs smoothly, without unnecessary complications.


ISO 639 Joint Advisory Committee

Working principles for ISO 639 maintenance

(8 March 2000)

The following documents working principles for the maintenance of
language codes by the ISO 639 Joint Advisory Committee both in ISO
639-1 (Alpha-2 code) and ISO 639-2 (Alpha-3 code). It repeats some
information that is in ISO 639-2:1998 in section 4 (Language codes)
and the normative Annex A.

>>>>    It repeats some information that is in both ISO 639-2,
        _and_ in the equivalent parts of the section on Language
        codes, and the normative Annex A of ISO DIS 639-1.

        It may be worth pointing that out, to avoid any appearance
        that most things are dictated by ISO 639-2.

In addition, it gives further details as to how language code changes
that are submitted are considered and how the two parts of ISO 639
are related.


1. Definition of new language codes

1.1. Procedures

A Registration form is available on the Web for requesting new
language codes, which is submitted to the appropriate ISO 639
Registration Authority for consideration. The Registration Authority
will review applications, obtain additional information and/or
justification from the submitter, and suggest the assignment of a
code when the relevant criteria are met.

>>>>    What is not explicit in these procedures is how the work is
        broken down between the two RAs, and the JAC. It may well be
        most cost effective to state explicitly that the ISO 639
        Joint Advisory Committee will review all applications in
        consultation with the RAs.

1.2. Criteria for ISO 639-2

>>>>    Criteria should be merged into one, to avoid contradictions
        in applying these criteria (see below), now that it has been
        agreed that there will be fewer new codes in ISO 639-1 than in
        ISO 639-2, keeping just the one more stringent criterion for
        ISO 639-1 based on documentation.

Number of documents. The request for a new language code shall
include evidence that one agency holds 50 different documents in the
language or that five agencies hold a total of 50 different documents
among them in the language. Documents include all forms of material
and are not limited to text.

>>>>    Given the fact that many codes in the standard do not
        themselves meet this criterion, it is invidious to impose this
        on the submitter.

        Please can you change the sentence

        "The request for a new language code shall include evidence
        that one agency holds 50 different documents in the language
        or that five agencies hold a total of 50 different documents
        among them in the language"

        to the sentence

        "The request for a new language code shall include evidence
        that there are a substantial number of documents in or about
        the language concerned."

>>>>    The following criteria from section 1.3 below are equally
        valid in section 1.2:

        Recommendation. A recommendation and support of a specialized
        authority (such as a standards organization, governmental
        body, linguistic institution, or cultural organization)

        Other considerations
         - the number of speakers of the language community
         - the recognized status of the language in one or more countries
         - the support of the request by one or more official bodies

        Otherwise there is an inherent contradiction between the
        criteria for the codes: more codes are permitted in ISO
        639-2, but some of the criteria for ISO 639-1 additions would
        allow more codes (e.g. those under "other considerations"
        above) which could allow more to be included under ISO 639-1
        criteria than ISO 639-2.

        If this is not resolved, there would be problems in meeting
        either set of criteria.

Collective codes. If the criteria above are not met the language may
be assigned a new or existing collective language code. The words
"languages" or "other" as part of a language name indicate that a
language code is a collective one.

Scripts. A single language code is normally provided for a language
even though the language is written in more than one script.

>>>>    This contradicts what the JAC seems to have done - cf. the
        Bosnian example referred to below.

        Which languages in ISO 639 (either part) or proposed
        additions actually follow this now?

A standard for script codes is under development by ISO/TC46/SC2, ISO
DIS 15924: Codes for the representation of names of scripts.

>>>>    For simpler grammar and style, simplify to:

        ISO DIS 15924 "Codes for the representation of names of
        scripts" is under development by ISO/TC46/SC2.

Dialects. A dialect of a language is usually represented by the same
language code as that used for the language. If the language is
assigned to a collective language code, the dialect is assigned to
the same collective language code. The difference between dialects
and languages will be decided on a case-by-case basis.

Orthography. A language using more than one orthography is not given
multiple language codes.

>>>>    Too specific! Orthography was a major factor in determining
        the status of Bosnian. For simplicity, delete, or say the
        opposite, e.g.

        "Orthography. The use of more than one orthography
        may be significant in determining language status."

1.3. Criteria for ISO 639-1

>>>>    Criteria for either part, not for each part, would be
        preferable. There are some contradictions between the two.

Relation to ISO 639-2. Since ISO 639-1 is to remain a subset of ISO
639-2, it must first satisfy the requirements for ISO 639-2 and also
satisfy the following.

>>>>    This is the first time that it is suggested that ISO 639-1
        has a subset relationship to ISO 639-2. There is no such
        premise previously mentioned, so "Since" cannot begin this
        sentence. The entities are also confused. For clarity prefer:

        "Relation to ISO 639-2. The language [entities] coded in ISO
        639-1 will remain a subset of those in ISO 639-2."

        or, more simply:

        "ISO 639-1 will not provide any codes for languages
        that are not provided in ISO 639-2."

        "[For terminological use] it should also satisfy the following

Documentation.
 - a significant body of existing documents (specialized texts, such
   as college or university textbooks, technical documentation
   manuals, specialized journals, subject-field related books, etc.)
   written in specialized languages
 - a number of existing terminologies in various subject fields (e.g.
   technical dictionaries, specialized glossaries, vocabularies, etc.
   in printed or electronic

>>>>    The two criteria below should be added to those in
        section 1.2, to avoid contradictory criteria being applied,
        and removed from section 1.3.

Recommendation. A recommendation and support of a specialized
authority (such as a standards organization, governmental body,
linguistic institution, or cultural organization)

Other considerations
 - the number of speakers of the language community
 - the recognized status of the language in one or more countries
 - the support of the request by one or more official bodies

Collective codes. ISO 639-1 does not use collective codes. If these
are necessary the alpha-3 code shall be used.

2. Choice of new language codes

Language codes consist of the following 26 letters of the Latin
alphabet in lower case with no diacritical marks or modified
characters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s,
t, u, v, w, x, y, z. ISO 639-2 uses three alphabetic characters, and
ISO 639-1 uses two alphabetic characters.
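
As an illustration only (not part of N3R), the syntax rule above can be
sketched as a small check. The function name is_well_formed and its
"part" parameter are assumptions made for this sketch. It tests only the
form of a code, not whether the code has actually been assigned.

```python
import string

# Code lengths as stated above: ISO 639-1 uses two letters, ISO 639-2 three.
CODE_LENGTH = {"639-1": 2, "639-2": 3}

# The 26 unaccented lower-case Latin letters a-z.
ALLOWED = set(string.ascii_lowercase)


def is_well_formed(code: str, part: str) -> bool:
    """Return True if `code` has the right length and alphabet for `part`.

    Hypothetical helper: it checks syntax only and says nothing about
    whether the code has been assigned to a language.
    """
    return len(code) == CODE_LENGTH[part] and all(ch in ALLOWED for ch in code)
```

For example, is_well_formed("khm", "639-2") holds, while "KHM" (upper
case) or a code containing an accented letter would be rejected.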

Codes need not be abbreviations for the language as they are intended
to serve as an arbitrary device to identify a given language or group
of languages. Mnemonicity of codes is striven for, but this may not
always be possible or appropriate. An effort is made to derive a
language code from a language's name for itself, when possible. For
historical reasons, some codes may be based on the name of a language
in English.

>>>>    Simplify to:

        "For historical reasons, some codes reflect vernacular
        language names in English; others reflect language names in
        English. Because of this there are also some variant codes."

There are 23 language names in ISO 639-2 that have variant codes, one
for bibliographic applications, the other for terminological
applications. This was because of established usage in national and
international bibliographic databases which employed codes based on
English language forms of names.

>>>>    What it doesn't say, and what the web page at least (and
        most likely the ISO 639-2 standard in revision) should say is
        that there are also a smaller number of variant codes for
        bibliographic applications. These are the ones that are
        changing from the MARC codes to different ones in ISO 639-2
        (cam to khm for Khmer etc). These are NOT currently
        documented anywhere, and should be to avoid further
        confusion. Certainly older versions of MARC list these
        "new variant bibliographic codes" and possibly the MARC21
        documentation also does (could you clarify that?)

>>>>    I never quite understood what the gain was in changing
        existing practice in libraries for a few codes, while
        permitting some other variant codes.

        Just as a historical note, considering that the use of two
        codes was a major factor in ensuring negative votes on
        ISO 639-2, why did the library community (I assume) resist
        changing some codes, but agree to changing others (adding
        some cost to library users worldwide?)

New language codes shall be based on the vernacular form of name
unless another language code is requested by the country or
countries using the language or the sponsor submitting the request;
 - if the vernacular cannot be determined; or
 - if a suitable code is not available.
In the latter two cases, an English form of name may be used to
derive the language code.

>>>>    That paragraph, with its two bullet points could be
        simplified. Just as I suggest above that the two sentences

        "For historical reasons, some codes reflect vernacular
        language names in English; others reflect language names in
        English. Because of this there are also some variant codes."

        would be simpler than that supplied, it would be simpler
        (and would allow more scope to do the same thing, but to
        allow either situation where the case demanded it) if that
        paragraph were substituted by:

        "New language codes will try and use elements from
        the vernacular form, or the English form, if possible, for
        reasons of mnemonicity."

        There are also going to be occasions when likely combinations
        for vernacular or English mnemonicity are not possible as the
        likely codes are already used. That will affect the codes as
        much as (sometimes more than) any desire for mnemonicity.

A language code already in ISO 639-2/T which is based on the English
form of the name shall not be changed even if the vernacular form is
determined and/or added to ISO 639-1. This is to ensure continuity
and stability and to prevent the proliferation of multiple or
alternative codes.

>>>>    That paragraph is superfluous, and should be deleted, as
        section 3. below sums this up more clearly, and without
        complications (the additional points above do not add to
        those below).

>>>>    There is nothing gained from referring explicitly to
        ISO 639-2/T: this applies equally to any codes anywhere in
        either part of ISO 639. English names are used all over, as
        often they are identical to the vernacular names, and/or the
        codes are.

A prefix is not regarded as part of the language name for purposes of
assigning a code (e.g. Swahili is the language name, although "KiSwahili"
is often used).

>>>>    It may be worth adding a note that such prefixes are
        particularly common in relation to names of African

3. Changes of existing language codes

To ensure continuity and stability in support of online retrieval
from large databases built over many years, codes shall not be
changed.

Where codes have been changed or discontinued in the past, the old
codes shall not be reassigned.

Language codes shall not be changed if the conventional name of a
language is changed. However, language names associated with codes
may be changed.

In the future, variant forms of a language name may be included in
the entry, separated by a semicolon. No effort will be made by the
Registration Authorities to collect those variants that were
previously not included.

The MARC Code List for Languages maintains variant names of languages
and may be used as a reference source.

>>>>    The comment in section 2 above about MARC codes is also
        relevant here, though there is no need to change the last
        paragraph of section 3.

4. Relationship between ISO 639-1 and ISO 639-2

In development of ISO 639-2 there was a principle that a code in the
alpha-3 list would include the 2 characters from the alpha-2 where
possible. An exception was the alternative codes, where longstanding
and widespread existing usage of bibliographic codes did not permit
this.

New codes introduced in ISO 639-1 that are already included in ISO
639-2 should follow this principle. If the vernacular form had not
been used in ISO 639-2/T, the ISO JAC will attempt to establish an
alpha-2 code with two letters in common with the alpha-3 code when
possible.

>>>>    The above two paragraphs could be much simplified by saying

        "Where possible, attempts will be made to ensure that
        2 letters are common between the codes in ISO 639-1 and ISO
        639-2." A brief example (e.g. Bosnian? Nynorsk?) might also
        help to explain what is being described here.
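
A minimal sketch of the principle, for illustration only. It assumes
that "2 letters in common" means the two alpha-2 letters occur in the
alpha-3 code in the same order. The text does not define this precisely,
and the function name is hypothetical.

```python
def shares_alpha2_letters(alpha2: str, alpha3: str) -> bool:
    """Return True if both letters of alpha2 occur, in order, in alpha3."""
    remaining = iter(alpha3)
    # `ch in remaining` consumes the iterator up to the match, so the
    # second letter is only searched for after the first one.
    return all(ch in remaining for ch in alpha2)
```

Under that assumption, Bosnian (bs, bos) and Nynorsk (nn, nno) pass the
check, while the bibliographic variant for German (ger, against alpha-2
de) does not.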

>>>>    Again, there should be no explicit reference to
        ISO 639-2/T. It applies equally to any codes anywhere in
        either part of ISO 639. English names are used all over, as
        often they are identical to the vernacular names, and/or the
        codes are.

ISO 639-1 shall be a subset of ISO 639-2.

>>>>    That should have been stated before, above section 1.3, to
        avoid the comment that was made there.

>>>>    NB: the following paragraphs are tautologous, and also make
        assumptions about what other users of the standards will do.
        A suggested simpler version (in "quotes") follows these
        paragraphs in order to avoid these problems.

New codes will no longer be added to ISO 639-1 after the publication
of a revised standard.

A language code already in ISO 639-2 at the point of freezing ISO
639-1 shall not later be added to ISO 639-1.

This is to ensure consistency in usage over time, since users are
directed in Internet applications to employ the alpha-3 code when an
alpha-2 code for that language is not available.

>>>>    That reflects a draft revision of RFC 1766, although a likely
        and logical outcome of that draft. But ISO standards
        practices should not be tied to unapproved drafts, whether of
        draft ISO standards, or draft de facto standards, such as

New language codes may be considered for inclusion in both parts or
in ISO 639-2 only. If the request is to add a code to ISO 639-1, it
must also be added to ISO 639-2 and satisfy the stated criteria.

>>>>    Why not leave it up to the JAC (or RAs?) to decide whether it
        should be only in ISO 639-2, or in ISO 639-1 too?

>>>>    I suggest that the above paragraphs

        "New codes will no longer be added to ISO 639-1 ...
        added to ISO 639-2 and satisfy the stated criteria" be
        simplified to:

        "New 2-letter codes will no longer be added to ISO 639-1
        after it is published, when it would replace the current ISO
        639. From that date, only 3-letter codes would be added to
        ISO 639-2.

        "While ISO 639-1 is still in draft form, any proposed new
        codes for ISO 639-1 must also be added to ISO 639-2 and
        satisfy the stated criteria for ISO 639-2."

>>>>    In addition, one should not presume too much what will happen
        with a standard being developed. I imagine that ISO 639-1
        will proceed smoothly towards publication, but there is a
        vote and an ISO/TC37/SC2 meeting yet.

See also Rules of procedure for conducting business (ISO 639/JAC N2R).


Best regards

John Clews

John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432; fax: + 44 1423 889061;

Committee Chair of  ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
 Technologies: European Localization Requirements
Committee Member of TS/1: Terminology (UK national member body of
 ISO/TC37: Terminology)
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets

