Dear Rebecca

Thanks for putting ISO 639/JAC N3R on the list. It is a significant improvement, and section 3 in particular is very strong and clear.

However, some simplification of style, and removal of ambiguities, is still needed to avoid the JAC having problems later. I embed my comments below, either indenting my text by 8 spaces or using >>>> in the margin to flag them. Unindented text is that of N3R.

In particular, there is a need to remove contradictions between the criteria for ISO 639-1 and those for ISO 639-2, and to remove some things which conflict with what the JAC decided (e.g. on different orthographies).

I hope that you will find this useful in getting rid of ambiguities, and in ensuring that the path of the ISO 639 Joint Advisory Committee runs smoothly, without unnecessary complications.

------------------------------------------------------------------------

ISO 639/JAC N3R
ISO 639 Joint Advisory Committee
Working principles for ISO 639 maintenance (8 March 2000)

------------------------------------------------------------------------

The following documents working principles for the maintenance of language codes by the ISO 639 Joint Advisory Committee, both in ISO 639-1 (Alpha-2 code) and ISO 639-2 (Alpha-3 code). It repeats some information that is in ISO 639-2:1998 in section 4 (Language codes) and the normative Annex A.

>>>> It repeats some information that is in both ISO 639-2 _and_ the equivalent parts of ISO DIS 639-1 (the section on Language codes, and the normative Annex A). It may be worth pointing that out, to avoid any appearance that most things are dictated by ISO 639-2.

In addition, it gives further details as to how submitted language code changes are considered and how the two parts of ISO 639 are related.

------------------------------------------------------------------------

1. Definition of new language codes

1.1. Procedures

A Registration form is available on the Web for requesting new language codes; the form is submitted to the appropriate ISO 639 Registration Authority for consideration.

The Registration Authority will review applications, obtain additional information and/or justification from the submitter, and suggest the assignment of a code when the relevant criteria are met.

>>>> What is not explicit in these procedures is how the work is divided between the two RAs and the JAC. It may well be most cost-effective to state explicitly that the ISO 639 Joint Advisory Committee will review all applications in consultation with the RAs.

1.2. Criteria for ISO 639-2

>>>> The criteria should be merged into one set, to avoid contradictions in applying them (see below), now that it has been agreed that there will be fewer new codes in ISO 639-1 than in ISO 639-2. Only the one, more stringent, criterion for ISO 639-1, based on documentation, need be kept separate.

Number of documents. The request for a new language code shall include evidence that one agency holds 50 different documents in the language or that five agencies hold a total of 50 different documents among them in the language. Documents include all forms of material and are not limited to text.

>>>> Given that many codes in the standard do not themselves meet this criterion, it is invidious to impose it on the submitter. Please can you change the sentence "The request for a new language code shall include evidence that one agency holds 50 different documents in the language or that five agencies hold a total of 50 different documents among them in the language" to "The request for a new language code shall include evidence that there are a substantial number of documents in or about the language concerned."

>>>> The following criteria from section 1.3 below are equally valid in section 1.2:

Recommendation.
A recommendation and support of a specialized authority (such as a standards organization, governmental body, linguistic institution, or cultural organization)

Other considerations
- the number of speakers of the language community
- the recognized status of the language in one or more countries
- the support of the request by one or more official bodies

>>>> Otherwise there is an inherent contradiction between the criteria for the two parts: more codes are permitted in ISO 639-2, yet some of the criteria for ISO 639-1 additions (e.g. those under "Other considerations" above) could allow more languages to be included under the ISO 639-1 criteria than under the ISO 639-2 criteria. If this is not resolved, there would be problems in meeting either set of criteria.

Collective codes. If the criteria above are not met, the language may be assigned a new or existing collective language code. The words "languages" or "other" as part of a language name indicate that a language code is a collective one.

Scripts. A single language code is normally provided for a language even though the language is written in more than one script.

>>>> This contradicts what the JAC seems to have done: cf. the Bosnian example referred to below. Which languages in ISO 639 (either part), or proposed additions, actually follow this now?

A standard for script codes is under development by ISO/TC46/SC2, ISO DIS 15924: Codes for the representation of names of scripts.

>>>> For simpler grammar and style, simplify to: ISO DIS 15924 "Codes for the representation of names of scripts" is under development by ISO/TC46/SC2.

Dialects. A dialect of a language is usually represented by the same language code as that used for the language. If the language is assigned to a collective language code, the dialect is assigned to the same collective language code. The difference between dialects and languages will be decided on a case-by-case basis.

Orthography.
A language using more than one orthography is not given multiple language codes.

>>>> Too specific! Orthography was a major factor in determining the status of Bosnian. For simplicity, delete this, or say the opposite, e.g. "Orthography. The use of more than one orthography may be significant in determining language status."

1.3. Criteria for ISO 639-1

>>>> Criteria for either part, rather than for each part, would be preferable. There are some contradictions between the two.

Relation to ISO 639-2. Since ISO 639-1 is to remain a subset of ISO 639-2, it must first satisfy the requirements for ISO 639-2 and also satisfy the following.

>>>> This is the first time that it is suggested that ISO 639-1 has a subset relationship to ISO 639-2. No such premise has been stated previously, so "Since" cannot begin this sentence. The entities are also confused. For clarity, prefer:
"Relation to ISO 639-2. The language [entities] coded in ISO 639-1 will remain a subset of those in ISO 639-2."
or, more simply:
"ISO 639-1 will not provide any codes for languages that are not provided in ISO 639-2."
"[For terminological use] it should also satisfy the following criterion:"

Documentation.
- a significant body of existing documents (specialized texts, such as college or university textbooks, technical documentation manuals, specialized journals, subject-field related books, etc.) written in specialized languages
- a number of existing terminologies in various subject fields (e.g. technical dictionaries, specialized glossaries, vocabularies, etc. in printed or electronic form)

>>>> The two criteria below should be added to those in section 1.2, to avoid contradictory criteria being applied, and removed from section 1.3.

Recommendation.
A recommendation and support of a specialized authority (such as a standards organization, governmental body, linguistic institution, or cultural organization)

Other considerations
- the number of speakers of the language community
- the recognized status of the language in one or more countries
- the support of the request by one or more official bodies

Collective codes. ISO 639-1 does not use collective codes. If these are necessary, the alpha-3 code shall be used.

2. Choice of new language codes

Language codes consist of the following 26 letters of the Latin alphabet in lower case, with no diacritical marks or modified characters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z.

ISO 639-2 uses three alphabetic characters, and ISO 639-1 uses two alphabetic characters.

Codes need not be abbreviations for the language, as they are intended to serve as an arbitrary device to identify a given language or group of languages. Mnemonicity of codes is striven for, but this may not always be possible or appropriate.

An effort is made to derive a language code from a language's name for itself, when possible. For historical reasons, some codes may be based on the name of a language in English.

>>>> Simplify to: "For historical reasons, some codes reflect vernacular language names in English; others reflect language names in English. Because of this there are also some variant codes."

There are 23 language names in ISO 639-2 that have variant codes, one for bibliographic applications, the other for terminological applications. This was because of established usage in national and international bibliographic databases which employed codes based on English language forms of names.

>>>> What it doesn't say, and what the web page at least (and most likely the ISO 639-2 standard in revision) should say, is that there are also a smaller number of variant codes for bibliographic applications.
These are the ones that are changing from the MARC codes to different ones in ISO 639-2 (cam to khm for Khmer, etc.). These are NOT currently documented anywhere, and should be, to avoid further confusion. Certainly older versions of MARC list these "new variant bibliographic codes", and possibly the MARC21 documentation also does (could you clarify that?)

>>>> I never quite understood what the gain was in changing existing practice in libraries for a few codes, while permitting some other variant codes. Just as a historical note: considering that the use of two codes was a major factor behind the negative votes on ISO 639-2, why did the library community (I assume) resist changing some codes, but agree to changing others (adding some cost to library users worldwide)?

New language codes shall be based on the vernacular form of name unless another language code is requested by the country or countries using the language or the sponsor submitting the request, or:
- if the vernacular cannot be determined; or
- if a suitable code is not available.
In the latter two cases, an English form of name may be used to derive the language code.

>>>> That paragraph, with its two bullet points, could be simplified. Just as I suggest above that the two sentences "For historical reasons, some codes reflect vernacular language names in English; others reflect language names in English. Because of this there are also some variant codes." would be simpler than the text supplied, it would be simpler (and would allow more scope to do the same thing, while allowing either situation where the case demanded it) if that paragraph were replaced by: "New language codes will try to use elements from the vernacular form, or the English form, if possible, for reasons of mnemonicity."

>>>> There are also going to be occasions when likely combinations for vernacular or English mnemonicity are not possible, because the likely codes are already used.
That will affect the codes as much as (sometimes more than) any desire for mnemonicity.

A language code already in ISO 639-2/T which is based on the English form of the name shall not be changed even if the vernacular form is determined and/or added to ISO 639-1. This is to ensure continuity and stability and to prevent the proliferation of multiple or alternative codes.

>>>> That paragraph is superfluous and should be deleted, as section 3 below sums this up more clearly and without complications (the additional points above add nothing to those below).

>>>> Nothing is gained from referring explicitly to ISO 639-2/T: this applies equally to any codes anywhere in either part of ISO 639. English names are used throughout, as often they are identical to the vernacular names, and/or the codes are.

A prefix is not regarded as part of the language name for purposes of assigning a code (e.g. Swahili is the language name, although "KiSwahili" is often used).

>>>> It may be worth adding a note that such prefixes are particularly common in the names of African languages.

3. Changes of existing language codes

To ensure continuity and stability in support of online retrieval from large databases built over many years, codes shall not be changed.

Where codes have been changed or discontinued in the past, the old codes shall not be reassigned.

Language codes shall not be changed if the conventional name of a language is changed. However, language names associated with codes may be changed.

Variant forms of a language name may in future be included in the entry, separated by a semicolon. No effort will be made by the Registration Authorities to collect those variants that were previously not included.

The MARC Code List for Languages maintains variant names of languages and may be used as a reference source.

>>>> The comment in section 2 above about MARC codes is also relevant here, though there is no need to change the last paragraph of section 3.

4. Relationship between ISO 639-1 and ISO 639-2

In the development of ISO 639-2 there was a principle that a code in the alpha-3 list would include the 2 characters from the alpha-2 code where possible. An exception was the alternative codes, where longstanding and widespread existing usage of bibliographic codes did not permit this.

New codes introduced in ISO 639-1 that are already included in ISO 639-2 should follow this principle. If the vernacular form had not been used in ISO 639-2/T, the ISO 639 JAC will attempt to establish an alpha-2 code with two letters in common with the alpha-3 code when possible.

>>>> The above two paragraphs could be much simplified by saying "Where possible, attempts will be made to ensure that 2 letters are common between the codes in ISO 639-1 and ISO 639-2." A brief example (e.g. Bosnian? Nynorsk?) might also help to explain what is being described here.

>>>> Again, there should be no explicit reference to ISO 639-2/T. This applies equally to any codes anywhere in either part of ISO 639. English names are used throughout, as often they are identical to the vernacular names, and/or the codes are.

ISO 639-1 shall be a subset of ISO 639-2.

>>>> That should have been stated earlier, before section 1.3, to avoid the comment that was made there.

>>>> NB: the following paragraphs are tautologous, and also make assumptions about what other users of the standards will do. A suggested simpler version (in "quotes") follows these paragraphs in order to avoid these problems.

New codes will no longer be added to ISO 639-1 after the publication of a revised standard.

A language code already in ISO 639-2 at the point of freezing ISO 639-1 shall not later be added to ISO 639-1. This is to ensure consistency in usage over time, since users are directed in Internet applications to employ the alpha-3 code when an alpha-2 code for that language is not available.

>>>> That reflects a draft revision of RFC 1766, albeit a likely and logical outcome of that draft.
But ISO standards practices should not be tied to unapproved drafts, whether of draft ISO standards or of draft de facto standards such as RFCs.

New language codes may be considered for inclusion in both parts or in ISO 639-2 only. If the request is to add a code to ISO 639-1, it must also be added to ISO 639-2 and satisfy the stated criteria.

>>>> Why not leave it up to the JAC (or the RAs?) to decide whether a code should be only in ISO 639-2, or in ISO 639-1 too?

>>>> I suggest that the above paragraphs ("New codes will no longer be added to ISO 639-1 ... added to ISO 639-2 and satisfy the stated criteria") be simplified to:
"New 2-letter codes will no longer be added to ISO 639-1 after it is published, when it would replace the current ISO 639. From that date, only 3-letter codes would be added to ISO 639-2."
"While ISO 639-1 is still in draft form, any proposed new codes for ISO 639-1 must also be added to ISO 639-2 and satisfy the stated criteria for ISO 639-2."

>>>> In addition, one should not presume too much about what will happen with a standard still being developed. I imagine that ISO 639-1 will proceed smoothly towards publication, but there is still a vote and an ISO/TC37/SC2 meeting to come.

See also Rules of procedure for conducting business (ISO 639/JAC N2R).
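>>>> As an illustration only, the code-form rule in section 2 and the relationship between the two parts described in section 4 can be checked mechanically. The Python sketch below is a minimal example under stated assumptions: the function names are hypothetical, the sample covers just four of the language names with variant bibliographic/terminological codes (French, German, Welsh, Chinese) rather than the full lists, and "two letters in common" is read as both alpha-2 letters occurring, in order, in the alpha-3 code.

```python
import re

# Section 2: codes use only the lower-case letters a-z, with no diacritics;
# two letters for ISO 639-1, three letters for ISO 639-2.
ALPHA2 = re.compile(r"[a-z]{2}")
ALPHA3 = re.compile(r"[a-z]{3}")


def is_valid_alpha2(code: str) -> bool:
    return ALPHA2.fullmatch(code) is not None


def is_valid_alpha3(code: str) -> bool:
    return ALPHA3.fullmatch(code) is not None


def shares_alpha2_letters(alpha2: str, alpha3: str) -> bool:
    """True if both alpha-2 letters occur, in order, in the alpha-3 code.

    The "in order" reading of section 4 is an assumption of this sketch.
    """
    remaining = iter(alpha3)  # each membership test consumes the iterator
    return all(letter in remaining for letter in alpha2)


def missing_from_part2(part1: dict[str, str], part2: set[str]) -> set[str]:
    """Alpha-3 codes referenced by ISO 639-1 entries but absent from ISO 639-2.

    An empty result means the subset relation holds for the given sample.
    """
    return {a3 for a3 in part1.values() if a3 not in part2}


# Small illustrative sample, not the full code lists.
PART1_TO_PART2_T = {"fr": "fra", "de": "deu", "cy": "cym", "zh": "zho"}
PART2_B_VARIANTS = {"fra": "fre", "deu": "ger", "cym": "wel", "zho": "chi"}
PART2_CODES = set(PART1_TO_PART2_T.values()) | set(PART2_B_VARIANTS.values())

if __name__ == "__main__":
    for a2, a3t in PART1_TO_PART2_T.items():
        a3b = PART2_B_VARIANTS[a3t]
        print(a2, a3t, shares_alpha2_letters(a2, a3t),
              a3b, shares_alpha2_letters(a2, a3b))
    print(missing_from_part2(PART1_TO_PART2_T, PART2_CODES))
```

Run on this sample, every terminological code (fra, deu, cym, zho) shares both letters with its alpha-2 code, while among the bibliographic variants only fre does; ger, wel and chi are instances of the exception for alternative codes noted in section 4.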
------------------------------------------------------------------------

Library of Congress
Comments: [log in to unmask] (2/10/00)

------------------------------------------------------------------------

Best regards

John Clews

--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432; fax: +44 1423 889061; Email: [log in to unmask]

Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications Technologies: European Localization Requirements;
Committee Member of TS/1: Terminology (UK national member body of ISO/TC37: Terminology);
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets