This question is by no means simple; and I don't have the answer! People's languages and the designations for those languages form important parts of their identity. A linguist could have told us that the Norwegian language(s) is "just" a dialect of the "Nordic language", and the identifier could just be "noc-NO" or something (and Danish "noc-DK", etc). He would have been fairly correct from a "purely linguistic" point of view. But very few people in this corner of the world would buy it. Our linguist also wouldn't be very incorrect if his analysis was that Norwegian and Danish are just dialectal variants of Swedish (since Swedish has more speakers than the other two). Let us designate them "sv-NO" and "sv-DK". But I don't think anyone here would buy that perfectly plausible linguistic statement.

Maybe if there had been one "generic" designation for Catalan+Valencian+Balearic it could have worked. But "Cavaba" doesn't exist, neither as a designation for a geographical area nor for a linguistic unit.

There are no clear "rules" for language borders. Some "adjacent" languages are close, others are further apart. No-one should start localizing into a language just because there exists an ISO 639 identifier.

I see two important points:

(1) We need clear and well-structured meta-information connected to each item in all code tables. The description of this will need to go into the new "part 4" that I have circulated a first proposal for.

(2) Not all applications need (or want) the same level of granularity. The fact that an identifier exists for some (sub)unit doesn't mean that it is practical for any application to use it.

As to Valencian: We cannot include Valencian as an "alternate name" for Catalan. We COULD possibly have encoded "Cavaba" as a linguistic unit and given Catalan and Valencian and Balearic as "alternate names". But that isn't the case. I think we need to hold any decision for the time being.
Hopefully we will find a good place in a new description format for this kind of information.

As to English or all the Englishes: It is just a "random" occurrence that they are all called English. We could just as well have had five or fifty "English languages". And I am sure that speakers of "Amlish" and "Uklish" alike would find it a poor solution to have "eng"/"en" as identifier. But the case for Valencian isn't like the English case; it is more like the opposite.

As far as I know (I may be wrong!) Moldavian and Romanian are even closer from a linguistic point of view. These two languages are two languages primarily because the two governments have decided that they are. (I know of course the historical differences in writing system.)

I wish I had a clear and stubborn opinion about this ...

Havard

-------------------------
Havard Hjulstad
mailto:[log in to unmask]
Solfallsveien 31
NO-1430 As, Norway
tel: +47 64963684 & +47 64944233
mob: +47 90145563
http://www.hjulstad.com/havard/
-------------------------

-----Original Message-----
From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]]On Behalf Of Peter Constable
Sent: 31. mars 2003 17:29
To: [log in to unmask]
Subject: Re: AW: A question about a language (fwd)

Milicent Wewerka wrote on 03/17/2003 06:56:42 AM:

> Regarding Michael's message (below), I think this is an important point.
> I don't think that the view of the speakers of the language is the only
> issue... The language codes
> are not some abstract intellectual exercise; they are applied in the
> real world.

I agree.
If we add 2- or 3-letter IDs for things that really are not different, but are only referred to by different names, then we create potential for problems in a number of areas:

- software vendors will get the impression that they need to support distinct implementations for these things, when they do not (and, btw, when they find out that they have expended resources unnecessarily, they'll get annoyed at ISO)

- larger content providers will face a conundrum in publishing: they have content that is intended for the entire community, but they are forced to decide whether to tag it as Catalan or as Valencian; they'll end up having to duplicate the content and have two versions that are identical except for the way they are tagged, and that will result in increased costs for their operations

- some content will be inconsistently tagged: there will be content that gets tagged one way and other content that gets tagged another way; authors will be confused about which to use; users will similarly be confused, or will miss out on some of the content they were looking for

- cataloguers will face a conundrum about how to catalog content that can serve both sub-communities (the issue that Milicent pointed out)

If there were social or political circumstances that produced two very distinct cultural identities, such that content targeted at one community would generally not be suitable for the other, then *perhaps* that might warrant two different language identifiers (though I wouldn't make that a general rule). But just because one portion of the speaker community refers to themselves as "Valencian", that alone is not a sufficient basis for asserting a distinct language. (If the UK had vowed to veto a UN resolution and Americans started asserting that they ate "freedom muffins" rather than "English muffins", and similarly that they spoke "American" rather than "English", that wouldn't provide a basis for adding a new language identifier for "American".)
- Peter

---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485