This question is by no means simple, and I don't have the answer!

People's languages and the designations for those languages form important
parts of their identity. A linguist could have told us that the Norwegian
language(s) is "just" a dialect of the "Nordic language", and the identifier
could just be "noc-NO" or something (and Danish "noc-DK", etc). He would
have been fairly correct from a "purely linguistic" point of view. But very
few people in this corner of the world would buy it. Our linguist also
wouldn't be far wrong if his analysis were that Norwegian and Danish are
just dialectal variants of Swedish (since Swedish has more speakers than the
other two). Let us designate them "sv-NO" and "sv-DK". But I don't think
anyone here would buy that perfectly plausible linguistic statement.

Maybe if there had been one "generic" designation for
Catalan+Valencian+Balearic, it could have worked. But "Cavaba" doesn't exist,
either as a designation for a geographical area or for a linguistic unit.

There are no clear "rules" for language borders. Some "adjacent" languages
are close, others are further apart. No-one should start localizing into a
language just because an ISO 639 identifier exists for it.

I see two important points: (1) We need clear and well-structured
meta-information connected to each item in all code tables. The description
of this will need to go into the new "part 4" for which I have circulated a
first proposal. (2) Not all applications need (or want) the same level of
granularity. The fact that an identifier exists for some (sub)unit doesn't
mean that it is practical for every application to use it.

As to Valencian: We cannot include Valencian as an "alternate name" for
Catalan. We COULD possibly have encoded "Cavaba" as a linguistic unit and
given Catalan and Valencian and Balearic as "alternate names". But that
isn't the case. I think we need to hold off on any decision for the time being.
Hopefully we will find a good place in a new description format for this
kind of information.

As to English, or all the Englishes: it is just a "random" occurrence that
they are all called English. We could just as well have had five or fifty
"English languages". And I am sure that speakers of "Amlish" and "Uklish"
alike would find it a poor solution to have "eng"/"en" as identifier. But
the case for Valencian isn't like the English case; it is more like the
opposite. As far as I know (I may be wrong!), Moldavian and Romanian are even
closer from a linguistic point of view. They are two languages primarily
because the two governments have decided that they are. (I know, of course,
about the historical differences in writing system.)

I wish I had a clear and stubborn opinion about this ...


Havard Hjulstad
  Solfallsveien 31
  NO-1430  As, Norway
  tel: +47 64963684  &  +47 64944233
  mob: +47 90145563

-----Original Message-----
From: ISO 639 Joint Advisory Committee On Behalf Of Peter Constable
Sent: 31. mars 2003 17:29
To: [log in to unmask]
Subject: Re: AW: A question about a language (fwd)

Milicent Wewerka wrote on 03/17/2003 06:56:42 AM:

> Regarding Michael's message (below), I think this is an important point.
>  I don't think that the view of the speakers of the language is the only
> issue... The language codes
> are not some abstract intellectual exercise; they are applied in the
> real world.

I agree. If we add 2- or 3-letter IDs for things that really are not
different, but are only referred to by different names, then we create
potential for problems in a number of areas:

- software vendors will get the impression that they need to support
distinct implementations for these things, when they do not (and, btw, when
they find out that they have expended resources unnecessarily, they'll
get annoyed at ISO)

- larger content providers will face a conundrum in publishing: they have
content that is intended for the entire community, but they are forced to
decide whether to tag it as Catalan or as Valencian; they'll end up having
to duplicate the content and have two versions that are identical except
for the way they are tagged, and that will result in increased costs for their

- some content will be inconsistently tagged: there will be content that
gets tagged one way and other content that gets tagged another way; authors
will be confused about which to use; users will similarly be confused, or
will miss out on some of the content they were looking for

- cataloguers will face a conundrum about how to catalog content that can
serve both sub-communities (the issue that Milicent pointed out)

If social or political circumstances meant that there were two very
distinct cultural identities, such that content targeted at one community
would generally not be suitable for the other, then *perhaps* that might
warrant two different language identifiers (though I wouldn't
make that a general rule). But just because one portion of the speaker
community refers to themselves as "Valencian", that alone is not a
sufficient basis for asserting a distinct language. (If the UK had vowed to
veto a UN resolution and Americans started asserting that they ate "freedom
muffins" rather than "English muffins", and similarly that they spoke
"American" rather than "English", that wouldn't provide a basis for adding
a new language identifier for "American".)

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485