In message  <[log in to unmask]>
[log in to unmask] writes:

> I think that all of John's points are valid as "weighted arguments", but
> that all have problems.
> (a) "an established orthography" : This obviously does not apply to
> non-written languages. It also does not work for a large number of
> languages with no normalized orthography in the "European" sense.
> There may be orthographic principles with a lot of "individual
> freedom". There are also languages with competing orthographies, and
> even scripts. (And Chinese is a big problem in this respect.)

Which particular problems? Criteria (a) - (d) get around that.

> (I have a book in Serbian in Latin script; it is a dictionary, and
> the alphabetical order of the words in Latin script is that of the
> Cyrillic alphabet. The book states itself that it is in Serbian, and
> I don't think that we need to question that.)

Agreed - Serbian is rather well established as a language.

> Northern Sami has three distinctly different orthographies: "old
> orthography in Norway and Sweden", "old orthography in Finland", "new
> orthography". Actually, orthography/writing system/script will be a
> feature that needs "additional coding."

But orthography/writing system/script do not need to be coded as
language entities. ISO 15924: Codes for representation of names of
scripts, when published, will provide codes for these. It doesn't
have to be part of ISO 639, though it's extremely logical for
ISO 15924 to be used together _with_ ISO 639 just as
ISO 3166 is used together _with_ ISO 639.
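
As a rough illustration of that combination (a sketch only, not
anything the standards themselves define): the class name, the
hyphen separator and the choice of two-letter language code below are
my own assumptions, and the Serbian example is the Latin/Cyrillic
case quoted above. The language code stays the same; only the script
qualifier changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageTag:
    """A language code with optional script and country qualifiers."""
    language: str                  # ISO 639 code, e.g. "sr" for Serbian
    script: Optional[str] = None   # ISO 15924 code, e.g. "Latn" or "Cyrl"
    country: Optional[str] = None  # ISO 3166 code, e.g. "NO"

    def __str__(self) -> str:
        # Join only the parts that are present.
        # The hyphen is an assumed separator, chosen for illustration.
        parts = [self.language, self.script, self.country]
        return "-".join(p for p in parts if p)


serbian_latin = LanguageTag("sr", script="Latn")
serbian_cyrillic = LanguageTag("sr", script="Cyrl")

# Same language entity, different script: only the qualifier differs.
print(serbian_latin)      # sr-Latn
print(serbian_cyrillic)   # sr-Cyrl
print(serbian_latin.language == serbian_cyrillic.language)  # True
```

Keeping the three codes in separate fields means a system can match
on the language alone and ignore the script or country when it does
not care about them.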

> (c) "a separate language name" : This is an important point.
> However, it doesn't always work. There are many exceptions, e.g.:
> (1) Speakers of Northern Sami and Southern Sami both say that
> they speak Sami...

Covered in previous email.

> (2) Our Nigerian au pair gives the name of her town when asked
> which language she speaks.

Which seems to be common in much of Africa. However, geographical
criteria apply everywhere anyway, though usually on a
larger scale: German is what is spoken in Germany, French is what is
spoken in France, English is what is spoken in England, etc :-)

> When "pushed" she says she speaks Yoruba, just a little differently
> from the people in the next town, who also speak Yoruba. Sometimes
> they understand each other, sometimes not.

Which is also true of varieties of English, even within the United
Kingdom, if you go up various valleys. But despite considerable
degrees of mutual incomprehensibility in _spoken_ English, there is
no significant incomprehensibility in _written_ English, except as
affected by illiteracy.

The question "what do you read?" is more significant than the
question "what do you speak?" in this instance. "What do you read?"
is what ISO 639 is largely about, though it also needs to consider
how it will interface with systems that deal with "what do you
speak?".

> This is actually not very different from the situation with the
> Nordic languages, just that we (for historical reasons) don't think
> of the "umbrella language" Nordic as a language.

Exactly. Sometimes it's the umbrella which is more significant, and
sometimes it's what shelters under the umbrella. That's just the
nature of nomenclature and classification.

I still think that criteria (a) - (d) are pretty robust. Have you
some more borderline language entities to test against them?

Best regards

John Clews

John Clews,
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
Email: [log in to unmask]
tel: +44 1423 888 432;

Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes