Dear all,

The discussion of what is language vs what is dialect tends to come up in
any number of fora. The lower the level of expertise among the participants,
the simpler the solution tends to be. I don't think that our group has a
sufficiently low level of expertise to come up with a simple answer ...!!

I think that all of John's points are valid as "weighted arguments", but
that all have problems.

(a) "an established orthography" : This obviously does not apply to
non-written languages. It also does not work for a large number of languages
with no normalized orthography in the "European" sense. There may be
orthographic principles with a lot of "individual freedom". There are also
languages with competing orthographies, and even scripts. (And Chinese is a
big problem in this respect.) (I have a book in Serbian in Latin script; it
is a dictionary, and the alphabetical order of the words in Latin script is
that of the Cyrillic alphabet. The book states itself that it is in Serbian,
and I don't think that we need to question that.) Northern Sami has three
distinctly different orthographies: "old orthography in Norway and Sweden",
"old orthography in Finland", "new orthography". Actually,
orthography/writing system/script will be a feature that needs "additional

(b) "a separate usage" : Well -- I don't know what you mean, and it is
difficult to comment. Certainly on one level there are a number of "usages"
within any language.

(c) "a separate language name" : This is an important point. However, it
doesn't always work. There are many exceptions, e.g.: (1) Speakers of
Northern Sami and Southern Sami both say that they speak Sami. It is experts
and people on the "outside" that have needed to assign distinguishing names
to the distinctly different languages (with quite different orthographical
principles for one thing). (2) Our Nigerian au pair gives the name of her
town when asked which language she speaks. When "pushed" she says she speaks
Yoruba, just a little differently from the people in the next town, who also
speak Yoruba. Sometimes they understand each other, sometimes not. This is
actually not very different from the situation with the Nordic languages,
just that we (for historical reasons) don't think of the "umbrella language"
Nordic as a language.

(d) "a body of works using that orthography over a significant period of
time" : Also valid, I guess, but very difficult to use in borderline cases.

My main point: Using John's criteria we can distinguish easily between
"language" and "dialect" in those cases where we don't really have that much
problem in the first place. For distinguishing borderline cases we still
don't have clear criteria.

Best regards,

Håvard Hjulstad    mailto:[log in to unmask]
  Solfallsveien 31
  NO-1430  Ås, Norway
  tel: +47-64944233  &  +47-64963684
  mob: +47-90145563

-----Original Message-----
From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf
Of John Clews
Sent: 23. januar 2002 19:43
To: [log in to unmask]
Subject: Christian's email

I think that two things are being confused.
Hopefully the criteria proposed below will provide a means of
overcoming this.

Talking principally of written languages, and ignoring spoken
languages, helps solve this problem, and this is the approach taken
in both parts of ISO 639.

Using the proposed criteria, it is possible to distinguish

1. Separate languages, which have
   (a) an established orthography,
   (b) a separate usage,
   (c) a separate language name and
   (d) a body of works using that orthography over a significant
   period of time, against

2. Dialects of any of the above, which don't have all four of
   criteria (a) - (d).

Taking those criteria, this works so that


1. Bosnian, Slovak, Nynorsk, Bokmaal, and Walloon all fit criteria
   (a) - (d) while (a) - (d) do not apply to group 2 below.


2.1 What you describe as enUS, enAU, enNZ, enUK etc., are all
    described as English by their users, and it is difficult to pick
    out even any language variants from a short sample. Even for
    English as used in the US, the major differences are only some of
    spelling (color/colour etc) and usage (carpark/parking lot) and
    there is no mutual unintelligibility.

    Similarly, what you describe as deAT, deDE, deCH, etc., are all
    described as German by their users, and it is difficult to pick
    out even any language variants from a short sample. Even for
    German as used in Switzerland, the major differences are only some
    of spelling (use of ESZETT/SHARP S or not) and usage
    (Kartoffel/Erdapfel, etc) and there is no mutual
    unintelligibility.
> not to mention Chinese, French, etc.

    For Chinese, users of Hakka Chinese, Mandarin Chinese, Cantonese
    Chinese etc. all think of themselves as writing Chinese, and as
    Chinese people. NB: it is normal practice in the People's
    Republic of China to subtitle TV historic dramas etc, so that the
    drama can be followed in whatever part of China viewers are
    watching. They are reading Chinese, even if they speak the same
    written words using different pronunciation and different
    synonyms which predominate in their own (very large) areas.
    Criteria (a) - (d) do not apply here.

    French is a slightly different kettle of fish, but the same
    criteria apply. There are various languages of France, some of
    which have dialects. I would refer you to the JAC document N19
    (February 2002) which lists several different language families.
    It lists the main related _languages_ of metropolitan France as
    Franco-Provencal, Occitan, and French, and also lists various
    dialects of each. Criteria (a) - (d) apply to each of
    Franco-Provencal, Occitan, and French.

    However, criteria (a) - (d) do not apply to each of the dialects
    listed (see below), though they do apply to at least Walloon.

NB: There is some work to do here, in both ISO 639-1 and ISO 639-2.

Current Occitan and current Franco-Provencal need separate codes,
as they currently share only one code. Older Provencal is not used
currently, but has a large written repertoire, and should retain the
code it has. Occitan is much less influenced by Italian than is

For current languages, there should be three codes for three
languages (Occitan, Franco-Provencal and French). Users may need
guidance on distinguishing Occitan and Franco-Provencal, which may
be done by providing links to sample texts in those languages.


I have not yet looked into the language/dialect status of linguistic
entities related to Occitan (JAC N19 lists Gascon, Languedocian,
Provencal, Auvergnat-Limousin, Alpin Dauphinois) but the (a) - (d)
criteria should be useful in sorting them out.

Similarly, I have not yet looked into the language/dialect status of
linguistic entities related to French (JAC N19 lists as langues d'oil
the entities Franc-Comtois, Walloon, Picard, Norman,
Poetevin-Saintongeais, Bourguignon-Morvandiau, and Lorraine). Again,
the (a) - (d) criteria should be useful in sorting them out.

Walloon is listed in 1, not in 2, as it meets criteria (a) - (d).
The JAC's recent decision on Walloon also fits in with this.

But anyway using (a) to (d) as criteria should enable the JAC to
apply consistent benchmarks that also fit in with existing practice
of ISO 639, ISO 639-2 and the various registrations already made by
both RAs and the JAC.

Are there any problems with that? I'd be glad to see comments.

Best regards

John Clews

John Clews,
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
Email: [log in to unmask]
tel: +44 1423 888 432;

Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes