The discussion of what is language vs what is dialect tends to come up in
any number of fora. The lower the level of expertise among the participants,
the simpler the solution tends to be. I don't think that our group has a
sufficiently low level of expertise to come up with a simple answer ...!!
I think that all of John's points are valid as "weighted arguments", but
that all have problems.
(a) "an established orthography" : This obviously does not apply to
non-written languages. It also does not work for a large number of languages
with no normalized orthography in the "European" sense. There may be
orthographic principles with a lot of "individual freedom". There are also
languages with competing orthographies, and even scripts. (And Chinese is a
big problem in this respect.) (I have a book in Serbian in Latin script; it
is a dictionary, and the alphabetical order of the words in Latin script is
that of the Cyrillic alphabet. The book states itself that it is in Serbian,
and I don't think that we need to question that.) Northern Sami has three
distinctly different orthographies: "old orthography in Norway and Sweden",
"old orthography in Finland", "new orthography". Actually,
orthography/writing system/script will be a feature that needs "additional
(b) "a separate usage" : Well -- I don't know what you mean, and it is
difficult to comment. Certainly on one level there are a number of "usages"
within any language.
(c) "a separate language name" : This is an important point. However, it
doesn't always work. There are many exceptions, e.g.: (1) Speakers of
Northern Sami and Southern Sami both say that they speak Sami. It is experts
and people on the "outside" that have needed to assign distinguishing names
to the distinctly different languages (with quite different orthographical
principles for one thing). (2) Our Nigerian au pair gives the name of her
town when asked which language she speaks. When "pushed" she says she speaks
Yoruba, just a little differently from the people in the next town, who also
speak Yoruba. Sometimes they understand each other, sometimes not. This is
actually not very different from the situation with the Nordic languages,
just that we (for historical reasons) don't think of the "umbrella language"
Nordic as a language.
(d) "a body of works using that orthography over a significant period of
time" : Also valid, I guess, but very difficult to use in borderline cases.
My main point: Using John's criteria we can distinguish easily between
"language" and "dialect" in those cases where we don't really have that much
problem in the first place. For distinguishing borderline cases we still
don't have clear criteria.
Håvard Hjulstad
NO-1430 Ås, Norway
tel: +47-64944233 & +47-64963684
From: ISO 639 Joint Advisory Committee, on behalf of John Clews
Sent: 23. januar 2002 19:43
Subject: Christian's email
I think that two things are being confused.
Hopefully the criteria proposed below will provide a means of
Talking principally of written languages, and ignoring spoken
languages, helps solve this problem, and this is the approach taken
in both parts of ISO 639.
Using the proposed criteria, it is possible to distinguish
1. Separate languages, which have
(a) an established orthography,
(b) a separate usage,
(c) a separate language name and
(d) a body of works using that orthography over a significant
period of time, against
2. Dialects of any of the above, which don't have all four of
Criteria (a) - (d).
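The rule proposed above can be sketched as a small check. This is only an illustration of the all-four-criteria test as stated in this message, not an ISO 639 procedure; the class name, the field names, and the true/false values in the usage note are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variety:
    """A linguistic entity scored against the proposed criteria (hypothetical model)."""
    name: str
    established_orthography: bool  # criterion (a)
    separate_usage: bool           # criterion (b)
    separate_name: bool            # criterion (c)
    body_of_works: bool            # criterion (d)


def classify(variety: Variety) -> str:
    """Return "language" only if all four criteria hold, otherwise "dialect"."""
    meets_all = all((
        variety.established_orthography,
        variety.separate_usage,
        variety.separate_name,
        variety.body_of_works,
    ))
    return "language" if meets_all else "dialect"
```

For example, an entity marked as meeting all four criteria (as is claimed for Walloon) comes out as "language", while one whose users do not give it a separate name (as is claimed for enUS) comes out as "dialect", whatever the other three values are. The sketch deliberately has no "borderline" outcome, which is the simplification the proposal relies on.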
Taking those criteria, this works so that
1. Bosnian, Slovak, Nynorsk, Bokmaal, and Walloon all fit criteria
(a) - (d) while (a) - (d) do not apply to group 2 below.
2.1 What you describe as enUS, enAU, enNZ, enUK etc., are all
described as English by their users, and it is difficult to pick
out even any language variants from a short sample. Even for
English as used in the US, the major differences are only some of
spelling (color/colour etc) and usage (carpark/parking lot) and
there is no mutual unintelligibility.
Similarly, what you describe as deAT, deDE, deCH, etc., are all
described as German by their users, and it is difficult to pick
out even any language variants from a short sample. Even for
German as used in Switzerland, the major differences are only some
of spelling (use of ESSZET/SHARP ESS or not) and usage
(Kartoffel/Erdapfel, etc) and there is no mutual
unintelligibility.
> not to mention Chinese, French, etc.
For Chinese, users of Hakka Chinese, Mandarin Chinese, Cantonese
Chinese etc. all think of themselves as writing Chinese, and as
Chinese people. NB: it is normal practice in the People's
Republic of China to subtitle historical TV dramas etc, so that the
drama can be followed in whatever part of China viewers are
watching. They are reading Chinese, even if they speak the same
written words using different pronunciation and different
synonyms which predominate in their own (very large) areas.
Criteria (a) - (d) do not apply here.
French is a slightly different kettle of fish, but the same
criteria apply. There are various languages of France, some of
which have dialects. I would refer you to the JAC document N19
(February 2002) which lists several different language families.
It lists the main related _languages_ of metropolitan France as
Franco-Provencal, Occitan, and French, and also lists various
dialects of each. Criteria (a) - (d) apply to each of
Franco-Provencal, Occitan, and French.
However, criteria (a) - (d) do not apply to each of the dialects
listed (see below), though they do apply to at least Walloon.
NB: There is some work to do here, in both ISO 639-1 and ISO 639-2.
Current Occitan and current Franco-Provencal need separate codes,
as they currently share only one code. Older Provencal is not used
currently, but has a large written repertoire, and should retain the
code it has. Occitan is much less influenced by Italian than is
For current languages, there should be three codes for three
languages (Occitan, Franco-Provencal and French). Users may need
guidance on distinguishing Occitan and Franco-Provencal, which may
be done by providing links to sample texts in those languages.
I have not yet looked into the language/dialect status of linguistic
entities related to Occitan (JAC N19 lists Gascon, Languedocian,
Provencal, Auvergnat-Limousin, Alpin Dauphinois) but the (a) - (d)
criteria should be useful in sorting them out.
Similarly, I have not yet looked into the language/dialect status of
linguistic entities related to French (JAC N19 lists as langues d'oil
the entities Franc-Comtois, Walloon, Picard, Norman,
Poitevin-Saintongeais, Bourguignon-Morvandiau, and Lorraine). Again,
the (a) - (d) criteria should be useful in sorting them out.
Walloon is listed in 1, not in 2, as it meets criteria (a) - (d).
The JAC's recent decision on Walloon also fits in with this.
But anyway using (a) to (d) as criteria should enable the JAC to
apply consistent benchmarks that also fit in with existing practice
of ISO 639, ISO 639-2 and the various registrations already made by
both RAs and the JAC.
Are there any problems with that? I'd be glad to see comments.
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes