Granularity (a) - (d) [re: Christian's email]
I am using the word granularity to describe where divisions occur
between languages and dialects, which has always been an important
consideration in the various parts of ISO 639 and the work of its RAs.
I hope that this is a legitimate use of the word granularity.
Sometimes this is a fine distinction. This email is an attempt to get
some reasonably objective criteria into this issue.
I would be extremely grateful for responses to this, as I think
Christian has raised a very important issue.
Most importantly, Christian has suggested that it is important to
distinguish languages from dialects. That has prompted me to
identify four criteria, (a) - (d), which I think can be applied
consistently and fairly simply both to existing codes and to
potential new codes, and which will provide a clear level of granularity.
If a linguistic entity meets _all_ of criteria (a) - (d) it should be
accepted as a separate language, and ideally allocated a code, while
if fewer than all four are met, the linguistic entity in question
probably should not be accepted as a separate language, or allocated
a code.
If JAC members could look at the discussion below, and think of any
cases of legitimate languages which do not meet all of (a) - (d),
I'd be grateful.
Christian's email was as follows (slightly abbreviated):
> -----Original Message-----
> From: ISO JAC Voting Member List On Behalf Of
> Christian Galinski
> Sent: 18. januar 2002 00:13
> Subject: [JACVOTE] AW: [JACVOTE] AW: [JACVOTE] Ballot: Walloon
> May I reformulate my arguments:
> I did not approve inclusion in ISO 639-1 for the following reasons:
> If we include Walloon, we also have to include all kinds of variants for
> major languages, like
> enUS, enAU, enNZ, enUK etc.
> deAT, deDE, deCH, etc.
> not to mention Chinese, French, etc...
> On the other hand, we have cases, like Bosnian, Slowakian, etc. If
> I am wrong, please correct me.
I think you are wrong, so I am correcting you :-)
Let me explain why. I think that two things are being confused, which
are indeed often confused. My suggestion is that the two can in fact
be easily distinguished by applying criteria (a) - (d) listed below.
Restricting the discussion principally to written languages, and
ignoring spoken languages, helps solve this problem, and fits in with
the conventions and the principal scope of ISO 639.
Separate linguistic entities which have
(a) an established orthography,
(b) a separate usage,
(c) a separate name, and
(d) a body of works using that orthography over a significant
period of time
all fit the class "languages."
Separate linguistic entities which do not have _all_four_
of (a) - (d) above all fit the class "dialects."
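As a minimal sketch only, the rule above can be written as a small Python check. The class, field and function names are hypothetical and are not part of ISO 639 or any RA procedure, and the flags for any real entity would still have to be judged by whoever assesses it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinguisticEntity:
    """Hypothetical record of how an entity fares against criteria (a) - (d)."""
    name: str
    has_established_orthography: bool  # criterion (a)
    has_separate_usage: bool           # criterion (b)
    has_separate_name: bool            # criterion (c)
    has_body_of_works: bool            # criterion (d)


def classify(entity: LinguisticEntity) -> str:
    """Return "language" only if all four criteria are met, else "dialect"."""
    criteria = (
        entity.has_established_orthography,
        entity.has_separate_usage,
        entity.has_separate_name,
        entity.has_body_of_works,
    )
    return "language" if all(criteria) else "dialect"


# Placeholder entities with made-up flags, purely to exercise the rule.
example_all_four = LinguisticEntity("Example A", True, True, True, True)
example_no_name = LinguisticEntity("Example B", True, True, False, True)

print(classify(example_all_four))  # language
print(classify(example_no_name))   # dialect
```

The point of the sketch is that the test is all-or-nothing: missing any one of (a) - (d) puts the entity in the "dialects" class.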
Applying those criteria to specific examples, this works so that
1. Bosnian, Slovak, Walloon all meet criteria (a) - (d) while
the entities in group 2 do not meet all of (a) - (d). Sections
2.1 - 2.4 give more detail.
2.1 What you describe as enUS, enAU, enNZ, enUK etc., are all
described as English by their users, and it is difficult to pick
out even any language variants from a short sample of written
text. Even comparing English as used in the US and the UK, the
major differences are only in some spellings (color/colour etc.)
and usage (car park/parking lot), and there is no mutual
unintelligibility. Indeed some text samples - particularly an
academic text - could run to a lot of words before there is
anything to indicate whether the provenance is British, American,
Canadian, Australian, New Zealand, Caribbean, West African,
Southern African, Indian, Pakistani, Bangladeshi, Hong Kong,
Singapore, etc., and even then guesses would be subjective.
Therefore, what you describe as enUS, enAU, enNZ, enUK etc. are
NOT separate languages; there is only English in the cases you
mention.
2.2 Similarly, what you describe as deAT, deDE, deCH, etc., are all
described as German by their users, and it is difficult to pick
out even any language variants from a short sample. Even for
German as used in Switzerland, the major differences are only in
some spellings (use of ESZETT/SHARP S or not) and usage
(Kartoffel/Erdapfel, etc.), and there is no mutual
unintelligibility.
Therefore, what you describe as deAT, deDE, deCH, etc. are
NOT separate languages; there is only German in the cases you
mention.
> not to mention Chinese, French, etc.
For Chinese, users of Hakka Chinese, Mandarin Chinese, Cantonese
Chinese etc. all think of themselves as writing Chinese, and as
Chinese people. NB: it is normal practice in the People's
Republic of China to subtitle TV historical dramas etc., so that the
drama can be followed in whatever part of China viewers are
watching. They are reading Chinese, even if they speak the same
written words using different pronunciation and different
synonyms which predominate in their own (very large) areas.
Criteria (a) - (d) are not all met here.
Therefore, Hakka Chinese, Mandarin Chinese, Cantonese Chinese are
NOT separate languages; there is only Chinese in the cases
you mention.
In those cases, there may be a case for dialect codes (which is
what the assigned IANA codes, zh-hakka etc., are), but that is
currently not on the agenda for the ISO 639 Joint Advisory
Committee, though other fora may look at this issue.
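Purely as an illustration of the difference between a language code and a dialect-level tag, the sketch below splits a hyphenated tag of the zh-hakka form into its language part and whatever follows it. The function name is hypothetical, and this is plain string handling under that assumption, not an implementation of any registry's tag syntax.

```python
from typing import Optional, Tuple


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a tag such as "zh-hakka" into (language, dialect part).

    A bare code with no hyphen gives (code, None).
    """
    language, _, rest = tag.partition("-")
    # An empty remainder means there was no dialect part at all.
    return language.lower(), (rest.lower() or None)


print(split_tag("zh-hakka"))  # ('zh', 'hakka')
print(split_tag("zh"))        # ('zh', None)
```

On this view the part before the hyphen is what ISO 639 is concerned with, and anything after it is a matter for whichever forum deals with dialect codes.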
French is a slightly different kettle of fish, but the same
(a) - (d) criteria apply. There are various languages of France,
some of which have dialects. I would refer you to the JAC
document N19 (February 2002) which lists several different
language families. It lists the main related _languages_ of
metropolitan France as (i) Franco-Provencal, (ii) Occitan, and
(iii) French, and also lists various dialects of each. Criteria
(a) - (d) apply to each of Franco-Provencal, Occitan, and French,
and there is a case for others related to these to be regarded as
languages in addition - Walloon being one example considered.
However, criteria (a) - (d) are not met by every one of the dialects
listed (see below), though they are met by at least Walloon.
NB: There is some work to do here, in both ISO 639-1 and ISO 639-2,
to assess how other languages/dialects work out in the linguistic
entities listed in JAC N19.
The entities "current Occitan" and "current Franco-Provencal" each
need separate codes, as they currently share only one code. Older
Provencal is not used currently, but has a large written repertoire,
and should retain the code it has. Occitan is much less influenced by
Italian than is Franco-Provencal.
For current languages, there should be three codes for three
languages (Occitan, Franco-Provencal and French). Users may need
guidance on distinguishing Occitan and Franco-Provencal, which may
be done by providing links to sample texts in those languages.
Some guidance is also needed for Old Provencal (to 1500), for which
there is already a separate code.
I have not yet looked into the language/dialect status of linguistic
entities related to Occitan (JAC N19 lists Gascon, Languedocian,
Provencal, Auvergnat-Limousin, Alpin Dauphinois) but the (a) - (d)
criteria should be useful in sorting them out.
Similarly, I have not yet looked into the language/dialect status of
linguistic entities related to French (JAC N19 lists as langues d'oil
the entities Franc-Comtois, Walloon, Picard, Norman,
Poitevin-Saintongeais, Bourguignon-Morvandiau, and Lorraine). Again,
the (a) - (d) criteria should be useful in sorting them out.
Walloon is listed in 1, not in 2, as it meets criteria (a) - (d).
The JAC's recent decision on Walloon in relation to ISO 639-2 also
fits in with this.
In any case, using (a) - (d) as criteria should enable the JAC to
apply consistent benchmarks that also fit in with existing practice
of ISO 639, ISO 639-2 and the various registrations already made by
both RAs and the JAC.
Are there any problems with that? I'd be glad to see comments.
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes