The Library of Congress faces this issue frequently in our indexing vocabulary, Library of Congress Subject Headings. Normally we try to follow reference sources such as Ethnologue, Voegelin's Classification and index of the world's languages, and other compilations. Naturally these sources don't always agree. We have to make the best judgment we can with the available information. Sometimes the perspective changes over time. A "dialect" may with time be promoted to a "language," or additional information may reveal that a "language" is actually a group of languages. I agree that specific criteria would be nice, but the data won't always be available to measure the differences. Personally, I tend to give more weight to sources that don't have a political interest in the issue.
Milicent Wewerka, Library of Congress
>>> Håvard Hjulstad <[log in to unmask]> 04/01/03 04:56PM >>>
I agree 100 % with you, John (see his message below). What this is about is
NOT the concrete examples (Moldavian vs Romanian, Valencian vs Catalan,
Serbo-Croatian vs Serbian vs Croatian vs Bosnian, English vs English, etc.).
What we need to discuss is the rules and the criteria. I am sorry that very
few JAC members have "dared" throw themselves into the discussion. I tried
to trigger a discussion, but it is obviously very difficult to discuss
principles without getting too focussed on details of the examples.
We have the concepts of "individual language", "language group", "language
variant", etc. We have a number of criteria by which to assess what we are
dealing with in each individual case, but we constantly have the same kinds
Some of the criteria are:
- Purely linguistic on the level of phonology and morphology. These are
normally fairly straightforward to deal with. It would be possible to
"measure" phonological and morphological differences.
- Writing system, including orthographic principles. A high level of
orthographic stability makes it simpler to "count languages". Unfortunately
many orthographies are quite unstable and/or allow for considerable
- Vocabulary. In some cases neighbouring and closely related
languages/variants have had different cultural influences that may weigh
when we are "measuring" the difference.
- Legal or de facto regulation. Many languages have some sort of legal
"protection", which also needs to be considered.
- Cultural split or unity. I think this is an important factor, but it is
quite difficult to deal with.
I am sure that we cannot come up with a formula that can be used objectively
to determine whether a "speak" (or a "write") is an "individual language".
But we need to put some effort into the question. Maybe some of our
"individual languages" would end up having "meta-names" as their primary
names, like "Romanian+Moldavian", "Catalan+Valencian+Balear", etc. Both
Ethnologue and Linguasphere have a number of such cases. These "meta-names"
would have to be a separate category, and "real" names would be included in
addition. I am certain that the current list of
identifier+English-name+French-name+indigenous-name needs to be changed.
There are many ways forward. And there are many decisions to be made. Among
them are: (1) How can we improve our criteria for assessing where the
"individual language" boundaries go? (2) Which elements of additional
information are needed to enhance our tables? (Don't think "table"; it is
going to be a database and/or a complex XML structure anyway.)
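To make the "meta-name plus real names" idea a little more concrete, here is a
minimal sketch in Python of what one record might look like. Everything in it
is an assumption made for illustration: the class name LanguageEntry, the field
names, the name categories, and the identifier "x-example-1" are hypothetical
and are not taken from ISO 639 or from any existing table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LanguageEntry:
    """One "individual language" record (hypothetical layout)."""

    identifier: str                  # placeholder value, not a real ISO 639 code
    meta_name: Optional[str] = None  # separate category, e.g. "Romanian+Moldavian"
    # "Real" names grouped by category such as "english", "french", "indigenous".
    names: Dict[str, List[str]] = field(default_factory=dict)

    def primary_name(self) -> str:
        """Return the meta-name if one exists, else the first English name."""
        if self.meta_name is not None:
            return self.meta_name
        return self.names["english"][0]

    def all_names(self) -> List[str]:
        """Return the meta-name (if any) followed by every real name, without duplicates."""
        result: List[str] = []
        if self.meta_name is not None:
            result.append(self.meta_name)
        for category_names in self.names.values():
            for name in category_names:
                if name not in result:
                    result.append(name)
        return result


# Example record: the meta-name is primary, the real names are kept in addition.
entry = LanguageEntry(
    identifier="x-example-1",
    meta_name="Romanian+Moldavian",
    names={"english": ["Romanian", "Moldavian"]},
)
```

The only point of the sketch is that the meta-name lives in its own field, so
it can serve as the primary name without displacing the real names. The same
shape could equally be expressed as database columns or XML elements.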
Håvard Hjulstad mailto:[log in to unmask]
Chairman ISO/TC37 (Terminology and other language resources)
Convener of ISO/TC37/SC2/WG1 (Language coding)
Acting chairman of ISO 639 RA-JAC
NO-1430 Ås, Norway
tel: +47 64963684
fax: +47 64944233
mob: +47 90145563