> Obviously Ethnologue is a fine
> source for the information, but doesn't quite map the 3166 to 639-2
> codes.
If you go to http://www.ethnologue.com/codes/, you'll find some
downloadable files with data from the 14th edition of Ethnologue (E14).
LanguageCodes.tab lists each entry along with the *primary* country with
which the language is associated.
LanguageIndex.tab has a row for each language/country pair, i.e. for a
given language, there's a row for each country in which E14 lists it as
being spoken.
In these files, the countries are indicated using ISO 3166 identifiers.
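
As a rough sketch of how the per-country rows could be grouped by
language, something like the following Python might do. It assumes the
file is tab-delimited with a header row, and the column names LangID and
CountryID are my assumptions, not something confirmed here; check the
actual headers in the download. The sample rows use made-up placeholder
codes, not real Ethnologue data.

```python
import csv
import io
from collections import defaultdict


def countries_by_language(tab_text, lang_col="LangID", country_col="CountryID"):
    """Group country codes by language code.

    tab_text is the content of a tab-delimited file with a header row.
    The default column names are assumptions; pass the real ones if
    the downloaded file uses different headers.
    """
    reader = csv.DictReader(
        io.StringIO(tab_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    result = defaultdict(set)
    for row in reader:
        # One row per language/country pair, so collect countries per language.
        result[row[lang_col]].add(row[country_col])
    return dict(result)


# Placeholder rows for illustration only (not real codes).
sample = (
    "LangID\tCountryID\tName\n"
    "AAA\tXX\tExample A\n"
    "AAA\tYY\tExample A\n"
    "BBB\tXX\tExample B\n"
)
mapping = countries_by_language(sample)
```

For a real file you would read its text (e.g. with open(...).read()) and
pass that in. The result still has to be matched against ISO 639 and
filtered for your purposes.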
The only issues, then, are
- knowing how to associate entries in E14 with entries in ISO 639
- filtering out the languages that are of interest
- determining whether the basis on which E14 associated a given language
with a particular country fits your purposes.
The last one isn't trivial:
- Certainly we would say that Spanish is spoken in the US.
- There are significant communities of Vietnamese speakers living in the
US; for a given purpose, do you list USA as a country in which
Vietnamese is spoken?
- There are undoubtedly speakers of (e.g.) Marathi living in the US,
though there may not be any significant community of Marathi speakers;
for a given purpose, do you list USA as a country in which Marathi is
spoken?
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division