> Obviously Ethnologue is a fine
> source for the information, but doesn't quite map the 3166 to 639-2
> codes.

If you go to http://www.ethnologue.com/codes/, you'll find some
downloadable files with data from the 14th edition of Ethnologue (E14).

LanguageCodes.tab lists each entry along with the *primary* country with
which the language is associated.

LanguageIndex.tab has a row for each language x country pair, i.e. for a
given language, there's a row for each country in which E14 lists it as
being spoken.
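
As a rough sketch of how the language x country rows could be used, the
snippet below groups rows by language code. The column names (LangID,
CountryID, Name) and the sample rows are assumptions made for
illustration; check the actual header of LanguageIndex.tab before
relying on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an ASSUMED layout: language code, country code, name.
# The real LanguageIndex.tab may use different columns and codes.
SAMPLE = (
    "LangID\tCountryID\tName\n"
    "SPN\tES\tSpanish\n"
    "SPN\tUS\tSpanish\n"
    "VIE\tVN\tVietnamese\n"
    "VIE\tUS\tVietnamese\n"
)


def countries_by_language(tab_file):
    """Map each language code to the set of country codes it is listed under."""
    result = defaultdict(set)
    for row in csv.DictReader(tab_file, delimiter="\t"):
        result[row["LangID"]].add(row["CountryID"])
    return dict(result)


index = countries_by_language(io.StringIO(SAMPLE))
print(sorted(index["SPN"]))
```

To run this against the downloaded file, pass an open file handle in
place of the io.StringIO object.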

In these files, the countries are indicated using ISO 3166 identifiers.
The only issues, then, are:

- knowing how to associate entries in E14 with entries in ISO 639
- selecting the languages that are of interest
- determining whether the basis on which E14 associated a given language
with a particular country fits your purposes.
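
The first two steps might look something like the sketch below. The
E14_TO_ISO639_2 table is hypothetical and would have to be built by
hand; the E14 codes and the rows shown are illustrative and have not
been checked against the actual files.

```python
# Hypothetical, hand-maintained mapping from E14 language codes to
# ISO 639-2 codes. The E14 codes here are illustrative only.
E14_TO_ISO639_2 = {"SPN": "spa", "VIE": "vie"}

# Illustrative (E14 language code, ISO 3166 country code) pairs.
ROWS = [("SPN", "ES"), ("SPN", "US"), ("VIE", "VN"), ("MRT", "IN")]


def iso_pairs(rows, mapping, wanted):
    """Return sorted (ISO 3166, ISO 639-2) pairs for the wanted languages.

    Rows whose E14 code has no entry in the mapping are skipped.
    """
    pairs = []
    for lang, country in rows:
        iso = mapping.get(lang)
        if iso is not None and iso in wanted:
            pairs.append((country, iso))
    return sorted(pairs)


result = iso_pairs(ROWS, E14_TO_ISO639_2, wanted={"spa"})
print(result)
```

The third issue is a judgement call and isn't something a lookup table
can settle for you.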

The last one isn't trivial: 

- Certainly we would say that Spanish is spoken in the US.

- There are significant communities of Vietnamese speakers living in the
US; for a given purpose, do you list USA as a country in which
Vietnamese is spoken?

- There are undoubtedly speakers of (e.g.) Marathi living in the US,
though there may not be any significant community of Marathi speakers;
for a given purpose, do you list USA as a country in which Marathi is
spoken?



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division