Thanks for your comments. Speaking more for myself than for the SDT, I 
think such a constraint ought to be absent from the general EAD and EAC 
schemas, leaving it to specific subsets or profiles to define and enforce 
usage, and to external groups to maintain codelists. Indeed, in the current 
form of EAD3 all non-empty elements carry unconstrained "lang" and "script" 
attributes.

The develop branch of the SDT's EAD-Revision GitHub repository contains a 
sample Schematron schema which demonstrates restriction of lang attribute 
values to ISO 639-2 codes. See:

which uses the LC maintained vocabulary available at

It would be easy enough to use LC's ISO 639-1 list instead.
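
For anyone who wants to see the idea outside of Schematron, here is a rough 
Python sketch of the same kind of check. It is not the SDT's schema, only an 
illustration: the function name and the four-code sample set are made up for 
this message, and a real profile would load the full codelist from the 
externally maintained vocabulary.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only (an assumption for this sketch); a real check
# would load the complete codelist from the maintained vocabulary.
SAMPLE_ISO639_2 = {"eng", "fre", "ger", "spa"}


def invalid_lang_values(xml_text, allowed=SAMPLE_ISO639_2):
    """Return (tag, value) pairs for every lang attribute not in `allowed`."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():  # walks the root and all descendants
        value = elem.get("lang")
        if value is not None and value not in allowed:
            problems.append((elem.tag, value))
    return problems


sample = (
    '<ead>'
    '<unittitle lang="eng">Papers</unittitle>'
    '<abstract lang="en">Short summary</abstract>'
    '</ead>'
)
print(invalid_lang_values(sample))
```

Switching a profile to the two-letter codes would then only mean passing a 
different set as `allowed`; the general schema itself stays unconstrained.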

Thanks again,


On Thu, 14 Nov 2013, Ethan Gruber wrote:

> Hi all,
> There is an issue that has been bothering me for quite some time, and I think a discussion needs to be had in the
> broader community regarding xml:lang.  In EAD and EAC-CPF, the xml:lang attribute is bound to the three-letter ISO
> 639-2 code (e.g., eng for English).  In the larger world wide web, the de facto standard is the two-letter 639-1 code
> (e.g., en).  Most tutorials for the usage of xml:lang in XML documents use the two-letter code, and it is exclusive to
> linked data systems.  While there is a larger number of codes represented by the 639-2 code compared to 639-1 (see
>, I think that the transportability of EAD and EAC-CPF into
> other data systems is hampered by the current three-code requirement in the schema.  I believe that the EAC and EAD
> schema development committees should take this into consideration.
> Just my two cents,
> Ethan

Terry Catapano
Special Collections Analyst/Librarian
Columbia University Libraries Digital Program