There was much discussion on the list about being able to designate
language, script and transliteration for MODS elements, not just at the
record level. As we are incorporating these changes, I find that language
is a difficult one for various reasons and would like opinions.
The MARC language codes have been used in library cataloging since 1968.
The 3-character (bibliographic) language code based on MARC language codes
became part 2 of ISO 639 in 1998. This is an official ISO standard.
We all know that the Internet world has specified use of the ISO
2-character code. RFC 1766 specified this, but it was recently revised as
RFC 3066 to incorporate use of the 3-character code where a 2-character one
does not exist. (ISO 639-1, the 2-character code list, has only some 170+
languages defined while ISO 639-2 has 450+, so the latter is a
much more granular list.) xml:lang references RFC 3066 (in an erratum), so
you could see either 2- or 3-character codes in an xml:lang value.
All of the 2-character codes in ISO 639-1 have equivalent 3-character
codes, and these are to be considered synonyms. (If you want more
information about the relationship between these lists see:)
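To make the synonym relationship concrete, here is a minimal sketch in Python. The table is only an illustrative subset of the two code lists, it uses the bibliographic (MARC-style) 3-character forms, and the function name to_three_letter is hypothetical, not part of MODS or any standard.

```python
# Illustrative subset only: a few ISO 639-1 codes paired with their
# ISO 639-2 bibliographic (MARC-style) equivalents.
ISO_639_1_TO_639_2 = {
    "en": "eng",
    "fr": "fre",
    "de": "ger",
    "es": "spa",
    "zh": "chi",
}


def to_three_letter(code):
    """Return the 3-character code for a 2- or 3-character input.

    Hypothetical helper: 3-character input is assumed to be a valid
    ISO 639-2 code already and is passed through unchanged.
    """
    normalized = code.strip().lower()
    if len(normalized) == 3:
        return normalized
    if len(normalized) == 2:
        # Raises KeyError for codes outside the illustrative subset.
        return ISO_639_1_TO_639_2[normalized]
    raise ValueError("expected a 2- or 3-character language code: %r" % code)
```

Because every 2-character code has a 3-character synonym, a lookup in this direction never loses information; the reverse direction would fail for the many languages that have only a 3-character code.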
The question is what to allow for in MODS. In the language element itself,
there is an authority attribute that specifies whether the language code
is from RFC 3066 or ISO 639-2. Certainly MARC records would use the ISO
639-2 code.
If designating language at the element level (which is not currently in
MARC, although a mechanism to do this has been discussed) what should be
allowed? Options are:
1. define xml:lang (to include what is specified in RFC 3066) and lang (to
allow for the 639-2 code) for each element
2. define only lang and let the application decide which code to use
(there wouldn't be any clashes since there is a one-to-one mapping)
3. define only xml:lang and not allow for the 639-2 code at the element
level
The problem I see in only defining xml:lang is that the 3-character code
is better known in the library world. If converting from MARC records,
the 3-character code would be in the language field; would it then be
strange to use the 2-character one at the element level? Also, are there
other options I haven't thought of?
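As a rough illustration of what option 1 would look like in a record, the following Python sketch puts both attributes on one element. The element name title, the sample value, and the helper tag_language are assumptions made for the example, and the MODS namespace is omitted for brevity; nothing here reflects a decided MODS design.

```python
import xml.etree.ElementTree as ET

# The reserved XML namespace; ElementTree serializes it with the xml: prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tag_language(element, rfc3066_code, iso639_2_code):
    """Option 1 sketch: set both xml:lang and lang on a single element.

    Hypothetical helper; the attribute name "lang" follows the option
    described above, not a published schema.
    """
    element.set(XML_LANG, rfc3066_code)   # RFC 3066 value, e.g. "fr"
    element.set("lang", iso639_2_code)    # ISO 639-2 value, e.g. "fre"
    return element


# Hypothetical element and content, for illustration only.
title = ET.Element("title")
title.text = "Les Miserables"
tag_language(title, "fr", "fre")

print(ET.tostring(title, encoding="unicode"))
```

Under option 3 only the xml:lang line would remain, so a record converted from MARC would carry "fre" in the language field and "fr" on the element; under option 2 only lang would be set and the application would have to tell the two code lists apart, for example by length.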
^^ Rebecca S. Guenther ^^
^^ Senior Networking and Standards Specialist ^^
^^ Network Development and MARC Standards Office ^^
^^ 1st and Independence Ave. SE ^^
^^ Library of Congress ^^
^^ Washington, DC 20540-4402 ^^
^^ (202) 707-5092 (voice) (202) 707-0115 (FAX) ^^