In BCP 47 (aka RFC 5646) it is not a question of “preference” as to which language code to use. It has been long-standing policy in the earlier RFCs on language tagging (RFC 4646 and its predecessors) that, when following that specification, you use the 2-character language code from ISO 639-1 as the primary subtag if there is one. If there is not, you use the 3-character code from ISO 639-2. The ISO 639-1 list has essentially been frozen for many years and includes far fewer language codes than ISO 639-2 (or ISO 639-3), since far fewer combinations are possible with 2 characters.
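The rule above can be sketched in a few lines of Python. This is only an illustration: the table is a tiny hand-picked excerpt, and the names `ISO_639_2_TO_1` and `primary_subtag` are hypothetical. A real application would load the complete ISO 639 tables or the IANA Language Subtag Registry.

```python
# Illustrative excerpt only (an assumption for this sketch); a real
# application would load the full ISO 639 tables or the IANA registry.
# Maps ISO 639-2 three-letter codes to ISO 639-1 two-letter codes.
ISO_639_2_TO_1 = {
    "eng": "en",
    "fre": "fr",  # ISO 639-2/B (bibliographic) code for French
    "fra": "fr",  # ISO 639-2/T (terminology) code for French
    "ara": "ar",
}


def primary_subtag(iso_639_2_code: str) -> str:
    """Return the two-letter code if one exists, else the three-letter code."""
    code = iso_639_2_code.lower()
    # Fall back to the three-letter code when no two-letter code is listed.
    return ISO_639_2_TO_1.get(code, code)


print(primary_subtag("eng"))  # two-letter code exists
print(primary_subtag("tlh"))  # Klingon has no two-letter code
```

With this excerpt, "eng" maps to "en", while "tlh" is returned unchanged because it has no two-letter equivalent.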



(former chair of the ISO 639 Joint Advisory Committee)



From: Bibliographic Framework Transition Initiative Forum [mailto:[log in to unmask]] On Behalf Of Andrew Cunningham
Sent: Monday, June 30, 2014 10:40 PM
To: [log in to unmask]
Subject: Re: [BIBFRAME] Bibframe and Linked Data


Hi Karen


On 1 July 2014 06:51, Karen Coyle <[log in to unmask]> wrote:

Although I also have regularly encountered two-character tags in RDF statements, the RDF concepts document [1] clearly does not preclude the use of 3-character tags or even complex tags like "zh-yue" or "tlh-Kore-AQ-fonipa" (phonetic transcription of Klingon using Korean script :-)).


In BCP-47 terms it should be "yue" rather than "zh-yue"


As for tlh-Kore-AQ-fonipa, you could not have a document that simultaneously uses the -Kore and -fonipa subtags:


tlh-Latn-AQ-fonipa or tlh-Kore-AQ, but not tlh-Kore-AQ-fonipa.
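To make the subtag positions concrete, here is a minimal Python sketch that splits a tag into language, script, region and variants. It is deliberately simplified and assumes only that shape: extended language subtags (as in "zh-yue"), extensions, private-use and grandfathered tags are not handled, and nothing is checked against the registry. The names `LangTag`, `parse_tag` and `fonipa_script_conflict` are hypothetical, and the last function encodes the argument made in this message, not a rule stated by BCP 47.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Simplified shape: language[-script][-region][-variant ...]
_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


def parse_tag(tag: str) -> LangTag:
    """Split a tag into its subtags, using conventional casing."""
    m = _TAG.match(tag)
    if m is None:
        raise ValueError(f"not handled by this simplified parser: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    variants = tuple(v.lower() for v in m.group("variants").split("-") if v)
    return LangTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=variants,
    )


def fonipa_script_conflict(tag: LangTag) -> bool:
    """Flag an IPA transcription combined with a non-Latin script subtag.

    This mirrors the reasoning in the message above; it is an assumption
    of this sketch, not a constraint defined by BCP 47 itself.
    """
    return "fonipa" in tag.variants and tag.script not in (None, "Latn")


print(parse_tag("tlh-Kore-AQ-fonipa"))
print(fonipa_script_conflict(parse_tag("tlh-Kore-AQ-fonipa")))
print(fonipa_script_conflict(parse_tag("tlh-Latn-AQ-fonipa")))
```

A production system would more likely rely on an established BCP 47 library and the registry data rather than a hand-written pattern like this one.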


The biggest problem with library data is actually romanisations and the inability to tag romanisation data according to the romanisation scheme being used. For most cases that is



The RDF document states that any valid language tag (referring to the relevant IETF doc, BCP47 [2]) can be used. That IETF document instructs one to tag languages at the level at which the information is useful, but not beyond. That obviously makes good sense. The fact is that there are languages (MANY!) that have no 2-letter code, at which point a three-letter code, or a tag and subtag, must be used. I suspect that the prevalence of two-letter codes has to do with who is providing linked data. Stats, however, show that some three-letter codes are being used. [3]


The key is "valid language tag" by BCP47 definition.


And BCP47 gives a preference for the two-letter code, rather than one of the three-letter codes.


The tags, as you indicate, should be short and indicate only what needs to be indicated. E.g.:

The language tag for Arabic would be "ar" (three-letter codes would only be needed to distinguish between colloquial varieties of Arabic; the 'ar' tag is a sufficient identifier for Modern Standard Arabic written in the Arabic script).


A language tag for romanised Arabic based on the ALA-LC romanisation tables as published in 1997 would be ar-Latn-alalc97.


It is not possible to construct a language tag for the current ALA-LC Arabic romanisation scheme, since there is no appropriate subtag registered. A language tag ar-Latn ... is insufficient, since there are many widely different romanisation schemes for Arabic and the language tag does not have enough specificity.









On 6/30/14, 11:41 AM, Simon Spero wrote:

This falls under the general problem of the use of strings instead of IRIs; different forms of code that are associated with the same "language" could be associated with an IRI referring to that "language".

Alternatively, two identifiers could be declared and asserted to be sameAs, but that approach is more complicated.

"Language" left unpacked to avoid issues of extended language tags

On Jun 29, 2014 4:26 PM, "Stuart Yeates" <[log in to unmask]> wrote:

On 06/28/2014 01:25 AM, Jody L. DeRidder wrote:

I just saw this posted on Twitter.

Rob Sanderson is concerned about the ways in which Bibframe does NOT
work in the linked data environment, and is trying to effectively
communicate the issues.  He's asking for feedback:

My biggest issue (that's not covered in the doc, but which I've already fed to the doc's authors) is that BIBFRAME mandates three-letter language codes, where available, while core RDA mandates two-letter language codes, where available.

This requires every app that wants to interoperate BIBFRAME with anything else (and indeed any app that wants to compare BIBFRAME language codes with the language codes on RDF plain-text labels) to have extensive lookup tables.
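As a rough illustration of the kind of lookup this implies, the Python sketch below compares a three-letter code with the language tag on a plain-text label by reducing both to a common primary subtag. The table is a small illustrative excerpt and the names `THREE_TO_TWO`, `canonical_primary` and `same_language` are hypothetical; a real table would need to cover the full ISO 639 code lists.

```python
# Illustrative excerpt only (an assumption for this sketch): maps
# three-letter ISO 639-2 codes (both B and T forms) to two-letter codes.
THREE_TO_TWO = {
    "eng": "en",
    "fre": "fr",
    "fra": "fr",
    "ger": "de",
    "deu": "de",
}


def canonical_primary(tag_or_code: str) -> str:
    """Take the primary subtag and prefer its two-letter form if listed."""
    primary = tag_or_code.split("-", 1)[0].lower()
    return THREE_TO_TWO.get(primary, primary)


def same_language(code: str, literal_tag: str) -> bool:
    """Compare a bare language code with a label's language tag."""
    return canonical_primary(code) == canonical_primary(literal_tag)


print(same_language("fre", "fr-CA"))
print(same_language("eng", "fr"))
```

This only compares primary language subtags; script, region and variant subtags are ignored, which may or may not be appropriate for a given application.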



Karen Coyle
[log in to unmask]
m: 1-510-435-8234
skype: kcoylenet



Andrew Cunningham
Project Manager, Research and Development (Social and Digital Inclusion)
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: [log in to unmask]