On 9 January 2013 10:17, Simon Spero <[log in to unmask]> wrote:
I agree with Ross that this is not a Bibframe specific issue.  

Both of the pieces of sample code use MARC-XML, which, in order to be standard, must use Unicode (see http:[log in to unmask]).

I am well aware of this. I'd also suggest that numeric character references (NCRs) should never be used, i.e. the content should consist entirely of Unicode characters.
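To illustrate the difference between NCRs and literal Unicode characters, here is a minimal Python sketch. The helper name expand_ncrs is hypothetical, and the sample string is my own. It works on plain text content only, not on a serialised XML document, because html.unescape would also expand markup escapes such as &amp;amp; that must stay escaped in XML.

```python
import html


def expand_ncrs(text: str) -> str:
    """Return text with character references replaced by real characters.

    Hypothetical helper for illustration. html.unescape handles decimal
    (&#1603;) and hexadecimal (&#x643;) references, and also named ones,
    so apply it to text content only, never to whole XML documents.
    """
    return html.unescape(text)


# The Arabic word for "book", written entirely as decimal NCRs.
escaped = "&#1603;&#1578;&#1575;&#1576;"
plain = expand_ncrs(escaped)

print(len(escaped), len(plain))  # the NCR form is far longer than the 4 real characters
```

The escaped form carries the same information but is opaque to anything that does not first decode it, e.g. naive string search or sorting.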
RDF Literals that are textual can be marked for language/dialect, etc. Doing so indicates that the text so marked should be interpreted as being in the given language. This has precisely the same semantics as the standard "lang" attribute. Currently, RDF PlainLiterals allow the language tag to be unspecified.

And depending on the upstream uses of Bibframe data, I'd suggest that language MUST be specified. This is not so much an issue with RDF PlainLiterals as with Bibframe's use of them.
HTML and xml lang attributes apply only to text and elements enclosed by the element bearing the attribute; this attribute value can be overridden by enclosed elements.
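The scoping rule described above can be sketched in Python with the standard library's ElementTree. The element names (record, title, note), the sample text and the function name effective_langs are made up for illustration and are not taken from MARC-XML or Bibframe.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(elem, inherited=None):
    """Yield (tag, language) pairs, applying xml:lang inheritance.

    An element without its own xml:lang takes the value in scope from
    its nearest ancestor; an element with one overrides it for itself
    and everything it encloses.
    """
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_langs(child, lang)


# Hypothetical record: English by default, with an Arabic title.
doc = ET.fromstring(
    '<record xml:lang="en">'
    '<title xml:lang="ar">كتاب</title>'
    '<note>Held in the reference collection.</note>'
    '</record>'
)

langs = dict(effective_langs(doc))
print(langs)
```

Here the note inherits "en" from the record, while the title overrides it with "ar".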

If the language for which a literal is a text is unknown, it need not be stated. 

True, when unknown.


The language of the text that is used to describe something that is, or bears, information in a language may be unrelated to the language of the described object.  For example, a copy of an English language translation of a work originally written in French, held in a Mexican library, might have a description which includes a Spanish language summary.

Yes, which is why I made the distinction in an earlier post between the language used to describe the object and the language the object itself uses.

If a language has an ISO-639-1 two-letter code then that code is registered, per BCP-47.
If a language has both an ISO-639-2 B and an ISO-639-2 T code defined, then only the T code is registered per BCP-47.

For every language that has both a two-letter code and a three-letter code, only the two-letter code is registered.
Every language that has distinct B and T three-letter codes also has a two-letter code.
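As a rough sketch of the rule above, the following Python maps any of a language's ISO 639 codes to the subtag preferred under BCP-47. The table is a tiny hand-picked sample, not the registry, and the names SAMPLE_CODES and preferred_subtag are hypothetical. Real code should consult the IANA Language Subtag Registry.

```python
# Hand-picked sample rows: (ISO 639-1, ISO 639-2/B, ISO 639-2/T).
# Illustrative only; the authoritative data is the IANA registry.
SAMPLE_CODES = [
    ("fr", "fre", "fra"),  # French: distinct B and T codes
    ("de", "ger", "deu"),  # German: distinct B and T codes
    ("ar", "ara", "ara"),  # Arabic: B and T codes coincide
    ("es", "spa", "spa"),  # Spanish: B and T codes coincide
]


def preferred_subtag(code: str) -> str:
    """Return the two-letter subtag for a code found in the sample table.

    Codes not in the sample are returned lower-cased and otherwise
    unchanged (an assumption of this sketch, not a validity check).
    """
    code = code.lower()
    for alpha2, bibliographic, terminologic in SAMPLE_CODES:
        if code in (alpha2, bibliographic, terminologic):
            return alpha2
    return code


for marc_code in ("fre", "fra", "ger", "ara"):
    print(marc_code, "->", preferred_subtag(marc_code))
```

This matters when converting from MARC, which uses the three-letter bibliographic codes: "fre" in a MARC record corresponds to the tag "fr", not "fre" or "fra".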

For instance, you may have an English record describing an object in the Arabic language, with some data about the object written in Arabic and some in Romanised Arabic.

The RDF PlainLiterals containing English data should be marked up with an appropriate language tag, e.g. "en"

The RDF PlainLiterals containing Arabic data should be marked up with an appropriate language tag, e.g. "ar" assuming modern standard Arabic.

The RDF PlainLiterals containing Romanised Arabic data should be marked up with an appropriate language tag, e.g. "ar-Latn-alalc97", assuming modern standard Arabic Romanised according to the ALA-LC (1997) tables.
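The three cases above might look like this when serialised as language-tagged literals in Turtle/N-Triples syntax. This is only a sketch: lang_literal is a hypothetical helper, the sample strings are my own, and a real application would use an RDF library rather than string formatting.

```python
def lang_literal(text: str, lang: str) -> str:
    """Format text as a language-tagged literal, e.g. "book"@en.

    Hypothetical helper: escapes only backslashes, double quotes and
    newlines, and does not validate the language tag.
    """
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


# One made-up example value per case described in the text.
literals = [
    lang_literal("A book about libraries", "en"),  # English description
    lang_literal("كتاب", "ar"),                    # Arabic script
    lang_literal("kitāb", "ar-Latn-alalc97"),      # Romanised Arabic
]

for literal in literals:
    print(literal)
```

The tag on each literal describes the language of that string only, which keeps the language of the description separate from the language of the described object.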


Andrew Cunningham
Project Manager, Research and Development
Social and Digital Inclusion Team
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: [log in to unmask]