On 9 January 2013 10:17, Simon Spero <[log in to unmask]> wrote:

> I agree with Ross that this is not a Bibframe specific issue.
> Both of the pieces of sample code use MARC-XML, which, in order to be
> standard, must use unicode (see
> http:[log in to unmask] ).
I am well aware of this. And I'd suggest numeric character references
(NCRs) should never be used, i.e. the content should be all Unicode
characters.

> RDF Literals that are textual can be marked for language/dialect, etc.
>  Doing so indicates that the text so marked should be interpreted as being
> in the given language.   This has precisely the same semantics as the
>  standard "lang" attribute.  Currently, RDF PlainLiterals allow the
> language tag to be unspecified.
And depending on the upstream uses of Bibframe data, I'd suggest that
language MUST be specified. Not so much an issue with RDF PlainLiterals as
it is with Bibframe's use of RDF PlainLiterals.

> HTML and xml lang attributes apply only to text and elements enclosed by
> the element bearing the attribute; this attribute value can be overridden
> by enclosed elements.
> If the language for which a literal is a text is unknown, it need not be
> stated.
True, when unknown.

> ---
> The language of the text that is used to describe something that is, or
> bears, information in a language may be unrelated to the language of the
> described object.  For example, a copy of an English language translation
> of a work originally written in French, held in a Mexican library, might
> have a description which includes a Spanish language summary.
> ---
Yes, which is why I made the distinction in an earlier post between the
language used to describe the object and the language the object itself uses.

> If a language has a ISO-639-1  two letter code then that code is
> registered, per BCP-47.
> If a language has both an ISO-639-2 B and an ISO-639-2 T code defined,
> then only the T code is registered per BCP-47.
> For every language that has both a two letter code and a three letter
> code, only the two letter code is registered.
> Every language that has a distinct B and T 3-letter code also has a two
> letter code.
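
The selection rule quoted above can be sketched roughly as follows. The
three-row table is hand-written sample data and the function name is
hypothetical; a real implementation would read the IANA Language Subtag
Registry rather than hard-code anything.

```python
# Sample rows only: (ISO 639-1, ISO 639-2/B, ISO 639-2/T).
SAMPLE_CODES = {
    "French": ("fr", "fre", "fra"),
    "German": ("de", "ger", "deu"),
    "Hawaiian": (None, "haw", "haw"),  # no two letter code
}


def primary_subtag(alpha2, bibliographic, terminologic):
    """Pick the primary language subtag following the rule quoted above."""
    if alpha2:
        # A two letter code, when one exists, is the registered one.
        return alpha2
    # Otherwise use the terminologic (T) code, never the bibliographic (B) one.
    return terminologic


for name, codes in SAMPLE_CODES.items():
    print(name, primary_subtag(*codes))
```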

To illustrate the distinction between the language of the description and
the language of the object: you may have an English record describing an
object in the Arabic language, with some data about the object written in
Arabic and some in Romanised Arabic.

The RDF PlainLiterals containing English data should be marked up with an
appropriate language tag, e.g. "en"

The RDF PlainLiterals containing Arabic data should be marked up with an
appropriate language tag, e.g. "ar" assuming modern standard Arabic.

The RDF PlainLiterals containing Romanised Arabic data should be marked up
with an appropriate language tag, e.g. "ar-Latn-alalc97" assuming modern
standard Arabic.
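
A minimal sketch of the three cases, written out as N-Triples style
language-tagged literals. The helper name and the sample title and summary
strings are invented for illustration; only the language tags come from the
discussion above.

```python
def tagged_literal(text, lang):
    """Return an N-Triples style literal carrying a language tag."""
    # Escape backslashes and double quotes so the literal stays well formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


# English descriptive text about the object.
summary_en = tagged_literal("A history of Cairo", "en")
# Data in Arabic script, assuming modern standard Arabic.
title_ar = tagged_literal("تاريخ القاهرة", "ar")
# The same title romanised per ALA-LC (1997).
title_rom = tagged_literal("Tārīkh al-Qāhirah", "ar-Latn-alalc97")

for literal in (summary_en, title_ar, title_rom):
    print(literal)
```

The point is only that each literal carries its own tag, so a consumer can
tell the three apart without guessing from the script or the content.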


Andrew Cunningham
Project Manager, Research and Development
Social and Digital Inclusion Team
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Mobile: 0459 806 589