On Nov 9, 2011, at 21:39 , Riley, Charles wrote:
> Hi Ivan,
> Character encoding and SKOS mappings might be a good place to start.
> Bibliographic data is largely built on the MARC-8 character set, whose repertoire is in essence a subset of Unicode's; thus, for the preponderance of materials in non-Latin scripts, a loss of data has already occurred by the time the data becomes bibliographic. Similarly, ISO 639-2/B covers more or less a subset of the languages represented in ISO 639-3: languages of literary warrant, i.e., those having passed a threshold of being used in fifty or more texts. MARC language codes in many cases still carry an outdated colonial legacy: uv for Burkina Faso (the former Upper Volta), rh for Zimbabwe (Rhodesia), dm for Benin (Dahomey).
> What are some of the ways you might envision allowing our data to mesh better with that which exists in the rest of the world?
With my Semantic Web/RDF hat on: RDF has made two choices: (1) use Unicode ("The lexical space of a datatype is a set of Unicode [UNICODE] strings.") and (2) attach a language tag where needed (the 2004 version of RDF referred to RFC 3066; the new one in preparation relies on BCP 47). I would think that the combination of these two provides a standard for interchanging data; even if a particular language or script is not present in the current standards, those standards are evolving, so with sufficient peer pressure (and the library community is in a position to provide such pressure) any missing entry could eventually be added.
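To make this concrete, here is a small sketch in Turtle syntax (the record URI and the titles are made up for illustration; only dcterms:title and the BCP 47 tags are standard): the same record can carry a title in several languages and scripts simply by attaching language tags to Unicode literals.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical bibliographic record, for illustration only.
<http://example.org/record/123>
    dcterms:title "A kis herceg"@hu ;        # Hungarian
    dcterms:title "Le Petit Prince"@fr ;     # French
    dcterms:title "الأمير الصغير"@ar .        # Arabic, a non-Latin script
```

Any RDF-aware consumer can then select or display the appropriate title by matching on the language tag, with no extra encoding machinery.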
I can imagine that, in some cases, you would need a richer description of the language involved, e.g., for a human reader. This could be incorporated into an RDF-based vocabulary using existing ontologies and URI references. The one I know just a bit is lexvo:
which provides URIs for languages, as well as some description that can be linked to. For example, if I take Hungarian, it provides a URI for the language:
there is an RDF representation for this URI which links further (through the http://lexvo.org/ontology#language property) to more detailed information at:
which then includes further information about Hungarian (essentially the various labels of the language in different languages).
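The shape of that data looks roughly like the following Turtle sketch (simplified, and not a verbatim dump of what lexvo.org actually serves; the exact triples and label set may differ):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Simplified sketch of lexvo's description of Hungarian.
<http://lexvo.org/id/iso639-3/hun>
    rdfs:label "Hungarian"@en ,
               "magyar"@hu ,
               "hongrois"@fr .
```

Note how the labels themselves use the language-tagged literal mechanism described above, so the description of a language is expressed with the same machinery as any other multilingual data.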
The beauty of Linked Data is that, from the library community's point of view, there is no need to repeat this information; you simply link to it and let other people feel the pain of keeping that data up to date...
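In practice, that linking is a single triple. A sketch (the record URI is hypothetical; dcterms:language and the lexvo URI are real):

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical record: rather than embedding language metadata locally,
# point at the lexvo URI and let lexvo maintain the details.
<http://example.org/record/123>
    dcterms:language <http://lexvo.org/id/iso639-3/hun> .
```

Any consumer that wants richer information about the language dereferences the lexvo URI; the library record itself stays minimal.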
I hope this answers your question!
P.S. I am not an internationalization expert, but I have a colleague at W3C (Richard Ishida) who is really knowledgeable in this area. Some of the services he provides on his web page may be helpful here (see  or  for language tag and Unicode lookup), and his page on the W3C site about language tags may also be of interest...
> Charles Riley
> -----Original Message-----
> From: Bibliographic Framework Transition Initiative Forum [mailto:[log in to unmask]] On Behalf Of Ivan Herman
> Sent: Wednesday, November 09, 2011 12:31 PM
> To: [log in to unmask]
> Subject: [BIBFRAME] Introduction (@W3C)
> As a new member of this mailing list, allow me to introduce myself and the institution I represent.
> I am what we call in our jargon the Semantic Web Activity Lead at the W3C. What this means in practice is that I initiate and coordinate most (if not all) Semantic Web-related groups at the W3C, and I am also responsible for the outreach activities around the Semantic Web.
> I was very excited to see the initiative of the US Library of Congress. From my point of view, this initiative will be an important contribution to the vision of the Semantic Web or, to use another term, a Web of Data, on which library data at large would at last take its well-deserved place.
> I will not repeat the arguments on the benefits for the library community of using Linked Library Data. This has been documented in a report of a W3C Incubator Group; they have done a much better job than I ever could. However, I can express why I believe such a synergy would also be beneficial for the Semantic Web community. Indeed, the Semantic Web envisions a Web of Data, i.e., a place where different types of data can be integrated and used by applications or by end users, regardless of the origin and the exact location of that data. The Web has given us this for documents; it is time to have the same for data in general. However, it is impossible to envisage this without the huge amount of data, repositories, catalogues, accumulated knowledge, etc., that is available in libraries around the globe. Furthermore, and this may be less obvious to the library community, the unique experience that this community has in cataloguing, archiving, and managing resources can bring hugely important additional experience and knowledge to the Semantic Web community, research and development alike.
> I am not a librarian. This means that there are many technical and social issues discussed on this list that I cannot really contribute to. However, I would be very pleased to provide feedback, whenever needed, on specific Semantic Web-related technical questions concerning the intricacies of RDF, OWL, SKOS, or SPARQL. I would also be happy to take the problems raised by this group and feed them back to the relevant Working Groups that are currently active at the W3C (see, for example,  for some of those). In short, I hope I can be of help.
> Of course, there may be specific technical issues and solutions coming up in the future that might require further standardization; W3C may have a role to play then, and I will be happy to discuss this if and when the time comes.
> Ivan Herman
>  http://www.loc.gov/marc/transition/news/framework-103111.html
>  http://www.w3.org/blog/SW/2011/10/27/w3c-library-linked-data-xg-final-report-published/
>  http://www.w3.org/2001/sw/
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
Ivan Herman, W3C Semantic Web Activity Lead