At 12:07 PM 3/28/2002 -0500, you wrote:
>After reviewing the XML schema for a while now, I realized that
>the schema makes no provision for xml:lang.  While the encoding
>specifies UTF-8 or UTF-16 and allows you to specify any language
>you wish for content, it's important from a processing standpoint
>to know that the title element's content is in English, French or
>Spanish, rather than having to parse the text of the element and
>guess the language from the Unicode characters present in the
>content.

This is a subject of long-standing debate in the MARC world, and I usually
illustrate the issue using the name of a cafe near my office: "Pasta
Cuisine." Now, what language is that? Well, you and I know that it's got to
be American because no other culture would be so bloody stupid about
language. But how would you code it? Here's one from the book world: title
= Jean-Paul Marat. The book could be in just about any language, but what's
the language of the title?

At this point, I think it's better to specify the character encoding and
let that guide the processing of the metadata.
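
To make the ambiguity concrete, here is a minimal Python sketch (mine, not part of the schema or of any MARC tooling). It assumes that the first word of a character's Unicode name is a rough stand-in for its script; the function name scripts_in is made up for illustration. Inspecting the characters tells you the script at best, and both example titles come back as plain Latin, which says nothing about whether the title is English, French or Spanish.

```python
import unicodedata

def scripts_in(text):
    """Return rough script labels for the letters in text.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN" in "LATIN CAPITAL LETTER J") approximates the script.
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            # Fall back to "UNKNOWN" for any letter without a Unicode name.
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

for title in ["Pasta Cuisine", "Jean-Paul Marat"]:
    # Both titles yield only a script label, never a language.
    print(title, scripts_in(title))
```

A title in another script, such as Cyrillic, would be told apart this way, but that still only narrows the candidates; it does not name the language.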

*********************************************
Karen Coyle           [log in to unmask]

            http://www.kcoyle.net
**********************************************