At 10:33 AM 7/17/2003 -0500, you wrote:
>But if we are going to rely on unicode alone, then we might as well drop
>the lang, xml:lang and script attributes completely!
Unfortunately - or otherwise - you are talking to the person (moi) who
argued against xml:lang in descriptive fields (those copied from the piece
itself, such as author, title, publisher, etc.). What would
you do with "Italian Cuisine" or "The Tao of Pooh"? What about a book title
like: "Siddhartha"? I don't think we want folks to have to determine if a
word that originates in another language is or isn't now considered part of
English. And I also don't think we can expect people to make these
distinctions for works in languages other than their own. Do you exclude
proper nouns? Can you even positively determine what is a proper noun?
Sometimes this is easy:
Andy Warhol : Ausstellung der Deutschen Gesellschaft für Bildende Kunst
Sometimes less so:
On the effects of gypsum, or plaster of paris, as a manure;
Language distinctions make sense in some areas, like in subject headings
when there are subject heading schemes in different languages. In that case
you need the language coding or some other coding that translates to a
language, e.g. the subject heading scheme of the Bibliothèque nationale de
France will be assumed to be in French. Using this, you can ask your users
which language they wish to search in and run their searches only
against that set of headings, so that the English "the" and the French
"thé" (tea) are not confused.
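A minimal sketch of that arrangement, in Python, assuming each heading carries only a code for its scheme and a separate table maps schemes to languages. The scheme labels ("lcsh", "rameau"), the headings, and the function names are hypothetical, for illustration only. The search folds case and diacritics, as many catalogs do, which is exactly what makes "the" collide with "thé" unless the search is limited by language.

```python
import unicodedata
from typing import List, Optional

# Hypothetical scheme labels; each scheme is assumed to be in one language.
SCHEME_LANGUAGE = {
    "lcsh": "eng",    # assumed: an English-language heading scheme
    "rameau": "fre",  # assumed: the BnF's French-language heading scheme
}

# Illustrative headings, tagged only with their scheme (no xml:lang).
HEADINGS = [
    ("lcsh", "Tea"),
    ("lcsh", "Theater"),
    ("rameau", "Thé"),
    ("rameau", "Théâtre"),
]


def fold(text: str) -> str:
    """Lowercase and strip diacritics, as many search indexes do."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def search(term: str, language: Optional[str] = None) -> List[str]:
    """Exact folded match, optionally limited to schemes in one language."""
    wanted = fold(term)
    return [
        heading
        for scheme, heading in HEADINGS
        if (language is None or SCHEME_LANGUAGE[scheme] == language)
        and fold(heading) == wanted
    ]
```

Searching all schemes, `search("the")` finds the French heading for tea; restricted to English, `search("the", "eng")` finds nothing.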
You don't need script attributes for Unicode, or so the documentation says.
You can tell which script a character belongs to from its code point range.
Does anyone know if this really works? And if it works, is it practical?
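A rough sketch of what reading the script off the code range might look like, in Python, assuming a small hand-written table of Unicode block ranges (the table and function names are illustrative, not from any library). Blocks only approximate scripts, digits and punctuation are shared across scripts, and the result says nothing about language: the German Warhol title above comes out as plain Latin, just as an English title would.

```python
from collections import Counter

# Partial table of Unicode blocks used as a stand-in for scripts.
# Blocks only approximate the Unicode Script property.
BLOCKS = [
    (0x0000, 0x024F, "Latin"),     # Basic Latin through Latin Extended-B
    (0x0370, 0x03FF, "Greek"),     # Greek and Coptic
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),       # CJK Unified Ideographs
]


def script_of(ch: str) -> str:
    """Return the block-based script label for one character."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Unknown"


def script_counts(text: str) -> Counter:
    """Count letters per script, skipping spaces, digits and punctuation."""
    return Counter(script_of(ch) for ch in text if ch.isalpha())
```

Python's standard `unicodedata` module does not expose the Script property directly, so a real implementation would more likely load the Scripts.txt file from the Unicode Character Database than maintain a table like this by hand.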
kc