Hi Karen!

Karen Coyle wrote on 19.12.2017 at 16:32:

> Do browsers have markup within the page for this? I'm aware of the head
> language encoding, but not of individual elements in the page. I guess
> that CSS may add something - I'm not up with everything you can do
> today. 

The lang attribute can be used within an HTML page to specify the 
language of the content within an element:

<p lang="fr">...</p>
<p lang="en">...</p>

Nowadays it can be used on any HTML element; historically it has been 
forbidden for a few elements where it would make little sense, such as 
<br> and <hr>.

This is very useful when there is content in different languages on the 
same web page. For example, text-to-speech systems can use it, and it 
also helps to select the correct fonts (e.g. traditional Chinese), as 
was already mentioned in this thread.
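
To make this concrete, here is a rough sketch of how a consumer such as 
a text-to-speech front end might work out the language of each piece of 
text. It uses only Python's standard html.parser module. The class and 
function names (LangCollector, collect_languages) are made up for this 
illustration, the list of void elements is deliberately partial, and 
real browsers do considerably more than this.

```python
from html.parser import HTMLParser

# Partial list of elements that never have a closing tag (an assumption
# made to keep the sketch short; HTML defines more of them).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class LangCollector(HTMLParser):
    """Collect (language, text) pairs; lang is inherited from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = [None]  # effective language at each nesting level
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Use the element's own lang attribute, else the parent's language.
        self.stack.append(dict(attrs).get("lang", self.stack[-1]))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((self.stack[-1], text))


def collect_languages(html):
    """Return a list of (language or None, text) for well-nested HTML."""
    parser = LangCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

Text without any lang attribute on itself or its ancestors comes back 
with None as its language, which is exactly the situation where a 
program has to guess.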

> However, there is a big problem with trying to attribute
> *language* to fields in bibliographic data. It only takes a few examples
> to understand why:
> Title:
> 1984 (book in German)
> 1984 (book in Hebrew)
> 1984 (book in English)

I don't think that's a problem at all. In fact this is a great example, 
since the name of Orwell's novel (assuming you meant it) actually 
differs between many languages. According to Wikidata, it is called

"1984" in German
"1984" in Hebrew (but rendered with right-to-left alignment!)
"Nineteen Eighty-Four" in English (not 1984!)
"Vuonna 1984" in Finnish
"নাইন্টিন এইটি-ফোর" in Bengali


So although the title happens to be a number (more specifically a year), 
there are actually variations in how it is expressed in different 
languages. Even the original English title is not just a plain number 
but spells it out in words. Without language tags it would be 
difficult for a program (e.g. a web browser) to display the title with 
the correct font and alignment relative to surrounding text, or for a 
screen reader to speak it properly.
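
As a small illustration of what a program can do once titles carry 
language tags, here is a sketch in Python. The titles are a subset of 
the ones listed above; the function name title_for, the fallback to 
English and the hard-coded set of right-to-left languages are all 
assumptions made for the example, not part of any standard or library.

```python
# Language-tagged titles, taken from the list above.
TITLES = {
    "de": "1984",
    "he": "1984",
    "en": "Nineteen Eighty-Four",
    "fi": "Vuonna 1984",
}

# Assumed, incomplete set of languages written right to left.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


def title_for(labels, preferred, fallback="en"):
    """Pick the first available title and the text direction to use.

    Returns (language, title, direction), where direction is "rtl" or
    "ltr". Raises KeyError if neither a preferred language nor the
    fallback language has a title.
    """
    for lang in list(preferred) + [fallback]:
        if lang in labels:
            direction = "rtl" if lang in RTL_LANGUAGES else "ltr"
            return lang, labels[lang], direction
    raise KeyError("no title available in any requested language")
```

Note that the German and Hebrew strings are identical, yet the language 
tag still changes how the title should be laid out. That information is 
lost if the string is stored without its tag.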

> Title:
> Marie Antoinette (book in English)
> Marie Antoinette (book in Swedish)

Here a language tag does no harm, and may be useful in case of e.g. 
transliteration. I think the best way to think about language tags in 
cases like this is "this is what this thing is called in the context of 
this language", not necessarily that it is originally an expression of 
that language. Is "déjà vu" English when used within an English language 
sentence such as "approximately two-thirds of the population have had 
déjà vu experiences"? I'd say it is, or close enough that the language 
tag for English can be used. The same applies to titles that have been 
borrowed from other languages.

> Author:
> Wong, Mario (a real name, altho not an author)

Names are a bit problematic, but again, language tags are useful for 
e.g. font selection and text-to-speech systems. The same contextual 
interpretation of language tags applies as above - if this variant of 
the name can be used in the context of a specific language, then it can 
also be tagged with that language tag.

"Xi Jinping"@en
"Xí Jìnpíng"@zh-Latn-pinyin

Without language tags (including the script and other variant 
information when necessary), it would be difficult to keep track of the 
different ways a name can be spelled. The language tag doesn't 
necessarily indicate that the name is originally from that language / 
culture (often a futile thing to attempt anyway), only that it is used 
in the context of that language.
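
To show what the script and variant parts of a tag such as 
zh-Latn-pinyin give a program to work with, here is a simplified sketch 
of splitting a language tag into its parts. It follows the general 
shape of BCP 47 tags (language, then optional script, region and 
variants) but ignores extended language subtags, extensions, private 
use and grandfathered tags, and it does not check subtags against the 
registry. The function name parse_language_tag is made up for this 
example.

```python
def parse_language_tag(tag):
    """Split a simple language tag into language, script, region, variants.

    Simplified sketch: handles only language[-script][-region][-variant...]
    and raises ValueError for anything else.
    """
    subtags = tag.split("-")
    language = subtags[0]
    if not (2 <= len(language) <= 3 and language.isalpha()):
        raise ValueError("unsupported language subtag: %r" % language)

    result = {
        "language": language.lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    for sub in subtags[1:]:
        no_region_or_variant = result["region"] is None and not result["variants"]
        is_region = (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit())
        is_variant = sub.isalnum() and (
            5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit())
        )
        if (len(sub) == 4 and sub.isalpha()
                and result["script"] is None and no_region_or_variant):
            result["script"] = sub.title()      # e.g. "Latn"
        elif is_region and not result["variants"] and result["region"] is None:
            result["region"] = sub.upper()      # e.g. "TW"
        elif is_variant:
            result["variants"].append(sub.lower())
        else:
            raise ValueError("unsupported subtag: %r" % sub)
    return result
```

With this, "Xí Jìnpíng"@zh-Latn-pinyin can be recognised as Chinese 
written in Latin script using the pinyin romanisation, which is what a 
font selector or speech synthesiser needs to know.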

> If special exceptions are need for the unified ideograms, then I see
> that as an exception that affects display, not a general declaration of
> the language of strings.

Respectfully disagree, per above.


Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
Tel. +358 50 3199529