>>>>> "KC" == Karen Coyle <[log in to unmask]> writes:

KC> At 10:33 AM 7/17/2003 -0500, you wrote:

KC> You don't need script attributes for Unicode, or so the
KC> documentation says. You can tell what script it is from the code
KC> range. Does anyone know if this really works? And if it works, is
KC> it practical?

That is true: you can identify the script of any character from its
code point.  But that seems to be largely an organizational principle,
and it doesn't really get at the language.  For example, the Cyrillic
characters are in the range U+0400-U+04FF.  The organization of that
block is based on ISO 8859-5, for ease of conversion to and from
Unicode.  But that doesn't tell you what language the text is written
in.  It might be Russian, or Serbian (IIRC), or one of several other
languages.  Similarly, English, German, French, and Spanish all use
the Latin script.  So if knowing the language is a requirement for an
application, scripts don't cut it.
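A minimal Python sketch of the point above. It assumes a plain range check against the Cyrillic block (U+0400-U+04FF, as cited) and, as a rough stand-in for the real Unicode Script property, takes the first word of each letter's name from `unicodedata`. The helper names `in_cyrillic_block` and `script_of` are hypothetical. Both can tell you "Cyrillic" or "Latin", but neither can tell Russian from Serbian, or English from German.

```python
import unicodedata

# Cyrillic block as cited in the message: U+0400..U+04FF.
CYRILLIC_START, CYRILLIC_END = 0x0400, 0x04FF


def in_cyrillic_block(ch: str) -> bool:
    """True if the character's code point lies in the Cyrillic block."""
    return CYRILLIC_START <= ord(ch) <= CYRILLIC_END


def script_of(text: str) -> set[str]:
    """Hypothetical heuristic: collect the first word of each letter's
    Unicode name (e.g. 'CYRILLIC', 'LATIN'). This approximates the
    script; it is not the official Unicode Script property."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


# "Hello" in Russian and in Serbian, then English and German words.
for sample in ["Привет", "Здраво", "Hello", "Straße"]:
    # Russian and Serbian both report CYRILLIC; English and German both
    # report LATIN. The script is recoverable, the language is not.
    print(sample, script_of(sample))
```

Identifying the language would need something beyond the code points, such as explicit language tagging in the record or a statistical language guesser.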

Tod A. Olson <[log in to unmask]>     "How do you know I'm mad?" said Alice.
Sr. Programmer / Analyst            "If you weren't mad, you wouldn't have
The University of Chicago Library    come here," said the Cat.