Andrew Cunningham writes:
> * sorting/collation in the Unicode context occurs either in a
> languageless multilingual context, i.e. DUCET (which is undergoing, or has
> just undergone, a few very interesting changes), or with locale-specific
> collations identified in CLDR, where one or more collations are defined
> per locale
The engineering challenge for systems like EBSCOhost and EBSCO Discovery
Service is that many collections are multilingual and are sold
internationally. A customer in Sweden will want their data in Swedish
sort order, while another in Egypt will want a tailoring that uses
English with a preference for Arabic. Supporting all these possibilities
in a scalable fashion is a real challenge.
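To make the per-customer problem concrete, here is a minimal Python sketch, assuming a hypothetical registry that maps each customer's locale to a sort-key function. Both key functions are crude stand-ins, not real DUCET or CLDR data: `default_key` approximates a language-neutral primary-level comparison by dropping accents and case, and `swedish_key` hard-codes only the one Swedish rule that å, ä and ö are separate letters sorted after z. A real deployment would use a proper UCA/CLDR implementation such as ICU.

```python
import unicodedata
from typing import Callable, Dict, List


def default_key(s: str) -> str:
    """Crude stand-in for a language-neutral (root) primary-level key:
    fold case, decompose, and drop combining accents."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical, heavily simplified Swedish tailoring: å, ä, ö are separate
# letters that sort after z. Real CLDR tailorings cover far more than this.
SWEDISH_EXTRA_LETTERS = {"å": 1, "ä": 2, "ö": 3}


def swedish_key(s: str) -> tuple:
    key = []
    for ch in unicodedata.normalize("NFC", s.casefold()):
        if ch in SWEDISH_EXTRA_LETTERS:
            # Group 1 sorts after every ordinary letter in group 0.
            key.append((1, SWEDISH_EXTRA_LETTERS[ch]))
        else:
            key.append((0, default_key(ch)))
    return tuple(key)


# Hypothetical per-customer registry: locale tag -> sort-key function.
COLLATIONS: Dict[str, Callable] = {"root": default_key, "sv": swedish_key}


def sort_for_locale(titles: List[str], locale: str) -> List[str]:
    key = COLLATIONS.get(locale, default_key)  # fall back to root order
    return sorted(titles, key=key)


titles = ["Zebra", "Ångström", "Apple", "Örebro"]
print(sort_for_locale(titles, "root"))  # ['Ångström', 'Apple', 'Örebro', 'Zebra']
print(sort_for_locale(titles, "sv"))    # ['Apple', 'Zebra', 'Ångström', 'Örebro']
```

The same four titles come out in two different orders depending only on which key function the customer's locale selects, which is exactly why one index cannot serve every market with a single sort order.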
> * matching is more problematic, since it brings in both the need for
> normalisation and matching grapheme clusters. Ideally, for some
> languages these would need to be custom rather than default grapheme
> clusters.
Indeed... internationalized sorting is a very tricky thing to get
right. You're pretty much guaranteed to annoy everyone.
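On the matching side, here is a minimal standard-library Python sketch of the two quoted points. `canonical_caseless_key` follows the Unicode Standard's definition of canonical caseless matching, NFD(casefold(NFD(s))). `cluster_aware_contains` is a hypothetical helper that rejects substring hits which would split a base letter from a following combining mark; that boundary test is only an approximation, not full UAX #29 grapheme segmentation, and it does nothing about language-specific custom clusters.

```python
import unicodedata


def canonical_caseless_key(s: str) -> str:
    # Canonical caseless matching: NFD(casefold(NFD(s)))
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def matches(a: str, b: str) -> bool:
    return canonical_caseless_key(a) == canonical_caseless_key(b)


def cluster_aware_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring search on normalized text that rejects
    hits starting or ending next to a character with a non-zero canonical
    combining class. Approximates, but is not, UAX #29 segmentation."""
    h = canonical_caseless_key(haystack)
    n = canonical_caseless_key(needle)
    if not n:
        return True
    start = h.find(n)
    while start != -1:
        end = start + len(n)
        starts_on_boundary = not unicodedata.combining(h[start])
        ends_on_boundary = end == len(h) or not unicodedata.combining(h[end])
        if starts_on_boundary and ends_on_boundary:
            return True
        start = h.find(n, start + 1)
    return False


precomposed, decomposed = "Café", "cafe\u0301"
print(precomposed == decomposed)         # False: different code points
print(matches(precomposed, decomposed))  # True after normalisation
print(matches("Straße", "STRASSE"))      # True: ß casefolds to "ss"
print("cafe" in "cafe\u0301 au lait")    # True: naive hit splits the "é"
print(cluster_aware_contains("cafe\u0301 au lait", "cafe"))  # False
print(cluster_aware_contains("Café au lait", "CAFÉ"))        # True
```

The naive `in` test reports a match for "cafe" inside "café" because it stops between the "e" and its accent; the boundary check catches that case, but scripts whose clusters are not built from combining marks would still need real segmentation.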
Obligatory BibFrame tie-in: I think collation is way out of scope for
this project. Filing rules are obviously necessary in any cataloging
system and will need to be addressed, but anything beyond that is not
worth discussing here. Adopting RDF has pretty much ensured that we are
moving to Unicode (and good riddance to MARC-8).
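A filing rule, by contrast, can be small and self-contained. A common one in MARC cataloging is to skip leading articles, recorded as a count of nonfiling characters (for example, the second indicator of field 245). Here is a minimal Python sketch, assuming that count is already known for each title; `filing_key` and the sample records are illustrative only, not part of any BibFrame or MARC tooling.

```python
def filing_key(title: str, nonfiling: int = 0) -> str:
    # Skip the leading nonfiling characters (e.g. "The " = 4), then fold case.
    return title[nonfiling:].casefold()


# (title, nonfiling-character count) pairs, as a 245 second indicator records them
records = [("The Zebra Book", 4), ("Apple Orchards", 0), ("A Map of Cairo", 2)]
filed = [title for title, n in sorted(records, key=lambda r: filing_key(r[0], r[1]))]
print(filed)  # ['Apple Orchards', 'A Map of Cairo', 'The Zebra Book']
```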
Principal Software Engineer, Search
The opinions in this message do not necessarily constitute those of