The more I consider internationalisation and bibframe, the more I realise that adopting RDF places bibframe in a different data ecosystem, one that inherits many internationalisation features from Unicode and the W3C.

So large areas such as collation need not, and should not, ever be part of bibframe; they should be addressed in other forums such as CLDR.

On Feb 16, 2013 12:54 AM, "Tom Emerson" <[log in to unmask]> wrote:
Andrew Cunningham writes:
> * sorting/collation in the Unicode context occurs either within a
> languageless multilingual context, i.e. DUCET, which has just undergone a
> few very interesting changes, or through locale-specific collations
> identified in CLDR, where one or more collations are defined per locale

The engineering complexity for systems like EBSCOhost and EBSCO Discovery
Service is that many collections are multilingual and are sold
internationally. A customer in Sweden will want their data in Swedish
sort order, while another in Egypt will want a tailoring that uses
English with a preference for Arabic. Supporting all these possibilities
in a scalable fashion is a real challenge.
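
To make the sort-order point concrete, here is a toy sketch in Python. It is not how a production system would do it (that would mean CLDR tailorings through a library such as ICU); the alphabet string and the `swedish_key` helper are assumptions made up for illustration, and the sample titles are arbitrary.

```python
# Toy illustration only: a hand-rolled "tailoring" for Swedish, where
# å, ä and ö sort after z. Real systems should use CLDR data via ICU.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"  # assumed, simplified
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(text):
    # Characters outside the alphabet sort after it, by code point.
    return [(RANK.get(ch, len(RANK)), ch) for ch in text.lower()]

titles = ["Örebro", "Zorn", "Ängelholm", "Åre", "Arboga"]

by_code_point = sorted(titles)                 # naive code point order
by_swedish = sorted(titles, key=swedish_key)   # toy Swedish order

print(by_code_point)
print(by_swedish)
```

The two lists differ only in where Åre and Ängelholm land, which is exactly the kind of difference a Swedish customer will notice.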

> * matching is more problematic, since it brings in both the need for
> normalisation and matching grapheme clusters. Although ideally for some
> languages these would need to be custom rather than default grapheme
> clusters.

Indeed... internationalized sorting is a very tricky thing to get
right. You're pretty much guaranteed to annoy everyone.
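
For the normalisation half of the matching problem, a minimal sketch using Python's standard `unicodedata` module follows. The `match_key` helper is a hypothetical name, and NFC plus case folding is only one possible choice of matching key; it does nothing about grapheme clusters, custom or default.

```python
import unicodedata

def match_key(text):
    # Compose to NFC so precomposed and decomposed forms agree,
    # then case-fold for a case-insensitive comparison.
    return unicodedata.normalize("NFC", text).casefold()

composed = "Malm\u00f6"      # ö as a single code point
decomposed = "Malmo\u0308"   # o followed by a combining diaeresis

print(composed == decomposed)                        # raw strings differ
print(match_key(composed) == match_key(decomposed))  # keys agree
```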

Obligatory BibFrame Tie-in: I think collation is way out of scope for
this project. Obviously filing rules are necessary in any cataloging
system and will need to be addressed, but anything beyond that is not worth
discussing. Adopting RDF has pretty much ensured that we are moving to
Unicode (and good riddance to MARC-8).


Tom Emerson
Principal Software Engineer, Search
EBSCO Publishing
[log in to unmask]

The opinions in this message do not necessarily constitute those of
EBSCO Publishing.