The old library approach made sense back then. To be honest, the library sector was tackling internationalisation issues well before the rest of the IT industry and had to create solutions that worked.
But internationalisation has undergone dramatic changes since the late 1990s, and honestly has left the library industry far behind.
This results in differences in terminology and concepts. Paradigm shifts.
For instance, an ordered list of identifiers, to me, comprises multiple components ... indexing and stop words on one hand, collation and sorting on another ... and then string matching for search terms is separate again.
I suspect that bibframe doesn't need to address either collation or string matching ... those would be inherited. Bibframe may, however, need to identify a preferred normalisation form for data. There are various pros and cons, especially for CJKV data.
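To make the normalisation-form question concrete, here is a minimal sketch using Python's standard `unicodedata` module. The sample strings are illustrative choices of mine, not data from any catalogue. It shows that NFC and NFD are interchangeable for an accented Latin letter, that even NFC replaces a CJK compatibility ideograph with its unified counterpart (one of the cons for CJKV data), and that fullwidth forms survive NFC but are folded by NFKC.

```python
import unicodedata

# Canonical forms: NFC composes, NFD decomposes; both represent the same text.
composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)
nfd_of_composed = unicodedata.normalize("NFD", composed)

# U+F900 is a CJK compatibility ideograph with a canonical (singleton)
# mapping to U+8C48, so every normalisation form, NFC included, replaces it.
compat_ideograph = "\uf900"
nfc_ideograph = unicodedata.normalize("NFC", compat_ideograph)

# Fullwidth Latin letters are left alone by NFC and folded only by NFKC.
fullwidth_a = "\uff21"
nfc_fullwidth = unicodedata.normalize("NFC", fullwidth_a)
nfkc_fullwidth = unicodedata.normalize("NFKC", fullwidth_a)

print(nfc_of_decomposed == composed)
print(f"U+{ord(nfc_ideograph):04X}")
print(nfc_fullwidth == fullwidth_a, nfkc_fullwidth)
```

Whichever form a spec preferred, the point of the sketch is only that the choice is not neutral for every script.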
What would need to be addressed is the indexing side. Whether this should be included in bibframe, placed in a separate spec, or taken to the UTC and the CLDR people with a request to include it in CLDR, so that it is accessible outside the library industry and within core development tools, is a matter for debate ... if that makes sense?
What we have currently evolved through an historical context. But the context is very different now. And the industry needs to leverage the advances in internationalisation.
On Feb 16, 2013 9:33 AM, "Karen Coyle" <[log in to unmask]> wrote:
On 2/15/13 1:08 PM, Andrew Cunningham wrote:
The more I consider Internationalization and bibframe, the more I realise that adopting RDF places bibframe in a different data ecosystem, inheriting a lot of internationalisation features from the Unicode and W3C sides.
So slabs of stuff like collation may not and should not be part of bibframe ever, but should be addressed in other forums like CLDR.
I like the idea that collation, collocation, and order are application issues, not data issues. However, that is quite a leap from previous library practice where the goal of heading creation was precisely creating an ordered list of identifiers. If this non-collocation concept were to be accepted, wouldn't that also set bibframe apart from RDA (which I believe still has quite a bit of textual heading creation in its rules)?
I'm concerned that we seem to be heading in a few different directions, with no clarity as to how those may or may not ever work together. Quite honestly, moving to RDA at a time when we don't even know *when* (and perhaps *if*) we will be able to accommodate it in a machine-readable form doesn't sound like a great idea.
And, no, I don't think that coding "RDA in MARC" is anything more than lipstick on a ... well, on a whatever. It seems like a square-peg, round-hole exercise, more pain than gain.
On Feb 16, 2013 12:54 AM, "Tom Emerson" <[log in to unmask]> wrote:
Andrew Cunningham writes:
> * sorting/collation in the Unicode context occurs within either a
> languageless multilingual context, i.e. DUCET, which has just undergone a
> few very interesting changes, or locale-specific collations identified in
> CLDR, where one or more collations are defined per locale
The engineering complexity for systems like EBSCOhost and EBSCO
Discovery Service is that many collections are multilingual and are sold
internationally. A customer in Sweden will want their data in Swedish
sort order, while another in Egypt will want a tailoring that uses
English with a preference for Arabic. Supporting all these possibilities
in a scalable fashion is a real challenge.
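As a rough illustration of why a locale tailoring matters, the sketch below sorts a few Swedish words two ways. The `SWEDISH_ALPHABET` table and `swedish_sort_key` function are hypothetical, hand-rolled stand-ins; a production system would presumably take its rules from CLDR through a collation library rather than from a table like this.

```python
import unicodedata

# Toy tailoring (an assumption for illustration, not CLDR data):
# Swedish places a-ring, a-umlaut and o-umlaut after z, in that order.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_sort_key(text):
    """Build a per-character key; unknown characters sort after the alphabet."""
    normalised = unicodedata.normalize("NFC", text).casefold()
    return [
        (SWEDISH_RANK[ch], 0) if ch in SWEDISH_RANK
        else (len(SWEDISH_ALPHABET), ord(ch))
        for ch in normalised
    ]


# "\u00f6l" = beer, "\u00e4rta" = pea, "apa" = monkey, "\u00e5ka" = to go
words = ["\u00f6l", "zebra", "\u00e4rta", "apa", "\u00e5ka"]

by_code_point = sorted(words)                         # plain code point order
by_swedish_rules = sorted(words, key=swedish_sort_key)

print(by_code_point)
print(by_swedish_rules)
```

Code point order puts a-umlaut before a-ring, while the tailored key restores the order a Swedish reader expects. Multiply that by every locale a vendor sells into and the scaling problem becomes clear.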
> * matching is more problematic, since it brings in both the need for
> normalisation and matching grapheme clusters. Although ideally for some
> languages these would need to be custom rather than default grapheme
Indeed... internationalized sorting is a very tricky thing to get
right. You're pretty much guaranteed to annoy everyone.
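The matching point quoted above can be sketched in the same spirit. The code below is a minimal, assumption-laden example: `caseless_key` follows the usual NFD-casefold-NFD recipe for canonical caseless matching, and `approximate_clusters` only glues combining marks onto the preceding character. That is a crude approximation of default grapheme clusters, not the full Unicode segmentation rules, and certainly not the custom per-language clusters mentioned in the quote. Both function names are mine.

```python
import unicodedata


def caseless_key(text):
    """Canonical caseless key: NFD, then casefold, then NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def canonically_caseless_equal(a, b):
    """Compare two strings ignoring case and composed/decomposed differences."""
    return caseless_key(a) == caseless_key(b)


def approximate_clusters(text):
    """Attach each mark (general category M*) to the character before it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Precomposed mixed case versus decomposed upper case.
same = canonically_caseless_equal("R\u00e9sum\u00e9", "RE\u0301SUME\u0301")
# Devanagari KA + VOWEL SIGN I should stay together as one unit.
units = approximate_clusters("\u0915\u093f")

print(same, len(units))
```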
Obligatory BibFrame Tie-in: I think collation is way out of scope for
this project. Obviously filing rules are necessary in any cataloging
system, and will need to be addressed, but anything beyond that is not
worth discussing. Adopting RDF has pretty much ensured that we are moving
to Unicode (and good riddance to MARC-8).
Principal Software Engineer, Search
[log in to unmask]
The opinions in this message do not necessarily constitute those of
-- Karen Coyle [log in to unmask] http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet