Hi Karen.

The old library approach made sense back then. To be honest, the library
sector was tackling internationalisation issues well before the rest of the
IT industry and had to create solutions that worked.

But internationalisation has undergone dramatic changes since the late
1990s and, honestly, has left the library industry far behind.

This has resulted in differences in terminology and concepts: paradigm shifts.

For instance, to me an ordered list of identifiers comprises multiple
components ... indexing and stop words on one hand, collation and sorting
on another ... and then string matching for search terms is separate again.
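
As a rough illustration of keeping these concerns apart, here is a minimal
Python sketch in which index terms, match keys and sort keys come from three
separate functions. The stop-word list and the Swedish-style tailoring
(å, ä, ö after z) are hypothetical stand-ins for the example; a real system
would take its collation from the UCA/CLDR via a library such as ICU rather
than from a hand-written table.

```python
import unicodedata

# Hypothetical stop-word list for illustration only; real indexes use
# language-specific lists.
STOP_WORDS = {"the", "a", "an", "of"}

# Toy Swedish-style tailoring: a-z map to 1-26, then a-ring, a-umlaut,
# o-umlaut follow z. This is NOT the Unicode Collation Algorithm or a
# CLDR tailoring, just a stand-in showing that order is locale-dependent.
TAILORED = {"\u00e5": 27, "\u00e4": 28, "\u00f6": 29}


def match_key(text: str) -> str:
    """Key for string matching: canonical composition (NFC) plus case folding."""
    return unicodedata.normalize("NFC", text).casefold()


def index_terms(text: str) -> list[str]:
    """Terms to index: whitespace-split match keys with stop words dropped."""
    return [w for w in match_key(text).split() if w not in STOP_WORDS]


def sort_key(text: str) -> tuple[int, ...]:
    """Toy collation key built from the match key.

    Whitespace sorts first, a-z map to 1-26, tailored letters follow z,
    and anything else falls back to code point order after those.
    """
    key = []
    for ch in match_key(text):
        if ch.isspace():
            key.append(0)
        elif "a" <= ch <= "z":
            key.append(ord(ch) - ord("a") + 1)
        elif ch in TAILORED:
            key.append(TAILORED[ch])
        else:
            key.append(100 + ord(ch))
    return tuple(key)


if __name__ == "__main__":
    words = ["\u00e5l", "Zebra", "apa", "\u00e4pple"]  # ål, Zebra, apa, äpple
    print(sorted(words))                # raw code point order
    print(sorted(words, key=sort_key))  # toy tailored order
    print(index_terms("The Art of Computer Programming"))
```

With this toy tailoring the words sort as apa, Zebra, ål, äpple, whereas raw
code point order gives Zebra, apa, äpple, ål; the index terms for "The Art
of Computer Programming" are art, computer, programming. Swapping the sort
key for a different locale touches neither the index nor the matching code.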

I suspect that bibframe doesn't need to address either collation or string
matching ... those would be inherited. That said, bibframe may need to
identify a preferred normalisation form for data. There are various pros
and cons, especially for CJKV data.
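
To make the CJKV trade-offs a little more concrete, the following Python
sketch (standard library `unicodedata` only) shows how a few East Asian
characters fare under NFC, NFD and NFKC. The sample characters are picks
for illustration, not drawn from any bibframe requirement.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC")

# Sample characters chosen for illustration only.
SAMPLES = {
    "CJK compatibility ideograph U+F900": "\uF900",
    "Hangul syllable U+AC00": "\uAC00",
    "fullwidth Latin A U+FF21": "\uFF21",
    "halfwidth katakana KA U+FF76": "\uFF76",
}


def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def normalise_all(s: str) -> dict[str, str]:
    """Apply each normalisation form and return the resulting code points."""
    return {form: codepoints(unicodedata.normalize(form, s)) for form in FORMS}


if __name__ == "__main__":
    for label, s in SAMPLES.items():
        print(f"{label}: {normalise_all(s)}")
```

NFC silently replaces the compatibility ideograph U+F900 with the unified
ideograph U+8C48, NFD expands precomposed Hangul syllables into two or
three conjoining jamo, and NFKC folds fullwidth and halfwidth forms, which
helps matching but loses distinctions you may want to keep in stored data.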

What would need to be addressed is the indexing side. Whether this should
be included in bibframe, handled in a separate spec, or raised with the UTC
and the CLDR people for inclusion in CLDR (so that it is accessible outside
the library industry and within core development tools) is a matter for
debate ... if that makes sense?

What we have currently evolved in a historical context. But the context is
very different now, and the industry needs to leverage the advances in
internationalisation.

A.
 On Feb 16, 2013 9:33 AM, "Karen Coyle" <[log in to unmask]> wrote:

>
> On 2/15/13 1:08 PM, Andrew Cunningham wrote:
>
> The more I consider internationalisation and bibframe, the more I realise
> that adopting RDF places bibframe in a different data ecosystem, inheriting
> a lot of internationalisation features from the Unicode and W3C sides.
>
> So slabs of stuff like collation may not, and should not, ever be part of
> bibframe, but should be addressed in other forums like CLDR.
>
>
> I like the idea that collation, collocation, and order are application
> issues, not data issues. However, that is quite a leap from previous
> library practice where the goal of heading creation was precisely creating
> an ordered list of identifiers. If this non-collocation concept were to be
> accepted, wouldn't that also set bibframe apart from RDA (which I believe
> still has quite a bit of textual heading creation in its rules)?
>
> I'm concerned that we seem to be heading in a few different directions,
> with no clarity as to how those may or may not ever work together. Quite
> honestly, moving to RDA at a time when we don't even know *when* (and
> perhaps *if*) we will be able to accommodate it in a machine-readable form
> [1] doesn't sound like a great idea.
>
> kc
> [1] And, no, I don't think that coding "RDA in MARC" is anything more than
> lipstick on a ... well, on a whatever. It seems like a square-peg,
> round-hole exercise, more pain than gain.
>
>
>  On Feb 16, 2013 12:54 AM, "Tom Emerson" <[log in to unmask]> wrote:
>
>> Andrew Cunningham writes:
>> [...]
>> > * sorting/collation in the Unicode context occurs either within a
>> > languageless multilingual context, i.e. DUCET (which has just undergone
>> > a few very interesting changes), or through locale-specific collations
>> > identified in CLDR, where one or more collations are defined per locale
>>
>> The engineering complexity for systems like EBSCOhost and EBSCO Discovery
>> Service is that many collections are multilingual and are sold
>> internationally. A customer in Sweden will want their data in Swedish
>> sort order, while another in Egypt will want a tailoring that uses
>> English with a preference for Arabic. Supporting all these possibilities
>> in a scalable fashion is a real challenge.
>>
>> > * matching is more problematic, since it brings in both the need for
>> > normalisation and matching grapheme clusters. Ideally, though, for some
>> > languages these would need to be custom rather than default grapheme
>> > clusters.
>>
>> Indeed... internationalized sorting is a very tricky thing to get
>> right. You're pretty much guaranteed to annoy everyone.
>>
>> Obligatory BibFrame tie-in: I think collation is way out of scope for
>> this project. Obviously filing rules are necessary in any cataloging
>> system and will need to be addressed; anything beyond that is not worth
>> discussing. Adopting RDF has pretty much ensured that we are moving to
>> Unicode (and good riddance to MARC-8).
>>
>>     -tree
>>
>> --
>> Tom Emerson
>> Principal Software Engineer, Search
>> EBSCO Publishing
>> [log in to unmask]
>>
>> The opinions in this message do not necessarily constitute those of
>> EBSCO Publishing.
>>
>
> --
> Karen [log in to unmask] http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>
>