Just a few thoughts.

* BCP 47 with extensions developed for CLDR would provide adequate
indication of preferred collation.

* sorting/collation in the Unicode context occurs either within a
language-neutral multilingual context, i.e. DUCET, which has just undergone a
few very interesting changes, or through the locale-specific collations
identified in CLDR, where one or more collations are defined per locale

* matching is more problematic, since it brings in both the need for
normalisation and matching grapheme clusters. Although ideally for some
languages these would need to be custom rather than default grapheme
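
A rough sketch of the first and third points, using only the Python standard library. The function names and the sample tag "de-u-co-phonebk" are illustrative assumptions, not anything Bibframe or CLDR prescribes. The first function pulls the collation keyword ("co") out of the -u- extension of a BCP 47 tag; the second compares two strings after normalisation and case folding. Grapheme cluster segmentation, default or custom, is not attempted here.

```python
import unicodedata


def collation_from_tag(tag):
    """Return the 'co' (collation) type from a tag's -u- extension, or None."""
    subtags = tag.lower().split("-")
    start = None
    # Skip the language subtag, then look for the 'u' singleton.
    for i, subtag in enumerate(subtags[1:], start=1):
        if subtag == "x":        # everything after 'x' is private use
            break
        if subtag == "u":
            start = i + 1
            break
    if start is None:
        return None

    types = []
    in_co = False
    for subtag in subtags[start:]:
        if len(subtag) == 1:     # another singleton ends the -u- extension
            break
        if len(subtag) == 2:     # two-character subtags are keyword keys
            if in_co:
                break
            in_co = subtag == "co"
        elif in_co:
            types.append(subtag)
    return "-".join(types) if types else None


def same_text(a, b):
    """Compare two strings after NFC normalisation and case folding."""
    def fold(s):
        folded = unicodedata.normalize("NFC", s).casefold()
        return unicodedata.normalize("NFC", folded)
    return fold(a) == fold(b)


print(collation_from_tag("de-u-co-phonebk"))
print(same_text("J\u00f6rg", "Jo\u0308rg"))
```

The tag only names the preferred collation; actually sorting by it still needs a collation library such as ICU.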
On Feb 13, 2013 2:44 AM, "Jörg Prante" wrote:

> Hi,
> the question about filing indicators is an interesting question also for
> software engineering.
> I assume Bibframe, as a successor of all the MARC format families, should
> be able to carry library catalog data of many bibliographic rules (for
> example, data from German cataloging and from German filing/ordering
> rules). Such a semantic layer over Bibframe data is important because it is
> separate from the original "raw" data. Filing/sorting rules are also
> dependent on the language and localization environments of the cataloging
> rules. And, what is often ignored, they change over time.
> Speaking from the viewpoint of a software engineer: sooner or later
> Bibframe data must be served to the user in a consistent manner, and
> filing/sorting rules always cover a collection-wide scope of documents,
> not only a single document. In other words, there is a document context,
> which fits Linked Data perfectly. In MARC, there were only records, with
> a static, context-free model of how to control the data in the record.
> Librarians worked around this limitation by adding variant text fields to
> original data text fields, using helper characters to express
> sorting/filing rules. This procedure is unfortunately from the age of
> punchcards and should be reconsidered carefully for the Linked Data
> environment.
> The old procedure is not a preferable solution for Bibframe because
> - the filing/sorting variants should no longer have to be entered
> manually and repetitively, and they should no longer be erroneous or
> incomplete
> - not every Bibframe package will come with all the variant texts needed
> for filing/sorting a document collection, which raises the question of
> what takes precedence in case of conflicting or missing variants
> - not every sorting/filing rule of all international contexts can be
> included, and if it could, there must be a method to distinguish between
> them all. This also raises the question of how Bibframe data should be
> merged when there are different filing/ordering rules for the same text.
> - and, maybe most important, there are other mechanisms for expressing
> filing/sorting rules that software engineers have invented since
> filing/sorting indicators for MARC were introduced ;-)
> I would like to extend the statements made in "Assessment of Options for
> Handling Full Unicode in Character Encodings in MARC 21".
> For example, there is a suggestion "The bibliographic community needs to
> examine the Unicode components of normalization and collation and consider
> whether they can be adopted across scripts."
> In contrast to the "Assessment of Options for Handling Full Unicode in
> Character Encodings in MARC 21", where the functions "Indexing/Searching,
> Sorting, Record matching" (p. 7) are subsumed and assigned to the
> responsibility of an individual institution, I think Bibframe should at
> least define a common understanding of how to embrace Unicode sorting rules.
> My suggestions in the context of Bibframe are:
> - Bibframe should enable codes for filing/sorting rules. The Unicode
> consortium has made great efforts on dealing with a plethora of collation
> rules (either by collation keys or by rule based collations). See also
> cldr-spec/collation-guidelines for how to generate new collation rules.
> - Bibframe should provide links to the collation rule information from the
> text the cataloger wants to describe. It does not help much to add language
> information, sorting/nonsorting variants and other localization information
> at other places in the bibliographic description. For example, in RDF,
> literals can be encoded with a language tag, directly attached to the text.
> For Bibframe, special library catalog rule context tags could be
> appropriate, if language tags are not.
> - Bibframe should add internationalization also to filing/sorting rules
> - Bibframe should require that a default Unicode-based procedure be
> applied to filing/sorting texts if information about
> internationalization is missing or conflicting
> - computer systems that export/import Bibframe data should be able to
> apply filing/sorting rules automatically, recognizing the source and the
> target environment of the Bibframe transport
> The results of the Unicode consortium are also immediately available for
> software programming languages, thanks to projects like ICU.
> For example, there is a Unicode Collation Algorithm (UCA) that could be
> applied to combined bibliographic data originating from many international
> sources. Or, if that's not sufficient, another Unicode-based collation
> algorithm could be developed for Bibframe.
> Just as there are authority data sources for controlled vocabulary in
> library catalogs, there should be freely available authoritative resources
> for the filing/sorting rules that should apply to Bibframe texts in locally
> defined contexts and environments. My hope is, in the near future, library
> catalog users and software engineers, who are used to applications that use
> Unicode, will no longer get frustrated about library catalog data and the
> many methods of expressing filing/sorting.
> Best regards,
> Jörg
> On 11.02.13 02:36, J. McRee Elrod wrote:
>> I've noticed no discussion on Bibframe of filing indicators, nor
>> indication of such in posted examples.  Did I miss it?
>> There was a recent discussion on another list of titles which should
>> file under what appears to be an initial article, e.g., "A is for ...".
>> How will this be handled in Bibframe?  Initial articles differ between
>> languages, and "A", "An" and "The" are occasionally the word
>> by which to file.  Programming to recognize this would be very
>> complex.  I have seen no discussion concerning indication of language
>> in Bibframe, on which to base such programming.
>> Are we to no longer have alphabetical browse lists, only web style
>> searching?  I would miss alphabetical browse lists apart from subject
>> searches, which I prefer to have in inverse chronological order.  That
>> too would be more difficult to program based on imprint date in the
>> absence of a date fixed field.  The imprint date may even be lacking,
>> if that CONSER provision is carried over.
>>     __       __   J. McRee (Mac) Elrod
>>    {__  |   /     Special Libraries Cataloguing   HTTP://
>>    ___} |__ \__________________________________________________________