Just a few thoughts.

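* BCP 47 language tags, with the extensions developed for CLDR (the Unicode locale extension, e.g. "de-u-co-phonebk" for German phonebook order), would provide an adequate indication of the preferred collation.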

* Sorting/collation in the Unicode context occurs either in a language-independent multilingual context, i.e. the DUCET (which has just undergone a few very interesting changes), or under the locale-specific collations identified in CLDR, where one or more collations are defined per locale.

* Matching is more problematic, since it requires both normalisation and the matching of grapheme clusters. Ideally, for some languages, these would need to be tailored rather than default grapheme clusters.
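A minimal Python sketch of the normalisation half of matching, using only the standard library: both strings are normalised to NFC and case-folded before comparison, so a precomposed "ö" (U+00F6) and "o" followed by COMBINING DIAERESIS (U+0308) compare equal. The function names are hypothetical, and grapheme-cluster handling is not shown, since Python's standard library has no UAX #29 segmentation; a library such as ICU would be needed for that.

```python
import unicodedata


def match_key(text: str) -> str:
    """Comparison key for loose matching (a sketch, not a full matcher)."""
    # NFC makes precomposed and decomposed spellings use the same code points.
    nfc = unicodedata.normalize("NFC", text)
    # casefold() is stronger than lower(); e.g. German "ß" becomes "ss".
    folded = nfc.casefold()
    # Case folding can leave some strings unnormalised, so normalise again.
    return unicodedata.normalize("NFC", folded)


def texts_match(a: str, b: str) -> bool:
    return match_key(a) == match_key(b)


precomposed = "J\u00f6rg"   # "Jörg" with a single code point for ö
decomposed = "Jo\u0308rg"   # "Jörg" as "o" + COMBINING DIAERESIS
print(precomposed == decomposed)             # False: different code points
print(texts_match(precomposed, decomposed))  # True after normalisation
print(texts_match("STRASSE", "Straße"))      # True: casefold maps ß to ss
```

Whether default case folding (which maps "ß" to "ss") is acceptable for a given catalog is itself a locale-dependent policy decision.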

On Feb 13, 2013 2:44 AM, "Jörg Prante" wrote:

The question of filing indicators is also an interesting one for software engineering.

I assume Bibframe, as the successor to all the MARC format families, should be able to carry library catalog data produced under many sets of bibliographic rules (for example, data following German cataloging rules and German filing/ordering rules). A semantic layer expressing such rules over Bibframe data is important because it is separate from the original "raw" data. Filing/sorting rules also depend on the language and localization environment of the cataloging rules. And, which is often ignored, they change over time.

Speaking as a software engineer: sooner or later, when Bibframe data has to be served to users in a consistent manner, it becomes clear that filing/sorting rules always cover a collection-wide scope of documents, not just a single document. In other words, there is a document context, and that fits Linked Data perfectly. MARC had only records, with a static, context-free model of how to control the data in a record. Librarians worked around this limitation by adding variant text fields alongside the original data text fields, using helper characters to express sorting/filing rules. Unfortunately, this procedure dates from the age of punched cards and should be reconsidered carefully for the Linked Data environment.
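For readers who have not met the helper characters: MARC 21 can bracket the part of a field that should be ignored for filing between a non-sort begin (NSB) and a non-sort end (NSE) character. The sketch below assumes, as I understand the MARC 21 character set specification, that these are encoded as U+0098 and U+009C in Unicode, and derives both a display form and a filing form from one field; the function names are hypothetical. Note that the filing rule travels inside each record's data, which is exactly the per-record model described above.

```python
# Assumed MARC 21 (Unicode encoding) control characters for non-sorting text.
NSB = "\u0098"  # non-sort begin (Unicode START OF STRING)
NSE = "\u009c"  # non-sort end (Unicode STRING TERMINATOR)


def display_form(field: str) -> str:
    """Text shown to the user: markers removed, enclosed text kept."""
    return field.replace(NSB, "").replace(NSE, "")


def filing_form(field: str) -> str:
    """Text used for sorting: everything between NSB and NSE is dropped."""
    out = []
    skipping = False
    for ch in field:
        if ch == NSB:
            skipping = True
        elif ch == NSE:
            skipping = False
        elif not skipping:
            out.append(ch)
    return "".join(out)


title = NSB + "Die " + NSE + "Blechtrommel"
print(display_form(title))  # Die Blechtrommel
print(filing_form(title))   # Blechtrommel
```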

The old procedure is not a good solution for Bibframe because:

- filing/sorting variants should no longer have to be entered manually and repetitively, and they should no longer end up erroneous or incomplete

- not every Bibframe package will come with all the variant texts needed for filing/sorting a document collection, raising the question of what takes precedence when variants conflict or are missing

- not every sorting/filing rule from every international context can be included, and even if it could, there would have to be a way to distinguish between them all. It also raises the question of how Bibframe data should be merged when different filing/ordering rules apply to the same text.

- and, maybe most importantly, software engineers have invented other mechanisms for expressing filing/sorting rules since filing/sorting indicators were introduced in MARC ;-)

I would like to extend the statements made in "Assessment of Options for Handling Full Unicode in Character Encodings in MARC 21" http://www.loc.gov/marc/marbi/2005/2005-report01.pdf

For example, there is a suggestion "The bibliographic community needs to examine the Unicode components of normalization and collation and consider whether they can be adopted across scripts."

In contrast to the "Assessment of Options for Handling Full Unicode in Character Encodings in MARC 21", where the functions "Indexing/Searching, Sorting, Record matching" (p. 7) are grouped together and left to the responsibility of each individual institution, I think Bibframe should define at least a common understanding of how to adopt Unicode sorting rules.

My suggestions in the context of Bibframe are:

- Bibframe should support codes for filing/sorting rules. The Unicode Consortium has put great effort into dealing with a plethora of collation rules (either by collation keys or by rule-based collations). See http://cldr.unicode.org/ and, for how to create new collation rules, http://cldr.unicode.org/index/cldr-spec/collation-guidelines.

- Bibframe should link the text the cataloger describes directly to its collation rule information. It does not help much to add language information, sorting/nonsorting variants and other localization information elsewhere in the bibliographic description. For example, in RDF, literals can carry a language tag attached directly to the text. For Bibframe, special library catalog rule context tags could be appropriate if language tags are not.

- Bibframe should also extend internationalization to filing/sorting rules

- Bibframe should require a default Unicode-based procedure to be applied to filing/sorting texts whenever information about internationalization is missing or conflicting

- computer systems that export/import Bibframe data should be able to apply filing/sorting rules automatically, recognizing the source and the target environment of the Bibframe transport
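To illustrate the last few points, here is a sketch of choosing a collation for a text from the BCP 47 tags its sources attach to it, falling back to the root collation ("und", i.e. the default Unicode ordering) when that information is missing or conflicting. The parsing is simplified (it keeps only the primary language subtag and the Unicode extension key "co", as in "de-u-co-phonebk"), and the fallback policy and function names are hypothetical, not part of any Bibframe specification.

```python
from typing import Optional


def parse_collation(tag: str) -> tuple[str, Optional[str]]:
    """Split a BCP 47 tag into (primary language, CLDR collation type).

    Simplified: region/script subtags are ignored and only the 'co' key of
    the Unicode 'u' extension is read, e.g. 'de-u-co-phonebk' -> ('de', 'phonebk').
    """
    subtags = tag.lower().split("-")
    language = subtags[0]
    collation = None
    if "u" in subtags:
        ext = subtags[subtags.index("u") + 1:]
        for i, sub in enumerate(ext):
            if len(sub) == 1:  # another singleton (e.g. 'x') ends the 'u' extension
                break
            if sub == "co" and i + 1 < len(ext):
                collation = ext[i + 1]
                break
    return language, collation


def resolve_collation(tags: list[str]) -> str:
    """Choose one collation for a text, given the tags from all its sources.

    Hypothetical policy: no tags, or tags that disagree, fall back to the
    root collation 'und' (the default Unicode ordering).
    """
    choices = {parse_collation(t) for t in tags}
    if len(choices) != 1:
        return "und"
    language, collation = choices.pop()
    return f"{language}-u-co-{collation}" if collation else language


print(resolve_collation(["de-u-co-phonebk"]))        # de-u-co-phonebk
print(resolve_collation(["de-u-co-phonebk", "de"]))  # und (conflict)
print(resolve_collation([]))                         # und (missing)
```

A real system would pass the resolved tag to a collation library such as ICU rather than interpret it itself.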

The results of the Unicode Consortium's work are also immediately available to programming languages, thanks to projects like ICU http://site.icu-project.org/

For example, the Unicode Collation Algorithm (UCA) could be applied to combined bibliographic data originating from many international sources. Or, if that is not sufficient, another Unicode-based collation algorithm could be developed for Bibframe.
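The core idea of the UCA is multi-level comparison: base letters first, then accents, then case. The toy key below imitates those three levels with the Python standard library; it is NOT the UCA, which uses the DUCET/CLDR weight tables (available through ICU), and its primary level simply compares case-folded code points. The function name is hypothetical.

```python
import unicodedata


def simple_sort_key(text: str):
    """Toy three-level key in the spirit of the UCA (not the UCA itself)."""
    groups = []  # (base character, combining marks attached to it)
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and groups:
            base, marks = groups[-1]
            groups[-1] = (base, marks + ch)
        else:
            groups.append((ch, ""))
    primary = "".join(base.casefold() for base, _ in groups)      # letters
    secondary = tuple(marks for _, marks in groups)                # accents, left to right
    tertiary = tuple(base.isupper() for base, _ in groups)         # lowercase first
    return (primary, secondary, tertiary)


words = ["côté", "Cote", "coté", "cote", "côte"]
print(sorted(words, key=simple_sort_key))
# ['cote', 'Cote', 'coté', 'côte', 'côté']
```

A tailoring can change such results: the backward accent ordering traditionally used for French compares accents from the end of the word and would put "côte" before "coté".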

Just as there are authority data sources for controlled vocabulary in library catalogs, there should be freely available authoritative resources for the filing/sorting rules that apply to Bibframe texts in locally defined contexts and environments. My hope is that in the near future library catalog users and software engineers, who are used to Unicode-aware applications, will no longer be frustrated by library catalog data and its many methods of expressing filing/sorting.

Best regards,


On 11.02.13 02:36, J. McRee Elrod wrote:
I've noticed no discussion on Bibframe of filing indicators, nor
indication of such in posted examples.  Did I miss it?

There was a recent discussion on another list of titles which should
file under what appears to be an initial article, e.g., "A is for ...".

How will this be handled in Bibframe?  Initial articles differ between
languages, as well as "A", "An" and "The" being occasionally the word
by which to file.   Programming to recognize this would be very
complex.  I have seen no discussion concerning indication of language
in Bibframe, on which to base such programming.
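A sketch of why the language-based approach is fragile, with hypothetical function names and deliberately incomplete article lists keyed by MARC language codes: stripping a leading article needs the language and still misfiles "A is for ...", whereas an explicit count of nonfiling characters (as in the second indicator of MARC 21 field 245) handles it.

```python
# Hypothetical, deliberately incomplete lists of initial articles,
# keyed by MARC language code.
INITIAL_ARTICLES = {
    "eng": ("a ", "an ", "the "),
    "ger": ("der ", "die ", "das ", "ein ", "eine "),
    "fre": ("le ", "la ", "les ", "l'", "un ", "une "),
}


def filing_title_by_language(title: str, lang: str) -> str:
    """Guess the filing form by dropping a leading article of the language."""
    lowered = title.lower()
    for article in INITIAL_ARTICLES.get(lang, ()):
        if lowered.startswith(article):
            return title[len(article):]
    return title


def filing_title_by_indicator(title: str, nonfiling: int) -> str:
    """Filing form from an explicit count of characters to skip (0-9),
    like the second indicator of MARC 21 field 245."""
    return title[nonfiling:]


print(filing_title_by_language("The Tin Drum", "eng"))    # Tin Drum
print(filing_title_by_language("A is for apple", "eng"))  # is for apple (misfiled)
print(filing_title_by_indicator("A is for apple", 0))     # A is for apple
```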

Are we to no longer have alphabetical browse lists, only web style
searching?  I would miss alphabetical browse lists apart from subject
searches, which I prefer to have in inverse chronological order.  That
too would be more difficult to program based on imprint date in the
absence of a date fixed field.  The imprint date may even be lacking,
if that CONSER provision is carried over.
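Inverse chronological order is easy to sketch when a coded date exists; without one, a program has to fall back on parsing the transcribed imprint date and decide where undated items go. In the sketch below the record keys "date1" (standing in for a coded date such as MARC 008/07-10) and "imprint_date" are hypothetical, a coded date is used only if it is four plain digits, and undated records are placed last.

```python
import re
from typing import Optional


def publication_year(record: dict) -> Optional[int]:
    """Year for sorting: a coded date if it is four digits, otherwise the
    first four-digit number in the transcribed imprint date, otherwise None."""
    coded = record.get("date1", "")
    if re.fullmatch(r"[0-9]{4}", coded):
        return int(coded)
    match = re.search(r"[0-9]{4}", record.get("imprint_date", ""))
    return int(match.group()) if match else None


def newest_first(records: list[dict]) -> list[dict]:
    """Inverse chronological order; records without any date sort last."""
    def key(record: dict) -> tuple[bool, int]:
        year = publication_year(record)
        return (year is None, -year if year is not None else 0)
    return sorted(records, key=key)


records = [
    {"title": "A", "imprint_date": "c1998"},
    {"title": "B", "date1": "2005"},
    {"title": "C"},  # no date at all
    {"title": "D", "date1": "199u", "imprint_date": "[1995?]"},
]
print([r["title"] for r in newest_first(records)])  # ['B', 'A', 'D', 'C']
```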

    __       __   J. McRee (Mac) Elrod
   {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
   ___} |__ \__________________________________________________________