BIBFRAME Archives

BIBFRAME@LISTSERV.LOC.GOV


BIBFRAME  February 2013

Subject: Re: Unicode collation for Bibframe (Re: Filing indicators)
From: "Heuvelmann, Reinhold" <[log in to unmask]>
Reply-To: Bibliographic Framework Transition Initiative Forum <[log in to unmask]>
Date: Wed, 13 Feb 2013 12:31:19 +0000


Hi Jörg,

in addition to the valuable information you gave, I'd like to point to the standard ISO/IEC 14651, "International string ordering and comparison -- Method for comparing character strings and description of the common template tailorable ordering", which is the ISO/IEC standard equivalent to, and a subset of, the Unicode Collation Algorithm (UCA).  The base standard and its Amendment 1 (both in English and French) are available via

http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html

These standards provide a framework _and_ detailed lists of data for string ordering and comparison.  Collation depends on language and culture.  In Germany, we defined a delta to these lists because some German-specific ordering was needed (yes, the umlauts were among these), which resulted in an updated version of DIN 5007-1.  The DIN standards are not available online.

From my point of view, these standards handle collation / sorting / ordering / comparison _mainly_ at the level of single characters, or combinations of single characters.  Higher-level standards then provide rules for whole strings, e.g. Library of Congress Filing Rules, ALA Filing Rules, DIN 5007-2, and many more.  And maybe Bibframe.
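
To make the two layers concrete, here is a minimal Python sketch under stated assumptions. The character-level function is only a crude stand-in for a tailored collation table (it is not ISO/IEC 14651 or DIN 5007-1), and the article list and the names `char_level_key`, `filing_key` and `LEADING_ARTICLES` are hypothetical, chosen for illustration only.

```python
import unicodedata


def char_level_key(text: str) -> str:
    """Character-level layer: fold case and drop combining marks,
    so that 'Ä' files with 'a'. A simplification, not a real standard."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# String-level layer (hypothetical rule): skip one leading article per language.
LEADING_ARTICLES = {
    "de": ("der ", "die ", "das "),
    "en": ("the ", "a ", "an "),
}


def filing_key(title: str, lang: str) -> str:
    """Apply the string-level rule on top of the character-level key."""
    key = char_level_key(title)
    for article in LEADING_ARTICLES.get(lang, ()):
        if key.startswith(article):
            return key[len(article):]
    return key


titles = ["Die Ärzte", "Zauberberg", "Der Arzt", "Äpfel"]
print(sorted(titles, key=lambda t: filing_key(t, "de")))
```

Note that a blanket article rule like this one would misfile a title such as "A is for ...", which is exactly why the string-level rules need more context than the character-level tables can give.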

Best wishes

Reinhold

-- 

Reinhold Heuvelmann
German National Library
IT / Office for Data Formats
Adickesallee 1
D-60322 Frankfurt am Main
Germany
Telephone: +49-69-1525-1709
Telefax: +49-69-1525-1799
mailto:[log in to unmask]
http://www.dnb.de

*** Reading. Listening. Understanding. German National Library ***


-----Original Message-----
From: Bibliographic Framework Transition Initiative Forum [mailto:[log in to unmask]] On Behalf Of Jörg Prante
Sent: Tuesday, 12 February 2013 16:43
To: [log in to unmask]
Subject: [BIBFRAME] Unicode collation for Bibframe (Re: Filing indicators)

Hi,

the question about filing indicators is also an interesting one for
software engineering.

I assume Bibframe, as a successor to all the MARC format families,
should be able to carry library catalog data created under many
bibliographic rules (for example, data from German cataloging and from
German filing/ordering rules). Such a semantic layer over Bibframe data
is important because it is separate from the original "raw" data.
Filing/sorting rules also depend on the language and localization
environments of the cataloging rules. And, what is often ignored, they
change over time.

Speaking as a software engineer who sooner or later needs to serve
Bibframe data to users in a consistent manner: filing/sorting rules
always cover a collection-wide scope of documents, not just a single
document. In other words, there is a document context, which fits
Linked Data perfectly. In MARC, there were only records, with a static,
context-free model of how to control the data in the record. Librarians
worked around this limitation by adding variant text fields to the
original data text fields, using helper characters to express
sorting/filing rules. Unfortunately, this procedure dates from the age
of punch cards and should be carefully reconsidered for the Linked Data
environment.

The old procedure is not a preferable solution for Bibframe because

- the filing/sorting variants should no longer have to be entered
manually and repetitively, and they should no longer be erroneous or
incomplete

- not every Bibframe package will come with all the variant texts needed
for filing/sorting a document collection, which raises the question of
what takes precedence when variants conflict or are missing

- not every sorting/filing rule of every international context can be
included, and even if it could, there would have to be a method to
distinguish between them all. This also raises the question of how
Bibframe data should be merged when there are different filing/ordering
rules for the same text.

- and, maybe most important, software engineers have invented other
mechanisms for expressing filing/sorting rules since filing/sorting
indicators were introduced for MARC ;-)

I would like to extend the statements made in "Assessment of Options for 
Handling Full Unicode in Character Encodings in MARC 21" 
http://www.loc.gov/marc/marbi/2005/2005-report01.pdf

For example, there is a suggestion "The bibliographic community needs to 
examine the Unicode components of normalization and collation and 
consider whether they can be adopted across scripts."

In contrast to the "Assessment of Options for Handling Full Unicode in
Character Encodings in MARC 21", where the functions "Indexing/Searching,
Sorting, Record matching" (p. 7) are subsumed and assigned to the
responsibility of an individual institution, I think Bibframe should
define at least a common understanding of how to embrace Unicode sorting
rules.

My suggestions in the context of Bibframe are:

- Bibframe should enable codes for filing/sorting rules. The Unicode 
consortium has made great efforts on dealing with a plethora of 
collation rules (either by collation keys or by rule based collations). 
See also http://cldr.unicode.org/ and 
http://cldr.unicode.org/index/cldr-spec/collation-guidelines for how to 
generate new collation rules.

- Bibframe should provide links to the collation rule information from 
the text the cataloger wants to describe. It does not help much to add 
language information, sorting/nonsorting variants and other localization 
information at other places in the bibliographic description. For 
example, in RDF, literals can be encoded with a language tag, directly 
attached to the text. For Bibframe, special library catalog rule context 
tags could be appropriate, if language tags are not.

- Bibframe should add internationalization also to filing/sorting rules

- Bibframe should require that a default Unicode-based procedure be
applied to filing/sorting texts if information about
internationalization is missing or conflicting

- computer systems that export/import Bibframe data should be able to 
apply filing/sorting rules automatically, recognizing the source and the 
target environment of the Bibframe transport
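
As a rough illustration of the suggestions above, the following Python sketch attaches a language tag and a filing-rule code directly to the text and falls back to a Unicode-based default when that information is missing or unknown. Everything here is an assumption made for the example: the rule codes `x-base-letter` and `x-expanded-umlaut`, the per-language default, and the names `TaggedLiteral` and `resolve_key` are hypothetical and are not part of Bibframe, RDF or CLDR.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def base_letter_key(text: str) -> str:
    """Default procedure: fold case, drop combining marks ('ü' files as 'u')."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def expanded_umlaut_key(text: str) -> str:
    """Alternative rule: expand German umlauts ('ü' files as 'ue')."""
    composed = unicodedata.normalize("NFC", text.casefold())
    expanded = composed.translate(str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"}))
    return base_letter_key(expanded)


# Hypothetical registry of rule codes and a hypothetical per-language default.
RULES: Dict[str, Callable[[str], str]] = {
    "x-base-letter": base_letter_key,
    "x-expanded-umlaut": expanded_umlaut_key,
}
LANG_DEFAULT_RULE: Dict[str, str] = {"de": "x-expanded-umlaut"}


@dataclass(frozen=True)
class TaggedLiteral:
    text: str
    lang: Optional[str] = None         # language tag, as on an RDF literal
    filing_rule: Optional[str] = None  # rule code attached to the text itself


def resolve_key(literal: TaggedLiteral) -> str:
    """Explicit rule first, then the language default; a missing or
    unknown rule code falls back to the Unicode-based default."""
    rule = literal.filing_rule or LANG_DEFAULT_RULE.get(literal.lang or "")
    return RULES.get(rule or "", base_letter_key)(literal.text)


print(resolve_key(TaggedLiteral("Müller", lang="de")))
print(resolve_key(TaggedLiteral("Müller")))
```

The point of the design is that the same text yields different sort keys depending on the rule linked to it, while a literal without any such information still sorts in a predictable way.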

The results of the Unicode Consortium's work are also directly available
in programming languages, thanks to projects like ICU
http://site.icu-project.org/

For example, there is a Unicode Collation Algorithm (UCA) that could be 
applied to combined bibliographic data originating from many 
international sources. Or, if that's not sufficient, another 
Unicode-based collation algorithm could be developed for Bibframe.
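
To give an idea of what a multi-level comparison in the spirit of the UCA looks like, here is a small Python sketch that uses only the standard library. It is an approximation for illustration, not the UCA itself: it ranks by base letters first, then by accents, then by case, but it uses code point order instead of the real collation weight tables, and the name `multilevel_key` is hypothetical. A production system would more likely rely on a library such as ICU.

```python
import unicodedata


def multilevel_key(text: str):
    """Build a three-level sort key, loosely modelled on UCA levels."""
    nfd = unicodedata.normalize("NFD", text)
    base = "".join(c for c in nfd if not unicodedata.combining(c))
    primary = base.casefold()    # level 1: base letters only
    secondary = nfd.casefold()   # level 2: accents break ties
    tertiary = nfd.swapcase()    # level 3: case breaks ties, lowercase first
    return (primary, secondary, tertiary)


words = ["Rolf", "rôle", "Role", "roles", "role"]
print(sorted(words, key=multilevel_key))
```

Because the primary level is compared over the whole string before accents or case are considered, "rôle" files next to "role" rather than after "Rolf", which is the behaviour a catalog user would expect from an alphabetical browse list.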

Just as there are authority data sources for controlled vocabulary in
library catalogs, there should be freely available authoritative
resources for the filing/sorting rules that should apply to Bibframe
texts in locally defined contexts and environments. My hope is that, in
the near future, library catalog users and software engineers who are
used to applications that use Unicode will no longer be frustrated by
library catalog data and the many methods of expressing filing/sorting.

Best regards,

Jörg

On 11.02.13 02:36, J. McRee Elrod wrote:
> I've noticed no discussion on Bibframe of filing indicators, nor
> indication of such in posted examples.  Did I miss it?
>
> There was a recent discussion on another list of titles which should
> file under what appears to be an initial article, e.g, "A is for ...".
>
> How will this be handled in Bibframe?  Initial articles differ between
> languages, as well as "A", "An" and "The" being occasionally the word
> by which to file.   Programming to recognize this would be very
> complex.  I have seen no discussion concerning indication of language
> in Bibframe, on which to base such programming.
>
> Are we to no longer have alphabetical browse lists, only web style
> searching?  I would miss alphabetical browse lists apart from subject
> searches, which I prefer to have in inverse chronological order.  That
> too would be more difficult to program based on imprint date in the
> absence of a date fixed field.  The imprint date may even be lacking,
> if that CONSER provision is carried over.
>
>
>     __       __   J. McRee (Mac) Elrod ([log in to unmask])
>    {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
>    ___} |__ \__________________________________________________________
