Dear all,
Thank you for this statement. It contains very good recommendations on
internationalization. I'm especially glad about the recommendation to
use BCP 47 language tags, which are by now used by pretty much all other
institutions except libraries.
A few comments on the text:
> Following BCP47 will allow for the greatest possible interoperability with other data on the web. Language tags should be used in lower case.
The second sentence is slightly at odds with BCP 47 (section 2.1.1),
which states that language tags are to be interpreted
case-insensitively but recommends certain capitalization conventions,
for example "en-US" and "az-Arab-IR". I think it should be dropped or
clarified.
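To illustrate the difference between comparing tags and formatting them, here is a minimal Python sketch of the section 2.1.1 conventions. The function names are made up for this example, and the code only adjusts case; it does not validate tags. It assumes that every subtag following a singleton (such as "x") stays lower case.

```python
def format_language_tag(tag: str) -> str:
    """Apply the capitalization conventions recommended in BCP 47, section 2.1.1.

    Hypothetical helper: adjusts case only, performs no validation.
    """
    formatted = []
    after_singleton = False
    for position, subtag in enumerate(tag.split("-")):
        if position == 0 or after_singleton:
            # The first subtag and anything after a singleton are lower case.
            formatted.append(subtag.lower())
        elif len(subtag) == 2:
            # Two-letter subtags elsewhere (regions) are upper case.
            formatted.append(subtag.upper())
        elif len(subtag) == 4:
            # Four-letter subtags elsewhere (scripts) are title case.
            formatted.append(subtag.capitalize())
        else:
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True
    return "-".join(formatted)


def same_language_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively, as BCP 47 requires."""
    return a.lower() == b.lower()


print(format_language_tag("az-arab-ir"))
print(same_language_tag("en-us", "en-US"))
```

The point is that "en-us" and "en-US" identify the same language, so requiring lower case adds nothing for interoperability while departing from the recommended formatting.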
> A decision should be made on how to handle the byte order mark (BOM) in BIBFRAME: whether to require it, and how to use it.
This strikes me as redundant, since the Unicode BOM is a syntax-level
feature that is generally relevant only for plain text files where the
character encoding may be unclear. BIBFRAME is an RDF-based standard so
the syntax for representing it is defined separately for the various RDF
serializations, including RDF/XML, Turtle, N-Triples, JSON-LD etc. In
general the BOM is not needed for any of these since they all have their
own way of indicating the character encoding without relying on the BOM.
Most simply mandate the use of UTF-8 (without a BOM) but RDF/XML can be
used with other character encodings as well.
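As a rough illustration of why the BOM is a byte-level detail rather than something a data model needs to rule on, the following Python sketch decodes a UTF-8 serialization with and without a leading BOM. The function name and the example.org triple are invented for this example; "utf-8-sig" is Python's standard codec that drops a leading BOM when one is present.

```python
import codecs


def decode_utf8_payload(data: bytes) -> str:
    """Decode UTF-8 bytes, discarding a leading BOM if there is one.

    Hypothetical helper for illustration; a real RDF parser handles this itself.
    """
    return data.decode("utf-8-sig")


# A made-up Turtle statement used only as sample input.
turtle = b'<http://example.org/work1> <http://example.org/title> "Kalevala"@fi .\n'
with_bom = codecs.BOM_UTF8 + turtle

# Both byte sequences decode to the same text, so the triples are identical.
print(decode_utf8_payload(with_bom) == decode_utf8_payload(turtle))
```

Decoding the same bytes with the plain "utf-8" codec would instead leave a U+FEFF character at the start of the text, which is why the question belongs to each serialization's syntax rules.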
> BIBFRAME implementers should consider using Unicode Normalization Form C, following W3C specifications for the World Wide Web and XML.
I completely agree that NFC should be used. However, the "should
consider" reminds me of RFC 6919, "Further Key Words for Use in RFCs to
Indicate Requirement Levels" (published April 1, 2013). In any case, the
RDF 1.1 Concepts and Abstract Syntax W3C recommendation (section 3.3)
also states that lexical strings "SHOULD be in Normal Form C". It would
be easiest to simply say that "BIBFRAME implementers SHOULD use NFC"
since that is what the relevant standards already specify.
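For implementers, following the NFC recommendation can be very little work. A minimal Python sketch using the standard unicodedata module (the helper names are made up for this example):

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Check whether the text is already in NFC."""
    return text == to_nfc(text)


# "a" followed by U+0308 COMBINING DIAERESIS versus precomposed U+00E4.
decomposed = "Ha\u0308meenlinna"
composed = "H\u00e4meenlinna"

print(is_nfc(decomposed))
print(to_nfc(decomposed) == composed)
```

Normalizing literals before they are stored means that two visually identical strings also compare equal as RDF literals.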
> Most existing MARC data incorporates use of the language codes found in ISO 693-2/B. While the codes in this standard are useful, it may be necessary in implementation to accommodate the codes from ISO 639-1 (2-letter codes) and ISO 639-3 as well.
Adhering to BCP 47 would already provide a mechanism for expressing ISO
639-1 and 639-3 language codes, so I don't see why the
implementation-level recommendation isn't simply to use BCP 47, as
already stated under General considerations.
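As a sketch of what this could look like when converting MARC data: BCP 47 uses the two-letter ISO 639-1 code where one exists and a three-letter code otherwise. The table below is a deliberately tiny, hand-picked sample and the function name is hypothetical; a real converter would need a complete mapping, for example one derived from the IANA Language Subtag Registry.

```python
# Sample mapping from ISO 639-2/B codes (as used in MARC) to BCP 47 language
# subtags. Illustrative only: far from complete.
MARC_TO_BCP47 = {
    "fre": "fr",   # French: 639-2/B "fre", 639-1 "fr"
    "ger": "de",   # German: 639-2/B "ger", 639-1 "de"
    "chi": "zh",   # Chinese: 639-2/B "chi", 639-1 "zh"
    "tib": "bo",   # Tibetan: 639-2/B "tib", 639-1 "bo"
    "fin": "fi",   # Finnish: same three-letter code in B and T, 639-1 "fi"
    "akk": "akk",  # Akkadian: no two-letter code, so the three-letter code is kept
}


def marc_to_bcp47(code: str) -> str:
    """Look up the BCP 47 language subtag for a MARC language code.

    Raises KeyError for codes missing from the sample table instead of
    guessing, because an unmapped B code is not necessarily a valid subtag.
    """
    key = code.strip().lower()
    if key not in MARC_TO_BCP47:
        raise KeyError(f"no mapping for {code!r} in this sample table")
    return MARC_TO_BCP47[key]


print(marc_to_bcp47("fre"))
print(marc_to_bcp47("akk"))
```

With a mapping like this in place, no separate accommodation for ISO 639-1 or 639-3 is needed; the BCP 47 tag carries whichever code applies.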
-Osma
Robert J. Rendall wrote on 13.12.2017 at 22:30:
> Colleagues -
>
> The ALA/ALCTS Committee on Cataloging: Asian and African Materials
> (CC:AAM) has voted to approve a Statement in Support of the
> Internationalization of BIBFRAME, containing recommendations on
> character encoding, the representation of original script and
> romanization, normalization, and language tags:
>
> http://connect.ala.org/node/271553
>
> Robert Rendall
> Chair, CC:AAM 2017-2018
> http://www.ala.org/alcts/mgrps/camms/cmtes/ats-ccscataa
>
> Robert Rendall
> Principal Serials Cataloger
> Original and Special Materials Cataloging, Columbia University Libraries
> 102 Butler Library, 535 West 114th Street, New York, NY 10027
> tel.: 212 851 2449 fax: 212 854 5167
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi