Thanks, Tom; sorry to have been less than fully precise in referring to MARC-8.  I meant "subset" only in the sense that MARC-8 covers a far smaller repertoire of characters than Unicode does.  Agreed on ISO 15924.

I would definitely like to see ISO 639-3 come through, at least as an option for the cataloger.  Some of the codes in 639-2/B are essentially catchall categories, offering no way to distinguish among a few hundred languages that share the same fixed-field code.

From: Bibliographic Framework Transition Initiative Forum on behalf of Tom Emerson
Sent: Wednesday, November 09, 2011 4:31 PM
Subject: Re: [BIBFRAME] Introduction (@W3C)

On Nov 9, 2011, at 3:39 PM, Riley, Charles wrote:
> Bibliographic data is largely built on the MARC-8 character set, in essence a subset of UTF-8; thus a loss of data for the preponderance of materials in non-Latin scripts has already occurred by the time data becomes bibliographic.

I don't think MARC-8 is properly a "subset" of UTF-8; I'm not sure what that would mean. MARC-8, as I understand it, is closer to ISO 2022, where you can switch between multiple character sets within a single text stream. UTF-8 is an encoding form of Unicode: a different beast entirely.

I would hope that Unicode would be used for any future bibliographic representation: the choice of encoding then depends on the particular serialization format used. There is little we can do if the original data has been lost, but having the foundation to represent the world's current and historical scripts is a vital requirement, and Unicode fits the bill here.
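
To make the distinction between Unicode and its encodings concrete, here is a minimal Python sketch. The sample character and the variable names are arbitrary illustrative choices, not anything taken from a bibliographic format: it shows one Unicode code point serialized under two different encoding forms.

```python
# Illustrative only: the sample character is an arbitrary choice.
ch = "中"  # Unicode code point U+4E2D

code_point = ord(ch)                   # the code point itself, independent of any encoding
utf8_bytes = ch.encode("utf-8")        # three bytes in UTF-8
utf16_bytes = ch.encode("utf-16-be")   # two bytes in UTF-16 (big-endian, no byte order mark)

print(f"U+{code_point:04X}")   # U+4E2D
print(utf8_bytes.hex(" "))     # e4 b8 ad
print(utf16_bytes.hex(" "))    # 4e 2d

# Both byte sequences decode back to the same character.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be") == ch
```

The character is the same in both cases; only the bytes written out differ, which is why the encoding can be left to the serialization format.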

In addition to specifying language (whether ISO 639-2/B or 639-3 I don't have a preference) we should also consider specifying script details. ISO 15924 works well for this, e.g., to distinguish a title in Simplified Chinese vs. one in Traditional.
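
As a rough sketch of what pairing the two codes might look like, assuming a hypothetical record type (the class name, field names, and label format below are invented for illustration and are not part of any existing bibliographic schema):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LangScript:
    """Hypothetical pairing of a language code with an ISO 15924 script code."""
    language: str                  # ISO 639 code, e.g. "chi" (639-2/B) or "zho" (639-3)
    script: Optional[str] = None   # four-letter ISO 15924 code, e.g. "Hans" or "Hant"

    def label(self) -> str:
        # Made-up display format for this sketch; not a standardized tag syntax.
        return f"{self.language}-{self.script}" if self.script else self.language


# Same language code, different scripts: the script code carries the distinction.
simplified = LangScript("chi", "Hans")    # Han, Simplified variant
traditional = LangScript("chi", "Hant")   # Han, Traditional variant

print(simplified.label())    # chi-Hans
print(traditional.label())   # chi-Hant
```

With the language code alone the two titles would be indistinguishable; the script code is what separates them.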


P.S. All opinions are my own and do not necessarily represent my employer.

Tom Emerson
Principal Software Engineer --- Search
EBSCO Publishing
10 Estes Street
Ipswich, MA 01938, USA
Phone: +1-978-356-6500 x2185