The discussion of ISO 639-2 vs. ISO 639-3 is a side topic of this
thread, but I wanted to respond to the comment about having 639-3 as an
option for the cataloger. It is already available and can be used in MARC
field 041 by specifying iso639-3 in subfield $2 as the source of the
language code.
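
As a rough illustration only, the short Python sketch below renders such a
041 field as a display string. The helper name language_field and the
"#" notation for a blank indicator are assumptions made for this example,
not part of any MARC library; the second indicator 7 signals that the code
source is given in subfield $2, and "yue" is the ISO 639-3 code for
Cantonese (Yue Chinese).

```python
def language_field(codes, source="iso639-3"):
    """Render a MARC 041 field as a human-readable display string.

    Hypothetical helper for illustration; not taken from any MARC library.
    """
    # First indicator blank (shown as '#'): no information on translation.
    # Second indicator 7: the source of the codes is named in subfield $2.
    subfields = "".join(f"$a{code}" for code in codes)
    return f"041 #7 {subfields}$2{source}"


if __name__ == "__main__":
    # One language code per repeated subfield $a.
    print(language_field(["yue"]))
```

A record coded this way identifies the specific language rather than
falling back on a broader 639-2/B code.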

Rebecca Guenther
(formerly chair of ISO 639 Joint Advisory Committee and Library of Congress)

On Wed, Nov 9, 2011 at 9:56 PM, Riley, Charles <[log in to unmask]> wrote:

> Thanks, Tom; sorry to have been less than fully precise on referring to
> MARC-8.  I just meant that in the sense that it covers a range of
> characters that is far less than what is found in Unicode.  Agreed on 15924.
>
> I would definitely like to see 639-3 come through at least as an option
> for the cataloger.  Some of the codes in 639-2/B are essentially nothing
> more than catchall categories, allowing for no granularity in
> distinguishing between a few hundred languages that share the same fixed
> field code.
>
>
> ________________________________________
> From: Bibliographic Framework Transition Initiative Forum [[log in to unmask]] on behalf of Tom Emerson [[log in to unmask]]
> Sent: Wednesday, November 09, 2011 4:31 PM
> To: [log in to unmask]
> Subject: Re: [BIBFRAME] Introduction (@W3C)
>
> On Nov 9, 2011, at 3:39 PM, Riley, Charles wrote:
> > Bibliographic data is largely built on the MARC-8 character set, in
> > essence a subset of UTF-8; thus a loss of data for the preponderance of
> > materials in non-Latin scripts has already occurred by the time data
> > becomes bibliographic.
>
> I don't think MARC-8 is properly a "subset" of UTF-8: I'm not sure what
> that means. MARC-8, as I understand, is more similar to ISO-2022 where you
> can switch between multiple character sets within a single text stream.
> UTF-8 is an encoding form of Unicode: a different beast entirely.
>
> I would hope that Unicode would be used for any future bibliographic
> representation: the choice of encoding then depends on the particular
> serialization format used. There is little we can do if the original data
> has been lost, but having the foundation to represent the world's current
> and historical scripts is a vital requirement, and Unicode fits the bill
> here.
>
> In addition to specifying language (whether ISO 639-2/B or 639-3 I don't
> have a preference) we should also consider specifying script details. ISO
> 15924 works well for this, e.g., to distinguish a title in Simplified
> Chinese vs. one in Traditional.
>
>    -tree
>
> P.S. All opinions are my own and do not necessarily represent my employer.
>
> Tom Emerson
> Principal Software Engineer --- Search
> EBSCO Publishing
> 10 Estes Street
> Ipswich, MA 01938, USA
> Phone: +1-978-356-6500 x2185
> [log in to unmask]
>