While it's true that OCLC's batch processing does turn UTF-8 characters
with no MARC-8 equivalent into NCRs, most of those NCRs are subsequently
flagged and reported as invalid characters.  Only NCRs in field 880 are
accepted, and only those which fall into specific Unicode blocks such as
Cyrillic, Arabic, Hebrew, Greek, and CJK Unified Ideographs.
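
As a rough illustration of the acceptance rule described above, here is a
minimal Python sketch. It is not OCLC's actual validation code. The name
ncr_accepted and the ACCEPTED_BLOCKS table are hypothetical; the ranges are
the standard Unicode block boundaries, and since the rule above lists blocks
"such as" these, the table is deliberately incomplete.

```python
# Hypothetical, incomplete table of accepted Unicode blocks (inclusive ranges).
# The message says "such as", so OCLC's real list may contain more blocks.
ACCEPTED_BLOCKS = {
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Hebrew": (0x0590, 0x05FF),
    "Arabic": (0x0600, 0x06FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}


def ncr_accepted(tag: str, codepoint: int) -> bool:
    """Sketch of the rule as described: an NCR passes only in field 880,
    and only if its code point falls inside one of the accepted blocks."""
    if tag != "880":
        return False
    return any(lo <= codepoint <= hi for lo, hi in ACCEPTED_BLOCKS.values())


print(ncr_accepted("880", 0x0467))  # True: Cyrillic, in an 880
print(ncr_accepted("245", 0x0467))  # False: not field 880
print(ncr_accepted("880", 0x018E))  # False: Latin Extended-B (reversed E)
```

Under this reading, a reversed E (U+018E) would be flagged wherever it
appears, which is the situation raised in the original question below.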

Gary L. Smith 
Software Architect 


From: MARC On Behalf Of Corinna Baksik
Sent: Tuesday, October 26, 2010 16:16
Subject: use of UTF-8 characters not supported by OCLC

Hello - are there any institutions using UTF-8 characters that aren't
supported by OCLC, such as many of the characters in the Unicode "Latin
Extended-B" block?

We have staff working with African materials that would like to use
characters such as the reversed E, and our local policy has been to
discourage use of characters that are not supported by OCLC, even though
our ILS (Aleph) supports these. 

I'm interested in hearing about the policies at other libraries. In the
past, records with these characters were rejected by OCLC, but my
understanding is that OCLC's more recent batch processing programs
accept these characters and turn them into NCRs. When staff retrieve one
of these records, they convert the NCR back into the UTF-8 character it
represents. An example of an OCLC record with NCRs is #436225032, which
contains an NCR for ѧ (U+0467, CYRILLIC SMALL LETTER LITTLE YUS) in the
Cyrillic parallel 245 field. (I believe OCLC's treatment of these
characters may vary depending on whether they occur in an 880 field or
not.)
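
A minimal Python sketch of the round trip described above, assuming
hexadecimal NCRs of the form &#xHHHH;. The names to_ncrs, from_ncrs and
in_repertoire are hypothetical, not an OCLC or Aleph API, and in_repertoire
is a toy stand-in (ASCII only) for the real MARC-8 repertoire, which is much
larger.

```python
import re

# Hypothetical stand-in for "has a MARC-8 equivalent": ASCII only.
# The real MARC-8 repertoire is far larger than this.
def in_repertoire(ch: str) -> bool:
    return ord(ch) < 0x80


# Matches a hexadecimal NCR such as &#x0467;
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_ncrs(text: str) -> str:
    """Replace each character outside the repertoire with a hex NCR."""
    return "".join(
        ch if in_repertoire(ch) else f"&#x{ord(ch):04X};" for ch in text
    )


def from_ncrs(text: str) -> str:
    """Turn hex NCRs back into the characters they stand for."""
    return NCR_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


encoded = to_ncrs("\u018E and \u0467")  # reversed E and little yus
print(encoded)  # &#x018E; and &#x0467;
print(from_ncrs(encoded) == "\u018E and \u0467")  # True
```

This sketch does not escape a literal "&#x...;" sequence already present in
the source text, so such a sequence would be misread on the way back.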

Now that the MARC standard allows the full UCS repertoire (1), I wonder
to what extent libraries are using it. 

Thank you,

Corinna Baksik 

Systems Librarian 
Harvard University Library
Office for Information Systems
90 Mt. Auburn St. 
Cambridge, MA 02138


(1) "To facilitate the movement of records between MARC-8 and Unicode
environments, it was recommended for an initial period that the use of
Unicode be restricted to a repertoire identical in extent to the MARC-8
repertoire. In 2007, however, such a restriction is no longer
appropriate. The full UCS repertoire, as currently defined at the
Unicode web site, is valid for encoding MARC 21 records, subject only to
the constraints described below."
