Thanks for this clarification.

On 10/28/2010 4:10 PM, Smith,Gary wrote:
While it's true that OCLC's batch processing does turn UTF-8 characters with no MARC-8 equivalent into NCRs, most of those NCRs are subsequently flagged and reported as invalid characters. Only NCRs in field 880 are accepted, and only those that fall into specific Unicode blocks such as Cyrillic, Arabic, Hebrew, Greek, and the CJK Unified Ideographs.
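
A rough sketch of the acceptance rule described above, for anyone who wants to pre-check records before sending them. The block ranges are the standard Unicode ranges for the blocks named, but the list is illustrative only: the complete set of blocks OCLC accepts is not given here, and the function names and the `&#xXXXX;` pattern are assumptions for the example, not OCLC's actual implementation.

```python
import re

# Illustrative only: standard Unicode ranges for the blocks named in the message.
# The full list of blocks OCLC accepts is an assumption, not taken from OCLC.
ACCEPTED_BLOCKS = {
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Hebrew": (0x0590, 0x05FF),
    "Arabic": (0x0600, 0x06FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

# Hexadecimal NCRs of the form &#xXXXX; (4 to 6 hex digits assumed).
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]{4,6});")


def block_of(code_point):
    """Return the name of the accepted block containing code_point, or None."""
    for name, (low, high) in ACCEPTED_BLOCKS.items():
        if low <= code_point <= high:
            return name
    return None


def check_ncrs(tag, text):
    """Return (ncr, accepted, reason) for every NCR found in a field's text."""
    results = []
    for match in NCR_PATTERN.finditer(text):
        block = block_of(int(match.group(1), 16))
        if tag != "880":
            results.append((match.group(0), False, "NCR outside field 880"))
        elif block is None:
            results.append((match.group(0), False, "block not in accepted list"))
        else:
            results.append((match.group(0), True, block))
    return results
```

For example, `check_ncrs("880", "&#x018E;")` reports the reversed E as not accepted under these assumptions, because U+018E lies in Latin Extended-B.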
 

Gary L. Smith
Software Architect
OCLC
[log in to unmask]


From: MARC [mailto:[log in to unmask]] On Behalf Of Corinna Baksik
Sent: Tuesday, October 26, 2010 16:16
To: [log in to unmask]
Subject: use of UTF-8 characters not supported by OCLC

Hello - are there any institutions that are using UTF-8 characters that aren't supported by OCLC, such as many of the characters in the "Latin Extended-B" block?

We have staff working with African materials who would like to use characters such as the reversed E, and our local policy has been to discourage use of characters that are not supported by OCLC, even though our ILS (Aleph) supports them.

I'm interested in hearing about the policies at other libraries. In the past, records with these characters would be rejected by OCLC, but my understanding is that OCLC's more recent batch processing programs accept these characters and turn them into NCRs. If staff retrieve one of these records they convert the character into its UTF-8 value. An example of an OCLC record with NCRs is #436225032, which contains the NCR for ѧ (U+0467) in the Cyrillic parallel 245 tag. (I believe the OCLC treatment of a character may vary depending on whether it's encountered in an 880 or not.)
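
To make the round trip concrete, here is a minimal sketch of turning unsupported characters into NCRs and back. It assumes the hexadecimal `&#xXXXX;` form, and it uses a stand-in test (`ascii_only`, hypothetical) where a real converter would consult the MARC-8 repertoire; neither OCLC's nor Aleph's actual code is shown here.

```python
import re


def ascii_only(ch):
    """Stand-in for a real MARC-8 repertoire check (assumption for this sketch)."""
    return ord(ch) < 0x80


def to_ncr(text, is_supported=ascii_only):
    """Replace each unsupported character with a hexadecimal NCR."""
    return "".join(
        ch if is_supported(ch) else "&#x{:04X};".format(ord(ch))
        for ch in text
    )


def from_ncr(text):
    """Convert hexadecimal NCRs back into the characters they stand for."""
    return re.sub(
        r"&#x([0-9A-Fa-f]{4,6});",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )
```

With these definitions, `to_ncr("\u01dd")` (the lowercase reversed E) gives `&#x01DD;`, and `from_ncr` restores the original character.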

Now that the MARC standard allows the full UCS repertoire (1), I wonder to what extent libraries are using it.

Thank you,

Corinna Baksik

Systems Librarian
Harvard University Library
Office for Information Systems
90 Mt. Auburn St.
Cambridge, MA 02138

617.495.3724



(1) "To facilitate the movement of records between MARC-8 and Unicode environments, it was recommended for an initial period that the use of Unicode be restricted to a repertoire identical in extent to the MARC-8 repertoire. In 2007, however, such a restriction is no longer appropriate. The full UCS repertoire, as currently defined at the Unicode web site, is valid for encoding MARC 21 records, subject only to the constraints described below."  http://www.loc.gov/marc/specifications/speccharucs.html


