While it's true that OCLC's batch processing does turn UTF-8 characters with no MARC-8 equivalent into NCRs, most of those NCRs are subsequently flagged and reported as invalid characters. Only NCRs in field 880 are accepted, and only those that fall into specific Unicode blocks such as Cyrillic, Arabic, Hebrew, Greek, and CJK Unified Ideographs.
Gary L. Smith
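The acceptance rule described above can be sketched as a simple block lookup. This is only an illustration, not OCLC's actual validation logic: the function names are hypothetical, and the block list covers only the five blocks named in the message, which was introduced with "such as" and so may not be complete. The code point ranges are the standard Unicode ranges for those blocks.

```python
# Unicode blocks named in the message above (standard Unicode ranges).
# Assumption: the real list of accepted blocks may be longer than this.
ACCEPTED_BLOCKS = {
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Hebrew": (0x0590, 0x05FF),
    "Arabic": (0x0600, 0x06FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}


def accepted_block(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, (low, high) in ACCEPTED_BLOCKS.items():
        if low <= cp <= high:
            return name
    return None


def ncr_would_be_accepted(ch, tag):
    """Hypothetical check: field 880 only, and only the listed blocks."""
    return tag == "880" and accepted_block(ch) is not None
```

Under these assumptions, ѧ (U+0467) in an 880 field passes, while the reversed E (U+018E, in Latin Extended-B) fails in any field.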
From: MARC [mailto:[log in to unmask]] On Behalf Of Corinna Baksik
Sent: Tuesday, October 26, 2010 16:16
To: [log in to unmask]
Subject: use of UTF-8 characters not supported by OCLC
Hello - are there any institutions that are using UTF-8 characters that aren't supported by OCLC, such as many of the characters in the "Latin Extended-B" block?
We have staff working with African materials who would like to use characters such as the reversed E, and our local policy has been to discourage the use of characters that are not supported by OCLC, even though our ILS (Aleph) supports them.
I'm interested in hearing about the policies at other libraries. In the past, records with these characters would be rejected by OCLC, but my understanding is that OCLC's more recent batch processing programs accept these characters and turn them into NCRs. If staff retrieve one of these records, they convert the NCR back into its UTF-8 character. An example of an OCLC record with NCRs is #436225032, which contains ѧ (U+0467) in the Cyrillic parallel 245 tag. (I believe the OCLC treatment of these characters may vary depending on whether they are encountered in an 880 or not.)
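For readers unfamiliar with NCRs, the round trip described above can be sketched in a few lines. This assumes the hexadecimal `&#xXXXX;` form; the function names are hypothetical, and the `is_supported` test is a placeholder that the caller must supply, since a real MARC-8 repertoire check needs the full MARC-8 code tables.

```python
import re

# Matches hexadecimal NCRs such as &#x0467; (4 to 6 hex digits).
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]{4,6});")


def to_ncr(text, is_supported):
    """Replace each character that is_supported() rejects with an NCR."""
    return "".join(
        ch if is_supported(ch) else f"&#x{ord(ch):04X};" for ch in text
    )


def _decode(match):
    cp = int(match.group(1), 16)
    # Leave the text untouched if the value is not a valid code point.
    return chr(cp) if cp <= 0x10FFFF else match.group(0)


def from_ncr(text):
    """Turn hexadecimal NCRs back into the characters they stand for."""
    return NCR_PATTERN.sub(_decode, text)


def ascii_only(ch):
    # Placeholder only: the real MARC-8 repertoire is far larger than ASCII.
    return ord(ch) < 0x80
```

With the ASCII placeholder, `to_ncr("ѧ", ascii_only)` gives `&#x0467;`, and `from_ncr` restores the original character.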
Now that the MARC standard allows the full UCS repertoire (1), I wonder to what extent libraries are using it.
Corinna Baksik
Harvard University Library
Office for Information Systems
90 Mt. Auburn St.
Cambridge, MA 02138
617.495.3724
(1) "To facilitate the movement of records between MARC-8 and Unicode environments, it was recommended for an initial period that the use of Unicode be restricted to a repertoire identical in extent to the MARC-8 repertoire. In 2007, however, such a restriction is no longer appropriate. The full UCS repertoire, as currently defined at the Unicode web site, is valid for encoding MARC 21 records, subject only to the constraints described below." http://www.loc.gov/marc/specifications/speccharucs.html