Thanks for this clarification.
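
For anyone who wants to pre-check records locally, here is a rough Python sketch of the rule Gary describes below. It is only an illustration and not OCLC's actual logic. The function names are made up, the NCR syntax is assumed to be the hexadecimal &#xXXXX; form, and the block ranges are the standard Unicode ranges for the blocks he names. His list is introduced with "such as", so the real set of accepted blocks may well be larger.

```python
import re

# ASSUMPTION: only the blocks named in the thread, with their standard
# Unicode code point ranges. The real list accepted by OCLC may differ.
ASSUMED_ACCEPTED_BLOCKS = {
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Hebrew": (0x0590, 0x05FF),
    "Arabic": (0x0600, 0x06FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

# ASSUMPTION: NCRs are written in hexadecimal as &#xXXXX; (4 to 6 digits).
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]{4,6});")


def to_ncr(char):
    """Return the hexadecimal NCR for a single character."""
    return "&#x%04X;" % ord(char)


def from_ncr(text):
    """Replace every hexadecimal NCR in text with the character it names."""
    return NCR_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


def ncr_would_be_accepted(tag, ncr):
    """Hypothetical check: NCR sits in field 880 and falls in an assumed block."""
    match = NCR_PATTERN.fullmatch(ncr)
    if tag != "880" or match is None:
        return False
    code_point = int(match.group(1), 16)
    return any(low <= code_point <= high
               for low, high in ASSUMED_ACCEPTED_BLOCKS.values())


if __name__ == "__main__":
    little_yus = "\u0467"   # Cyrillic small letter little yus
    reversed_e = "\u018E"   # Latin capital letter reversed E (Latin Extended-B)
    for tag, char in (("880", little_yus), ("245", little_yus), ("880", reversed_e)):
        ncr = to_ncr(char)
        print(tag, ncr, ncr_would_be_accepted(tag, ncr))
```

Under these assumptions the Cyrillic character passes only when it sits in an 880, and the reversed E is flagged in either field, which matches the behavior described in this thread.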

On 10/28/2010 4:10 PM, Smith,Gary wrote:
> While it's true that OCLC's batch processing does turn UTF-8 
> characters with no MARC-8 equivalent into NCRs, most of those NCRs are 
> subsequently flagged and reported as invalid characters.  Only NCRs in 
> field 880 are accepted, and only those which fall into specific 
> Unicode blocks such as Cyrillic, Arabic, Hebrew, Greek, and the CJK 
> Unified Ideographs.
>
> Gary L. Smith
> Software Architect
> OCLC
> [log in to unmask]
>
> ------------------------------------------------------------------------
> *From:* MARC [mailto:[log in to unmask]] *On Behalf Of *Corinna Baksik
> *Sent:* Tuesday, October 26, 2010 16:16
> *To:* [log in to unmask]
> *Subject:* use of UTF-8 characters not supported by OCLC
>
> Hello - are there any institutions that are using UTF-8 characters 
> that aren't supported by OCLC, such as many of the characters in the 
> "Latin Extended B" set?
>
> We have staff working with African materials that would like to use 
> characters such as the reversed E, and our local policy has been to 
> discourage use of characters that are not supported by OCLC, even 
> though our ILS (Aleph) supports these.
>
> I'm interested in hearing about the policies at other libraries. In 
> the past, records with these characters would be rejected by OCLC, but 
> my understanding is that OCLC's more recent batch processing programs 
> accept these characters and turn them into NCRs. If staff retrieve one 
> of these records they convert the character into its UTF-8 value. An 
> example of an OCLC record with NCRs is #436225032, which contains 
> &#x0467; (ѧ) in the Cyrillic parallel 245 tag.  (I believe the OCLC 
> treatment of characters may vary depending on whether it's encountered 
> in an 880 or not).
>
> Now that the MARC standard allows the full UCS repertoire (1), I 
> wonder to what extent libraries are using it.
>
> Thank you,
>
> Corinna Baksik
>
> Systems Librarian
> Harvard University Library
> Office for Information Systems
> 90 Mt. Auburn St.
> Cambridge, MA 02138
>
> 617.495.3724
>
>
>
> (1) "To facilitate the movement of records between MARC-8 and Unicode 
> environments, it was recommended for an initial period that the use of 
> Unicode be restricted to a repertoire identical in extent to the 
> MARC-8 repertoire. In 2007, however, such a restriction is no longer 
> appropriate. The full UCS repertoire, as currently defined at the 
> Unicode web site, is valid for encoding MARC 21 records, subject only 
> to the constraints described below." 
> http://www.loc.gov/marc/specifications/speccharucs.html