I am new to this list and am looking for advice on a particular MARC-8 to
Unicode conversion issue: the conversion of the ligature.
My library (Stanford Univ. Libraries) is in the process of converting its
local system to Unicode. As part of that, our ILS vendor will convert our
MARC-8 bibliographic database to Unicode according to the MARC21 standard.
In September 2004, the mapping of the two halves of the ligature (M+EB and
M+EC) was officially changed from U+FE20 and U+FE21 to U+0361 (see
http://lcweb2.loc.gov/cocoon/codetables/45.html) after Proposal 2004-08 was
reviewed and approved. Our vendor's conversion program follows the new
mapping, as it should, and so the ligatures that currently exist in our
database will be converted to U+0361. However, both RLG and OCLC are
currently still following the old mapping when records are exported in
UTF-8 from their systems, and upon inquiry I was told that neither
organization plans to implement the new mapping in the foreseeable
future. This seems like a problematic situation to us, one that could lead
to the same character being encoded in two different ways in the same
database. In the short term, we can get around it by continuing to export
records from WorldCat and RLIN in MARC-8 and running them through our local
system's conversion program before loading in order to produce the same
encoding, but that is not a good long-term solution. I am interested to
know how other libraries that have already converted to Unicode deal with
this issue, and whether there is work underway within the MARC21 community
to resolve this conflict.
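
To make the concern concrete, here is a minimal Python sketch of how one
might detect which ligature encoding a UTF-8 field uses and fold the old
mapping into the new one. The function names are hypothetical and do not
come from any vendor or MARC library. The folding rule (left half becomes
U+0361, right half is dropped) is my assumption about how the two mappings
correspond, not something taken from the MARC21 specification, so it should
be checked against the code tables before use.

```python
# Hypothetical helpers for illustration only; not part of any ILS or MARC toolkit.

OLD_LEFT = "\uFE20"    # COMBINING LIGATURE LEFT HALF (old mapping for M+EB)
OLD_RIGHT = "\uFE21"   # COMBINING LIGATURE RIGHT HALF (old mapping for M+EC)
NEW_DOUBLE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE (new mapping)


def ligature_encodings(text):
    """Return the set of ligature encodings found in text: 'old', 'new', both, or empty."""
    found = set()
    if OLD_LEFT in text or OLD_RIGHT in text:
        found.add("old")
    if NEW_DOUBLE in text:
        found.add("new")
    return found


def fold_old_ligature(text):
    """Rewrite old-style half marks as the double-wide mark.

    Assumption: the left half becomes U+0361 and the right half is dropped,
    so 't' + U+FE20 + 's' + U+FE21 becomes 't' + U+0361 + 's'.
    """
    return text.replace(OLD_LEFT, NEW_DOUBLE).replace(OLD_RIGHT, "")


if __name__ == "__main__":
    from_utility = "t\uFE20s\uFE21"   # as a record exported in UTF-8 under the old mapping
    from_local = "t\u0361s"           # as produced under the new mapping
    print(ligature_encodings(from_utility), ligature_encodings(from_local))
    print(fold_old_ligature(from_utility) == from_local)
```

A check like ligature_encodings could be run over incoming records before
loading to flag any that would introduce the second encoding into the
database.
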
Thanks.
-- Vitus
_____________________________________________
Vitus Tang
Head, Data & Materials Control
Cataloging and Metadata Services
Stanford University Libraries
(650) 725-1153