On Sep 9, 2006, at 2:23 PM, Karen Coyle wrote:
> The proposal is available at:
> http://www.loc.gov/marc/marbi/2006/2006-09.html
>
> The main concern is what will happen during the time that we have a
> mixed environment, with some systems having gone to Unicode, but
> others still using MARC-8. Sharing data in that mixed environment
> will mean that there will be times when a MARC-8 based system will
> receive a Unicode record with characters that are not valid in
> MARC-8. The issues are explained in this report:
> http://www.loc.gov/marc/marbi/2005/2005-report01.pdf
The only problem I see here is that as long as MARC-8 is supported
and extended as a transmission encoding there will *always* be a
mixed environment, since the rest of the computing world is using
Unicode for internationalization support. After a quick read of the
PDF, this appears to have been the original position of the MARC21
community back in the early 1990s...but experience has shown that
libraries have been slow to order (at a cost, presumably) the
enhanced support for Unicode that their vendors provide.
Thanks very much for the pointer to the proposal. I imagine in 3
Proposal:
XXXX;
should be:
&#xXXXX;
If not, this could lead to some profound problems with strings like:
feed;
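
To make the worry concrete, here is a minimal Python sketch. It assumes
the escape is a hex numeric character reference of the form &#xXXXX;
as suggested above. The function names and the sample title are made up
for illustration and are not taken from the proposal.

```python
import re

# Hypothetical patterns, for illustration only.
# PREFIXED matches hex references with an explicit "&#x" lead-in.
PREFIXED = re.compile(r"&#x([0-9A-Fa-f]{4,6});")
# BARE matches any four hex digits followed by ";" with no lead-in.
BARE = re.compile(r"([0-9A-Fa-f]{4});")


def _to_char(match: re.Match) -> str:
    """Turn the captured hex digits into the corresponding character."""
    return chr(int(match.group(1), 16))


def decode_prefixed(text: str) -> str:
    """Decode only references that carry the "&#x" prefix."""
    return PREFIXED.sub(_to_char, text)


def decode_bare(text: str) -> str:
    """Decode un-prefixed references; ordinary words can be mistaken for them."""
    return BARE.sub(_to_char, text)


title = "Chicken feed; a history"

# With the prefix, ordinary text passes through untouched.
print(decode_prefixed(title) == title)

# Without the prefix, the word "feed" is read as code point U+FEED.
print(decode_bare(title) == title)

# A genuine prefixed reference still decodes as expected.
print(decode_prefixed("&#x4E2D;&#x6587;"))
```

The point is only that a decoder needs some unambiguous lead-in to tell
an escaped code point from a word that happens to be made of hex digits.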
I know I'm late to the party, but I remain unconvinced that receiving
a record with MARC-8 interspersed with what amount to Unicode HTML
entities is easier to process than a MARC record which says it's
UTF-8 using position 9 in the leader and which contains UTF-8. The
proposal seems to presume that since OPACs are web applications, the
HTML entities in the transmitted MARC data will flow all the way
through into the HTML emitted by an OPAC.
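
For comparison, checking the leader is about this much work. This is a
rough Python sketch, not a full MARC parser: the function name and the
sample leaders are invented, and it relies only on the MARC 21
convention that leader position 9 is "a" for UCS/Unicode and blank for
MARC-8.

```python
LEADER_LENGTH = 24  # a MARC 21 leader is always 24 characters


def record_encoding(leader: str) -> str:
    """Return the character coding scheme declared in leader position 9.

    Hypothetical helper: "a" means UCS/Unicode (UTF-8 in practice),
    a blank means MARC-8. Any other value is rejected.
    """
    if len(leader) != LEADER_LENGTH:
        raise ValueError("a MARC 21 leader must be 24 characters long")
    flag = leader[9]  # positions are counted from 0, as in the MARC docs
    if flag == "a":
        return "utf-8"
    if flag == " ":
        return "marc-8"
    raise ValueError(f"unexpected coding scheme flag: {flag!r}")


# Made-up sample leaders, built in pieces so position 9 is easy to see.
unicode_leader = "00714cam " + "a" + "2200205 a 4500"
marc8_leader = "00714cam " + " " + "2200205 a 4500"

print(record_encoding(unicode_leader))
print(record_encoding(marc8_leader))
```

A receiving system can branch once on that flag and decode the whole
record one way or the other, rather than scanning every field for
escapes.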
To a disconnected outsider who has lurked on this list since the
beginning and implemented MARC and MARC-8 software support, it seems
like OCLC's internals and business models are leaking out into the
MARC21 specification. I say this while realizing at the same time
that being the custodian of a large MARC data set like WorldCat
probably changes one's perspective on this problem a bit :-)
Admittedly, I have little knowledge of OCLC's current subscription
plans. But if I were OCLC and I wanted to encourage the use of UTF-8
in library data while still supporting libraries that lack UTF-8
support in their catalogs, here's what I might do.
Alert subscribers that OCLC is moving to two subscription plans in 2007:
Plan A: receive wholesome/shiny MARC records encoded as UTF-8 at
$29.95/month (current subscription rate)
Plan B: receive possibly incomplete MARC records encoded as MARC-8
(since some of OCLC's Unicode data can't be encoded in MARC-8) at
$34.95/month
Each year the cost difference between A and B could increase further,
and librarians' sense of what's right and market forces would take
care of the rest. OCLC/RLG's position in the marketplace should make
this even easier :-)
//Ed