Godmar,
The sequence you give is not standard MARC 21. The escape sequence
1B 24 31 designates CJK (EACC) as the G0 character set, displacing the
default G0, ASCII. The MARC 21 standard requires an escape sequence back
to ASCII (1B 28 42, as at the end of your sequence) to reinstate ASCII
as G0 before the "{6924f6}" string of characters.
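To make the point concrete, here is a rough Python sketch that splits the bytes from the original message into runs according to the G0 set in effect. The function name split_by_g0 is made up for illustration and is not part of pymarc, and the sketch recognizes only the two escape sequences that appear in this thread (1B 24 31 and 1B 28 42). A standard-conforming EACC run should be a whole number of 3-byte characters.

```python
ESC = 0x1B

def split_by_g0(data: bytes):
    """Split bytes into (g0_set, payload) runs.

    Hypothetical helper: handles only ESC $ 1 (EACC as G0) and
    ESC ( B (ASCII as G0), the two sequences seen in this thread.
    """
    runs = []
    g0 = "ascii"          # ASCII is the default G0 set
    start = 0
    i = 0
    while i < len(data):
        if data[i] == ESC:
            designator = data[i + 1:i + 3]
            if designator == b"$1":
                new_g0 = "eacc"
            elif designator == b"(B":
                new_g0 = "ascii"
            else:
                raise ValueError(f"unhandled escape sequence at offset {i}")
            if i > start:
                runs.append((g0, data[start:i]))
            g0 = new_g0
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        runs.append((g0, data[start:]))
    return runs

# The 880$a bytes quoted in the original message.
field = bytes.fromhex(
    "1b 24 31 21 50 56 4b 37 6f 69 24 4e 21 51 31 21 47 34 69 24 4e"
    " 21 30 70 21 51 2b 7b 36 39 32 34 66 36 7d 1b 28 42"
)

for g0, payload in split_by_g0(field):
    # A non-zero remainder means the EACC run is not made of whole
    # 3-byte characters.
    print(g0, len(payload), len(payload) % 3)
```

The whole 32-byte run, including "{6924f6}", falls under EACC as G0, and 32 is not a multiple of 3, which is one cheap way to notice that something non-standard is present.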
Given the complexity of processing MARC CJK data and the different
approaches that have been used in various systems (and over time as
those systems were developed), it is reasonable to expect this sort of
problem (and others!).
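If a workaround is wanted, one possible heuristic, sketched below in Python, is to walk the EACC run three bytes at a time and treat a brace-delimited run of six hex digits as ASCII. This is only an illustration: tokenize_eacc and BRACE_RUN are hypothetical names, nothing here is pymarc's actual API, and it assumes that no legitimate EACC character at a character boundary starts a byte pattern that looks like "{hhhhhh}".

```python
import re

# Heuristic pattern for the vendor's embedded ASCII, e.g. b"{6924f6}".
BRACE_RUN = re.compile(rb"\{[0-9A-Fa-f]{6}\}")

def tokenize_eacc(payload: bytes):
    """Split an EACC run into 3-byte characters and embedded ASCII runs.

    Hypothetical helper: the brace check is applied only at positions
    where a new 3-byte character would otherwise begin.
    """
    tokens = []
    i = 0
    while i < len(payload):
        m = BRACE_RUN.match(payload, i)
        if m:
            tokens.append(("ascii", m.group()))
            i = m.end()
        elif i + 3 <= len(payload):
            tokens.append(("eacc", payload[i:i + 3]))
            i += 3
        else:
            raise ValueError(f"truncated EACC character at offset {i}")
    return tokens

# The EACC run from the original message, without the escape sequences.
payload = bytes.fromhex(
    "215056 4b376f 69244e 215131 214734 69244e 213070 21512b"
    " 7b3639323466367d"
)
tokens = tokenize_eacc(payload)
```

With the sample data this yields eight 3-byte EACC tokens followed by one ASCII token, which could then be decoded separately. Whether such a heuristic is safe enough for general use depends on how often real data collides with the pattern.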
[One illustration of the complexity of MARC CJK is that even when CJK
is invoked as G0, Extended ASCII is still typically invoked as G1. It
is then possible to encounter a lone hex A8 (defined in Extended ASCII
as the middle dot character) in the midst of 24-bit CJK characters;
that is completely legal according to MARC 21.]
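The G0/G1 interleaving described above can be sketched in Python as follows. The function split_g0_g1 is a hypothetical name, and the sketch assumes that G1 holds a single-byte set (such as Extended ASCII) occupying bytes A1 to FE, while G0 holds a set whose characters are g0_width bytes wide in the range 21 to 7E.

```python
def split_g0_g1(payload: bytes, g0_width: int = 3):
    """Separate G0 characters from single-byte G1 characters.

    Hypothetical helper. Assumes G1 is a single-byte set in A1..FE and
    G0 characters are g0_width bytes each, starting in 21..7E.
    """
    tokens = []
    i = 0
    while i < len(payload):
        b = payload[i]
        if 0xA1 <= b <= 0xFE:
            # High bit set: a one-byte character from the G1 set.
            tokens.append(("g1", payload[i:i + 1]))
            i += 1
        elif 0x21 <= b <= 0x7E:
            chunk = payload[i:i + g0_width]
            if len(chunk) < g0_width:
                raise ValueError(f"truncated G0 character at offset {i}")
            tokens.append(("g0", chunk))
            i += g0_width
        else:
            # Space, control characters, and anything else pass through.
            tokens.append(("other", payload[i:i + 1]))
            i += 1
    return tokens

# Two 3-byte CJK characters with a lone hex A8 between them.
sample = bytes.fromhex("215056 a8 4b376f")
parts = split_g0_g1(sample)
```

Here the lone A8 comes out as a one-byte G1 token between two 3-byte G0 tokens, so a decoder cannot simply assume that everything after the CJK escape sequence is a multiple of three bytes.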
Joe Altimus
Arizona State University Libraries
On Thu, Mar 22, 2012 at 6:00 AM, Godmar Back wrote:
>
> Hi,
>
> Ed Summers suggested I pose the following question to this email list.
>
> In a MARC record exported by a vendor system (III) a 880$a subfield, which
> contains a Japanese title in EACC, contains these bytes:
>
> 1b 24 31 21 50 56 4b 37 6f 69 24 4e 21 51 31 21 47 34 69 24 4e 21 30 70 21
> 51 2b 7b 36 39 32 34 66 36 7d 1b 28 42
>
> which III interprets as
>
> 米国の統治の仕組{6924f6}
>
> In other words, the string consists of 24-bit EACC characters, but the
> system then embeds the ASCII string {6924f6}, within { }, before the
> next ESC sequence.
>
> When processing this record with pymarc (a Python library for MARC
> processing), problems occur because 7b 36 39 is not a valid EACC character
> encoding.
>
> My question: is the above byte sequence legal, and if not, does it, or
> similar sequences, constitute a frequent deviation (so it would be
> worthwhile to recognize and work around it in pymarc)?
>
> - Godmar
>
>