To embed ASCII in the midst of an escaped character sequence, you need
to escape back to ASCII and then back again to the other character set.
From the documentation:
  "To redesignate ASCII, the following two-character escape sequence is used:
  ESCs (ASCII 1B(hex) 73(hex)) for ASCII default character set"
So this string should have a 1B 73 before the first ASCII character, and
then another 1B and the code for the character set that follows.
Basically, escape in/escape out for each change in character set. I
suspect this was translated from Unicode where the characters can
commingle freely.
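
To make the escape in/escape out idea concrete, here is a rough Python sketch (not pymarc code; the function names are made up for illustration). It splits a byte string into runs at each escape sequence and flags any EACC run whose length is not a multiple of three. It only knows the three escape sequences that appear in this thread, so treat it as a diagnostic aid under those assumptions, not a MARC-8 decoder.

```python
ESC = 0x1B

# Only the escape sequences mentioned in this thread; a real MARC-8
# decoder would have to recognize many more.
ESCAPES = {
    b"\x1b$1": "EACC",   # 1B 24 31, as at the start of the 880$a sample
    b"\x1b(B": "ASCII",  # 1B 28 42, as at the end of the 880$a sample
    b"\x1bs": "ASCII",   # 1B 73, the ESCs sequence quoted above
}


def split_runs(data, initial="ASCII"):
    """Split data into (charset, raw_bytes) runs at each known escape."""
    runs = []
    charset = initial
    buf = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            for seq, name in ESCAPES.items():
                if data.startswith(seq, i):
                    if buf:
                        runs.append((charset, bytes(buf)))
                        buf = bytearray()
                    charset = name
                    i += len(seq)
                    break
            else:
                raise ValueError(f"unrecognized escape at offset {i}")
        else:
            buf.append(data[i])
            i += 1
    if buf:
        runs.append((charset, bytes(buf)))
    return runs


def bad_eacc_runs(runs):
    """Return EACC runs that cannot be a whole number of 3-byte characters."""
    return [(cs, raw) for cs, raw in runs if cs == "EACC" and len(raw) % 3 != 0]


# The bytes from the record in question.
original = bytes.fromhex(
    "1b 24 31 21 50 56 4b 37 6f 69 24 4e 21 51 31 21 47 34 69 24 4e 21 30 70 "
    "21 51 2b 7b 36 39 32 34 66 36 7d 1b 28 42"
)

# The same bytes with 1B 73 inserted before the first ASCII character.
fixed = original[:27] + b"\x1bs" + original[27:]

if __name__ == "__main__":
    print(bad_eacc_runs(split_runs(original)))  # one EACC run is flagged
    print(split_runs(fixed))                    # an EACC run, then an ASCII run
```

With the original bytes, everything between the opening and closing escapes lands in a single EACC run, which is why the braces and digits get read as (invalid) three-byte characters. With the escape added, the ASCII text separates into its own run.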
kc
On 3/22/12 2:00 PM, Godmar Back wrote:
>
> Hi,
>
> Ed Summers suggested I pose the following question to this email list.
>
> In a MARC record exported by a vendor system (III) a 880$a subfield,
> which contains a Japanese title in EACC, contains these bytes:
>
> *1b 24 31 21 50 56 4b 37 6f 69 24 4e 21 51 31 21 47 34 69 24 4e 21 30 70
> 21 51 2b 7b 36 39 32 34 66 36 7d 1b 28 42*
>
> (colors used for easier reading), which III interprets as
>
> *米国の統治の仕組{6924f6}*
>
> In other words, they switch to EACC, which consists of 24-bit
> characters, but then embed the ASCII string *{6924f6}*, braces
> included, before the next ESC sequence.
>
> When processing this record with pymarc (a Python library for MARC
> processing), problems occur because 7b 36 39 is not a valid EACC
> character encoding.
>
> My question: is the above byte sequence legal, and if not, does it, or
> similar sequences, constitute a frequent deviation (so it would be
> worthwhile to recognize and work around it in pymarc?)
>
> - Godmar
>
>
--
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet