Riley, Charles writes:
> Also, too many programmers have to understand raw "marc", because too
> much code produces broken records, and there are too many
Indeed: I've written a general-purpose library for reading Z39.2 /
ISO-2709 encoded files and it is rife with hooks and special cases to
deal with the records we get from data providers. Supporting all the
possible variants is a nightmare (MARC-21, UniMARC, CMARC, CNMARC,
KORMARC, *MARC) and the inconsistencies and invalid crap I see drives me
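
For anyone who hasn't looked inside the format, here is a minimal sketch of the core of such a reader. It is not my library's code. It assumes the MARC-21 directory layout ("4500" entry map: 3-byte tag, 4-digit length, 5-digit start offset) and shows one example of the kind of hook I mean: skip a corrupt directory entry instead of rejecting the whole record. `parse_record` and `subfields` are illustrative names.

```python
# Sketch of an ISO 2709 / Z39.2 record reader, assuming the MARC-21
# directory entry map (leader positions 20-23 = "4500").
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"


def parse_record(raw: bytes) -> list[tuple[str, bytes]]:
    """Split one raw record into (tag, field data) pairs."""
    leader = raw[:24]
    base = int(leader[12:17])  # base address of data
    # The directory runs from byte 24 up to the field terminator before `base`.
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory) - len(directory) % 12, 12):
        entry = directory[i:i + 12]
        if not entry[3:12].isdigit():
            continue  # hook: skip a corrupt directory entry rather than fail
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = raw[base + start:base + start + length]
        fields.append((tag, data.rstrip(FIELD_TERMINATOR)))
    return fields


def subfields(data: bytes) -> list[tuple[str, bytes]]:
    """Split a variable data field into (code, value) pairs.

    Everything before the first 0x1F (the two indicators) is dropped.
    """
    parts = data.split(SUBFIELD_DELIMITER)[1:]
    return [(p[:1].decode("ascii"), p[1:]) for p in parts if p]
```

Every variant (UniMARC, CNMARC, KORMARC, ...) and every provider quirk ends up as another branch like that `continue`.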
Then again, RDF isn't free from that kind of thing. I've been working
with the latest publicly available GND authority file (several
gigabytes of Turtle encoded RDF) from the DNB and I ended up having to
globally filter out one particular predicate because the values commonly
contained invalid IRIs that made Jena's TDB import barf.
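
A sketch of that kind of pre-filter, not the exact one I used. It assumes the Turtle dump has first been converted to N-Triples (one triple per line, so a line filter is safe), and the predicate IRI passed in is a hypothetical placeholder. The invalid-IRI check uses the characters the N-Triples IRIREF grammar excludes (control characters, space, and <>"{}|^`\).

```python
import re
from typing import Iterable, Iterator

# Characters not allowed inside an N-Triples IRIREF.
INVALID_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def drop_predicate(lines: Iterable[str], predicate: str) -> Iterator[str]:
    """Yield only the N-Triples lines whose predicate is not `predicate`."""
    needle = f"<{predicate}>"
    for line in lines:
        # maxsplit=2 keeps a (possibly space-containing) object intact
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[1] == needle:
            continue
        yield line


def object_iri_is_invalid(line: str) -> bool:
    """True if the triple's object is an IRI containing forbidden characters."""
    parts = line.split(None, 2)
    if len(parts) < 3:
        return False
    obj = parts[2].rstrip().removesuffix(".").rstrip()
    if not (obj.startswith("<") and obj.endswith(">")):
        return False  # literal or blank node, not an IRI
    return bool(INVALID_IRI_CHARS.search(obj[1:-1]))
```

Dropping the whole predicate is the blunt option. Checking each object with something like `object_iri_is_invalid` keeps the good values, at the cost of a second pass over several gigabytes.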
> The character used for a field delimiter on one system, ǂ, is the
> alveolar click letter used in print in Khoesan languages, supported in
> ISO 6438 and therefore, by extension, in UNIMARC.
> Other systems use ‡ as the field delimiter.
Was this intentional? U+01C2 and U+2021 could easily be confused if the
font is lame enough.
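
Python's unicodedata makes the difference visible even when the glyphs look identical. This is a small illustration only, and `describe` is a made-up helper name. Note the general categories: U+01C2 is a letter (Lo), U+2021 is punctuation (Po). That is why using ǂ as a delimiter collides with real Khoesan text, while ‡ does not. (In the binary record itself the subfield delimiter is 0x1F either way; both glyphs are just display conventions.)

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, Unicode name and general category of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for ch in ("\u01c2", "\u2021"):
    print(describe(ch))
# U+01C2 LATIN LETTER ALVEOLAR CLICK (Lo)
# U+2021 DOUBLE DAGGER (Po)
```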