> The character '@' appears also in german WorldCat data as a filing indicator
> described in german catalog code RAK-WB § 822, so I assume it propagated to
> other german catalogs that deliver bibliographic data on international level.
> Example: al- @Matḥaf al-Miṣrī <al-Qāhira> in
RAK § 822 is located in section 9.something of the rules, which deals with
collation (filing rules). In this section the words to be omitted are
visualized by enclosing them in ¬...¬ (not signs). As I understand it,
this is purely illustrative: these signs must never be shown on the
display or the printed card.
BTW, § 822 in < http://d-nb.info/986402338/34 > refers to RAK-WB § 203,3,
which actually prescribes the insertion of an artificial blank into
"L'aurore" or "al-Matḥaf al-Miṣrī" if the articles fall under the non-filing
rules. [This is probably to give cataloguers a better opportunity to mark
the first filing character on the printed card by soft pencil as was(?)
RAK never acknowledged the possibility of storing catalogue information on
media other than paper. Therefore any markup (e.g. using "@" or "¬" to
signify something) apart from ISBD punctuation and the RAK variations thereof
is always completely out of scope for this cataloging code.
On the other hand, the German data exchange standard MAB2 defines
non-filing characters in its general section and declares that they enclose
the strings to be omitted, excluding trailing blanks. This yields ("¬...¬"
is still a visualization; the actual start and end characters have two
distinct codes): "¬L'¬ aurore" or "¬al-¬ Matḥaf al-Miṣrī". The artificial
blanks are demanded by the cataloging code, and the data exchange standard
does not step in to remedy this.
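As a minimal sketch of how such a marked string splits into a display form
and a filing form: the snippet below works on the "¬...¬" visualization used
in this mail, not on the real MAB2 codes, and the function names are made up
for illustration.

```python
import re

# Visualization only: this mail writes the MAB2 non-sort start and end
# characters as "¬...¬"; the real exchange format uses two distinct codes.
NON_SORT = re.compile(r"¬([^¬]*)¬")


def display_form(text: str) -> str:
    """Drop the markers but keep the enclosed words (hypothetical helper)."""
    return NON_SORT.sub(r"\1", text)


def filing_form(text: str) -> str:
    """Drop the markers together with the enclosed words (hypothetical helper)."""
    return NON_SORT.sub("", text).strip()


if __name__ == "__main__":
    for title in ("¬L'¬ aurore", "¬al-¬ Matḥaf al-Miṣrī"):
        print(repr(display_form(title)), repr(filing_form(title)))
```

Note that the display form still carries the artificial blank ("L' aurore"),
which is exactly the distortion the cataloging code introduces.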
The MAB non-sort characters are valid in virtually any field (whenever
somebody wants the data to sort differently from the way it was entered),
and in particular they occur in almost every field of the group 3XX
(titles, statement of responsibility and the like). The individual field
definitions only mention them if some extra meaning is superimposed: for
instance, in 331 (title proper) text in non-filing characters may be followed
by space plus text for filing in brackets as in "¬99¬ [ninety-nine] red
As for the "@": this is a PICA-ism. Cataloguers in the ILTIS system of the
DNB mark the first sort position in a field with this character. In theory
that character (and the "/" marking the last position relevant for sorting
in personal names, as in "Goethe, Johann Wolfgang /von") should never
leak out of their system. But cataloguers actively enter the character
wherever they find it appropriate, and export interfaces have to explicitly
transcode it into the syntax of the target format, so it was not uncommon
even for MAB2 data to contain @'s where non-sort characters were probably
intended.
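To illustrate what an export interface would have to do with these two
markers, here is a naive sketch. It assumes at most one "@" per field and one
"/" per name, which is my simplification and not a documented ILTIS rule; the
function names are hypothetical.

```python
def sort_start(field: str) -> str:
    """Return the text from the first sort position marked by "@".

    Assumption: at most one "@" per field; without a marker the whole
    field is used for sorting.
    """
    _, marker, rest = field.partition("@")
    return rest if marker else field


def name_sort_part(name: str) -> str:
    """Return the part of a personal name relevant for sorting.

    Assumption: "/" marks the end of that part, as in
    "Goethe, Johann Wolfgang /von".
    """
    head, marker, _ = name.partition("/")
    return head.rstrip() if marker else name


if __name__ == "__main__":
    print(sort_start("al- @Matḥaf al-Miṣrī <al-Qāhira>"))
    print(name_sort_part("Goethe, Johann Wolfgang /von"))
```

An exporter that skips this step passes the "@" through unchanged, which is
how it ends up in exchanged data.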
> Some 15 years ago, there had been a discussion about the most preferred
> mechanism for indicating non-filing zones in MARC21. This document discusses the
> introduction: http://www.loc.gov/marc/marbi/1998/98-16r.html The result was,
> two invisible control characters were assigned a new meaning. The invisibility
> of the control characters did not disturb card printing or screen display.
> Later, in MODS, a visible <nonSort> XML element was introduced. The reason was,
> XML did not allow invisible control characters.
The fact is that XML 1.0 (!) does not allow control characters in the range
0x00-0x1F except for TAB, CR, and LF (e.g. subfield characters are forbidden
and have to be dealt with by other means). The control characters for
non-filing are in the range 0x80-0x9F and are perfectly legal in XML (1.0)
documents.
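This can be checked against the Char production of the XML 1.0
recommendation. The sketch below simply transcribes that production; the
function name is mine, and 0x1F is used as the example because it is the
ISO 2709 subfield delimiter.

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 'Char' production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


if __name__ == "__main__":
    # The ISO 2709 subfield delimiter is not a legal XML 1.0 character ...
    print(is_xml10_char(0x1F))
    # ... but every code point in the range 0x80-0x9F is.
    print(all(is_xml10_char(cp) for cp in range(0x80, 0xA0)))
```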
MAB-XML transforms the control characters into an XML element <ns>. I think
this is completely natural, since XML is all about markup, and not using XML
means to mark up a contiguous character sequence which has to be marked up
anyway would be quite silly (TEI does so, but that is because it codes both
the logical structure and the visual nature of a document, and it has to
resort to a kind of "markers" for the latter in order not to conflict with
XML nesting rules for elements).
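A rough sketch of that transformation, using Python's standard
xml.etree.ElementTree. Only the element name <ns> comes from MAB-XML as
described above; the wrapper element name "field", the function name, and the
code points U+0098 / U+009C for the start and end characters are assumptions
made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed code points for the non-sort start and end characters.
NSB = "\u0098"
NSE = "\u009c"


def to_ns_markup(text: str, tag: str = "field") -> str:
    """Wrap the first non-sort zone of `text` in an <ns> element.

    `tag` is a hypothetical wrapper element; text without a complete
    start/end pair is passed through unchanged.
    """
    elem = ET.Element(tag)
    head, begin, rest = text.partition(NSB)
    inner, end, tail = rest.partition(NSE)
    if begin and end:
        elem.text = head or None
        ns = ET.SubElement(elem, "ns")
        ns.text = inner
        ns.tail = tail
    else:
        elem.text = text
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(to_ns_markup(NSB + "L'" + NSE + " aurore"))
```

Software that does not know <ns> and drops the tag still shows "L' aurore".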
RDF graphs are not very well suited to representing text with markup, although
our prevalent case of "texts with one optional *initial* substring with
one special meaning" should not be too hard to deal with. I think the
situation is analogous to unqualified Dublin Core. In any case it should be
clear that solutions which are the "proper XML way" of encoding things
are often quite the opposite of the "proper RDF way".
Non-sort characters are used to mark instances of the concept of non-filing
zones as introduced by the cataloging code in actual data. For MAB2 these
characters have always been part of the exchange standard's syntax, not
different from end-of-field or end-of-record markers. We also have cases
where a special syntax is specifically defined by the cataloging code and
the data format acts "transparently", i.e. it just transports it. The most
prominent example for this is the use of the comma (",") in the notation
of personal names: "Adams, Henry" (invert for everyday usage for most regions
outside the Alps) or "Mao, Zedong" (simply omit the "," to obtain everyday
usage). For a full-featured conversion from MAB2 or MARC21 data into MODS
or RDF, one has in these cases to peek into the data, extending the analysis
from the format syntax to the additional syntax layer specified by the
cataloging code that the field contents conform to.
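The comma case might be handled along these lines. This is only a sketch: the
function and the `invert` flag are hypothetical, and deciding which names to
invert requires knowledge that is not in the heading itself.

```python
def everyday_name(heading: str, invert: bool = True) -> str:
    """Turn a "Family, Given" heading into everyday usage.

    `invert` is a hypothetical flag: True gives "Given Family",
    False merely drops the comma.
    """
    family, comma, given = heading.partition(",")
    if not comma:
        return heading  # no comma: nothing to reinterpret
    family, given = family.strip(), given.strip()
    return f"{given} {family}" if invert else f"{family} {given}"


if __name__ == "__main__":
    print(everyday_name("Adams, Henry"))
    print(everyday_name("Mao, Zedong", invert=False))
```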
People who have not yet realized that non-sort characters were incorporated
into the MARC standard in late 2000(?) might think of non-sort characters as
a peculiarity of the cataloging rules governing the record in question,
which should therefore always be "transparently" dumped/piped into whatever
format MARC21 is exported to. But I think they are wrong: MARC has adopted
the concept and incorporated the syntax, and therefore one has to deal with it.
> What can we learn? In the punchcard age, it was enough to define two invisible
> control characters to assign a new notion of "non-sorting" to interpret ISO2709
> streams in a new way. In the XML world, invisible data was no longer allowed,
> and non-filing control became visible in markup elements.
read: became explicit by markup elements, thus obviating special control
characters. Since XML tags (not elements!) that one does not know or does not
care about are usually silently ignored and only textual data is displayed,
this was very elegant and a kind of progress over control characters, where
one could never be certain that some software wouldn't visualize them anyway
as control symbols, funny block characters or whatever.
> Now today, in Linked Data, a semantic context should be provided, so non-filing
> can be applied (or ignored) by programs successfully under any circumstances, if
> it's visible or not.
Linked Data focuses on data, to some extent neglecting the convenience that
lies in the ambivalence of textual data:
:he foaf:name "Dan Brickley" .
is completely independent of
:he foaf:familyName "Brickley" ;
foaf:givenName "Dan" .
The former does not decompose into the latter, and the latter cannot be
synthesized into the former (no inherent order of statements, no universal rule
for inserting blanks when concatenating strings). Therefore you will always have
to supply both forms in the cases where sorting or deeper analysis /and/ some
kind of display is needed. With any kind of title we are in a comparable
situation. (Worse yet: our "Brickley, Dan" with inversion and comma does not fit
either of these aspects, nor does our "L' aurore" with the extra space, since the
cataloging rules mandate "normalization" upon data entry and therefore even our
"transcribed" elements are often quite distorted ...)
In library land we have a long tradition of entering all kinds of important
information twice: in normalized form for headings, and transcribed from
the source or as a note for display. "XML" gave us some promise of overcoming
that by augmenting transcriptions with clever markup; "RDF" defeats that
again. The lesson to learn therefore is not to rely on RDF (a format) as
a device for data entry: in our domain the concept of a personal name and how
to code it is common enough to expect tools which react properly to a ","
typed in. And titles as transcribed elements superimposed with instructions
for sorting are also common enough to expect tools which spare us the
labor of duplicate entry (hey: we have a computer and a mouse and a database
and should use them, whereas RDF is just plain data).
> I think the non-filing indicator characters are just one case that demonstrates
> the importance to annotate the meaning of symbols in bibliographic strings that
> serve special semantics (there are more symbols, for example "Ordnungshilfen" =
> filing hints, or cataloger's comments in brackets '[' and ']'). In Linked Data
> enviroments like Bibframe, there must be also some information about the context
> of the interpretation of the bibliographic string, that is, what are the special
> symbols in the string, and what catalog rules should be referred to for special
> symbol interpretation.
I strongly object. Linked Data is all about providing data /void/ of any
buried private (arcane, domain-specific) meaning introduced by characters
(visible or not) which do not stand for themselves.
One could define library-specific RDF datatypes for those literals we know
as "titles" or "personal names", and these could even have XML embedded. But
this is a mechanism that operates on syntax (restricting arbitrary strings to
those with a comma or <ns> tags at certain positions) and has nothing to do
with the semantic subdivision of these names or titles (there are no constructs
allowing statements like "in this kind of string the character '@' has the
following meaning: ..."), and therefore it is no viable solution.