BIBFRAME@LISTSERV.LOC.GOV


BIBFRAME  February 2013

Subject:

Re: Unicode collation for Bibframe (Re: Filing indicators)

From:

Thomas Berger

Date:

Sun, 17 Feb 2013 03:25:24 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joerg,

> The character '@' appears also in german WorldCat data as a filing indicator
> described in german catalog code RAK-WB § 822, so I assume it propagated to
> other german catalogs that deliver bibliographic data on international level.
> Example:  al- @Matḥaf al-Miṣrī <al-Qāhira> in
> http://www.worldcat.org/oclc/179936500

RAK § 822 is located in section 9.something of the rules, which deals with
collation (filing rules). In this section the words to be omitted are
visualized by enclosing them in ¬...¬ (not signs). From my understanding
this is purely illustrative: these signs must never be shown on the
display or printed card.

BTW § 822 in < http://d-nb.info/986402338/34 > connects to RAK-WB § 203,3
which actually prescribes the insertion of an artificial blank into
"L'aurore" or "al-Matḥaf al-Miṣrī" if the articles fall under non-filing
rules. [This is probably to give cataloguers a better opportunity to mark
the first filing character on the printed card by soft pencil, as was(?)
common practice.]

RAK never acknowledged the possibility of storing catalogue information on
media other than paper. Therefore any markup (e.g. using "@" or "¬" to
signify something) apart from ISBD punctuation and RAK variations thereof
is always completely out of scope for this cataloging code.

On the other hand, the German data exchange standard MAB2 defines non-filing
characters in its general section and declares that they enclose the
strings to be omitted, excluding trailing blanks. This yields ("¬...¬" is
still a visualization; the start and end characters actually have two
distinct codes): "¬L'¬ aurore" or "¬al-¬ Matḥaf al-Miṣrī". The artificial
blanks are demanded by the cataloging code, and the data exchange standard
does not step in to remedy this.

The MAB non-sort characters are quite universally valid for any field
(whenever somebody has the desire to sort differently from the data
entered), and in particular they occur in almost every field of the group
3XX (titles, statements of responsibility and the like). The individual
field definitions only mention them if some extra meaning is superimposed:
for instance, in 331 (title proper) text in non-filing characters may be
followed by a space plus the text for filing in brackets, as in
"¬99¬ [ninety-nine] red balloons".
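
A minimal sketch of how a program might derive a display form and a sort key
from such a field. It assumes a single leading non-sort zone and uses the
visualization character "¬" for both markers (the real MAB2 start and end
codes differ, so they are parameters here). The function name is
hypothetical, and the bracketed filing text of field 331 is deliberately not
handled.

```python
def split_nonsort(text: str, start: str = "¬", end: str = "¬") -> tuple[str, str]:
    """Return (display, sort_key) for a field with at most one leading non-sort zone.

    The markers default to the visualization character; pass the actual
    start/end control characters of the format being read.
    """
    if text.startswith(start):
        close = text.find(end, len(start))
        if close != -1:
            omitted = text[len(start):close]   # e.g. "L'" or "al-"
            rest = text[close + len(end):]     # e.g. " aurore"
            display = omitted + rest           # markers removed, text kept
            sort_key = rest.lstrip()           # drop the artificial blank for filing
            return display, sort_key
    # No (complete) non-sort zone: display and sort key are identical.
    return text, text


print(split_nonsort("¬L'¬ aurore"))
print(split_nonsort("¬al-¬ Matḥaf al-Miṣrī"))
```

Note that the display form still carries the artificial blank ("L' aurore"),
since that blank is part of the data as entered.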

As for the "@", this is a PICA-ism: cataloguers in the ILTIS system of the
DNB mark the first sort position in a field with this character. In theory
that character (and the "/" marking the last position in personal names
relevant for sorting, as in "Goethe, Johann Wolfgang /von") should never
leak out of their system. But cataloguers actively enter the character
whenever they find it appropriate, and export interfaces have to explicitly
transcode it into the syntax of the target format, so even MAB2 data not
uncommonly contained @'s where non-sort characters were probably intended.


> Some 15 years ago, there had been a discussion about the most preferred
> mechanism for indicating non-filing zones in MARC21. This document discusses the
> introduction: http://www.loc.gov/marc/marbi/1998/98-16r.html  The result was,
> two invisible control characters were assigned a new meaning. The invisibility
> of the control characters did not disturb card printing or screen display.
> Later, in MODS, a visible <nonSort> XML element was introduced. The reason was,
> XML did not allow invisible control characters.

The fact is, XML 1.0 (!) does not allow control characters in the range
0x00-0x1F except for TAB, CR, and LF (e.g. subfield characters are forbidden
and have to be dealt with by other means). The control characters for
non-filing are in the range 0x80-0x9F and perfectly legal in XML (1.0)
documents.
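
The rule can be checked against the Char production of the XML 1.0
specification. The sketch below encodes that production as a small
predicate. The function name is hypothetical, and the code points U+0098 and
U+009C for the non-sort characters are an assumption taken from the MARC 21
Unicode mapping, not from this message.

```python
def is_xml10_char(code_point: int) -> bool:
    """True if the code point matches the Char production of XML 1.0:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        code_point in (0x09, 0x0A, 0x0D)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# 0x1F: ISO 2709 subfield delimiter (forbidden in XML 1.0)
# 0x98 / 0x9C: assumed non-sort begin / end characters (allowed)
for cp in (0x1F, 0x98, 0x9C):
    print(f"U+{cp:04X}", is_xml10_char(cp))
```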

MAB-XML transforms the control characters into an XML element <ns>. I think
this is completely natural since XML is all about markup, and not using XML
means to mark up a contiguous character sequence which has to be marked up
anyway would be quite silly (TEI does so, but only because it codes both the
logical structure and the visual nature of a document and has to resort to
a kind of "markers" for the latter in order not to conflict with XML
nesting rules for elements).
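
As an illustration of that transformation, here is a minimal sketch using
Python's standard xml.etree.ElementTree. Only the element name <ns> comes
from the text above. The wrapping element "title", the function name and the
default marker code points (U+0098, U+009C) are assumptions made for the
example and are not taken from the MAB-XML schema.

```python
import xml.etree.ElementTree as ET


def to_ns_markup(text: str, start: str = "\u0098", end: str = "\u009c",
                 tag: str = "title") -> str:
    """Serialize a field, turning one leading non-sort zone into an <ns> element."""
    elem = ET.Element(tag)  # hypothetical wrapper element
    if text.startswith(start):
        close = text.find(end, len(start))
        if close != -1:
            ns = ET.SubElement(elem, "ns")
            ns.text = text[len(start):close]   # the omitted string
            ns.tail = text[close + len(end):]  # everything after the zone
            return ET.tostring(elem, encoding="unicode")
    elem.text = text  # no non-sort zone: plain character data
    return ET.tostring(elem, encoding="unicode")


print(to_ns_markup("\u0098L'\u009c aurore"))
```

A consumer that does not know <ns> can still display the concatenated text
content, while a sorting routine can skip the element.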

RDF graphs are not very well suited to reflect text with markup, although
our prevalent case of "texts with one optional *initial* substring with
one special meaning" should not be too hard to deal with. I think the
situation is analogous to unqualified Dublin Core. In any case, it should be
clear that solutions which are the "proper XML way" of encoding things
are often quite the opposite of "proper RDF ways".

Non-sort characters are used to mark instances of the concept of non-filing
zones as introduced by the cataloging code in actual data. For MAB2 these
characters have always been part of the exchange standard's syntax, not
different from end-of-field or end-of-record markers. We also have cases
where a special syntax is specifically defined by the cataloging code and
the data format acts "transparently", i.e. it just transports it. The most
prominent example for this is the use of the comma (",") in the notation
of personal names: "Adams, Henry" (invert for everyday usage for most regions
outside the Alps) or "Mao, Zedong" (simply omit the "," to obtain everyday
usage). For a full-featured conversion from MAB2 or MARC21 data into MODS
or RDF in these cases one has to peek into the data, extending the analysis
from the format syntax onto the additional syntax layer specified by the
cataloging code the field contents are conforming to.
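
A minimal sketch of what "peeking into the data" means for the comma
convention. The function name and the family_name_first flag are
hypothetical. The flag has to come from outside the string, since the
heading itself does not say which of the two rules applies.

```python
def everyday_name(heading: str, family_name_first: bool = False) -> str:
    """Turn a catalog heading 'Family, Given' into its everyday form.

    family_name_first=False inverts the parts ("Adams, Henry" -> "Henry Adams");
    family_name_first=True only drops the comma ("Mao, Zedong" -> "Mao Zedong").
    """
    family, sep, given = heading.partition(",")
    if not sep:
        # No comma: nothing to interpret, return the heading unchanged.
        return heading.strip()
    family, given = family.strip(), given.strip()
    if family_name_first:
        return f"{family} {given}"
    return f"{given} {family}"


print(everyday_name("Adams, Henry"))
print(everyday_name("Mao, Zedong", family_name_first=True))
```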

People who have not yet realized that non-sort characters were incorporated
into the MARC standard in late 2000(?) might think of non-sort characters as
a peculiarity of the cataloging rules governing the record in question that
should therefore always be "transparently" dumped/piped into whatever format
MARC21 is exported to. But I think they are wrong: MARC has adopted the
concept and incorporated the syntax, and therefore one has to deal with it.


> What can we learn? In the punchcard age, it was enough to define two invisible
> control characters to assign a new notion of "non-sorting" to interpret ISO2709 
> streams in a new way. In the XML world, invisible data was no longer allowed,
> and non-filing control became visible in markup elements.

read: became explicit by markup elements, thus obviating special control
characters. Since XML tags (not elements!) that one does not know or does
not care about are usually silently ignored and only textual data is
displayed, this was very elegant and a kind of progress over control
characters, where one could never be certain that some kind of software
wouldn't visualize them anyway as control symbols, funny block characters
or whatever.


> Now today, in Linked Data, a semantic context should be provided, so non-filing
> can be applied (or ignored) by programs successfully under any circumstances, if
> it's visible or not.

Linked Data focuses on data, to some extent neglecting the convenience lying
in the ambivalence of textual data:
:he foaf:name "Dan Brickley" .
is completely independent of
:he foaf:familyName "Brickley" ;
    foaf:givenName "Dan" .
The former does not decompose into the latter, and the latter cannot be
synthesized into the former (no inherent order of statements, no universal
rule with respect to inserting blanks when concatenating strings). Therefore
you will always have to supply both forms in the cases where sorting or
deeper analysis /and/ some kind of display is needed. With any kind of title
we are in a comparable situation.
(Worse yet: Our "Brickley, Dan" with inversion and comma does not fit in either
of these aspects nor does our "L' aurore" with the extra space since the
cataloging rules mandate "normalization" upon data entry and therefore even our
"transcribed" elements often are quite distorted ...)

In library land we have a long tradition of entering all kinds of important
information twice: in normalized form as for headings, and transcribed from
the source or as a note for display. "XML" gave us some promise to overcome
that by augmenting transcriptions with clever markup; "RDF" defeats that
again. The lesson to learn therefore is not to rely on RDF (a format) as
a device for data entry. In our domain the concept of a personal name and
how to code it is common enough to expect tools which properly react to
a "," typed in. And titles as transcribed elements superimposed with
instructions for sorting are also common enough to expect tools which spare
us the labor of duplicate entry (hey: we have a computer and a mouse and
a database and should use that, whereas RDF is just plain data).
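
A minimal sketch of such a tool: the title is typed in once, with the
non-sort zone marked, and both RDF statements are generated from it. The
prefix and property names ex:title and ex:sortTitle are hypothetical, as are
the function names. The "¬" markers stand in for whatever the entry
interface actually uses.

```python
def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def title_statements(subject: str, entry: str,
                     start: str = "¬", end: str = "¬") -> str:
    """Generate display and sort statements from a single marked-up entry."""
    display, sort_key = entry, entry
    if entry.startswith(start):
        close = entry.find(end, len(start))
        if close != -1:
            rest = entry[close + len(end):]
            display = entry[len(start):close] + rest  # markers removed
            sort_key = rest.lstrip()                  # non-sort zone dropped
    # ex:title and ex:sortTitle are placeholder property names.
    return (
        f"{subject} ex:title {turtle_literal(display)} ;\n"
        f"    ex:sortTitle {turtle_literal(sort_key)} ."
    )


print(title_statements(":book", "¬L'¬ aurore"))
```

The duplication then lives only in the generated data, not in the
cataloguer's keystrokes.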


> I think the non-filing indicator characters are just one case that demonstrates
> the importance to annotate the meaning of symbols in bibliographic strings that
> serve special semantics (there are more symbols, for example "Ordnungshilfen" =
> filing hints, or cataloger's comments in brackets '[' and ']'). In Linked Data
> enviroments like Bibframe, there must be also some information about the context
> of the interpretation of the bibliographic string, that is, what are the special
> symbols in the string, and what catalog rules should be referred to for special
> symbol interpretation.

I strongly object. Linked Data is all about providing data /void/ of any
buried private (arcane, domain-specific) meaning introduced by characters
(visible or not) which do not stand for themselves.

One could define library-specific RDF datatypes for those literals we know
as "titles" or "personal names", and these could even have XML embedded. But
this is a mechanism valid for syntax (restricting arbitrary strings to those
with a comma or <ns> tags at certain positions) and has nothing to do with
a semantic subdivision of these names or titles (there are no constructs
allowing statements like "in this kind of string the character '@' has the
following meaning: ..."), and therefore it is no viable solution.

Best regards
Thomas Berger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (Cygwin)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iJwEAQECAAYFAlEgP5QACgkQYhMlmJ6W47PCmQQAqBBHgHjNlsHrchZXdI4Gqf5o
ulZNokG8HnnkngUtFD9kYT6b5qj7reM3R7VhPHSjXDAt/WUtrAKCHh91u+xT/5D6
bihf+ThDcIIFBZPfQF5We3mbbhIg7KLsru+5L5A4DUBc3/z2Onk0Wqs4G0VdvkLW
NRJAT1EN3qiflULOWyg=
=IpNG
-----END PGP SIGNATURE-----
