This is why S.l. and S.n. should have been left alone in RDA. And relator codes should be preferred. Let the machines deal with them, not fallible humans.



Michael Mitchell

Technical Services Librarian

Brazosport College

Lake Jackson, TX

Michael.mitchell at




From: Bibliographic Framework Transition Initiative Forum [mailto:[log in to unmask]] On Behalf Of Diane Hillmann
Sent: Wednesday, May 29, 2013 12:00 PM
To: [log in to unmask]
Subject: Re: [BIBFRAME] Consistency



I found the same kinds of things when aggregating NSDL data about a decade ago, though of course on a smaller scale! (Defaults with various misspellings of 'unknown' were my particular trigger). I think that what would help us avoid having to cope with crappy text into our dotage is to build tools that help us serve up standardized text when we think we still need it, while not actually creating or storing it as text. We know humans will continue to make these kinds of errors if we ask them to enter text during the cataloging process, but if users need to see these kinds of notes, we need to build smarter tools to make it happen.  Continuing to rant about the imperfect humans around us doesn't help at all.
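
A minimal Python sketch of that idea, assuming a hypothetical coded value is stored in the record and the standardized wording is generated only at display time. The names `PlaceStatus`, `DISPLAY_TEXT`, and `render_place` are illustrative and not part of RDA, MARC, or BIBFRAME. Only the display string is taken from the wording quoted later in this thread.

```python
from enum import Enum
from typing import Union


class PlaceStatus(Enum):
    """Hypothetical coded values stored in place of free text."""
    NOT_IDENTIFIED = "not_identified"


# The display wording lives in one place, so it cannot be mistyped per record.
DISPLAY_TEXT = {
    PlaceStatus.NOT_IDENTIFIED: "[Place of publication not identified]",
}


def render_place(value: Union[str, PlaceStatus]) -> str:
    """Return the text shown to users for a stored place value."""
    if isinstance(value, PlaceStatus):
        return DISPLAY_TEXT[value]
    # A transcribed place name is shown exactly as entered.
    return value


print(render_place(PlaceStatus.NOT_IDENTIFIED))
print(render_place("Lake Jackson, TX"))
```

Under this assumption the cataloger picks a code instead of typing a phrase, and changing the wording later means editing one table entry, not millions of records.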




On Wed, May 29, 2013 at 10:49 AM, Tennant,Roy <[log in to unmask]> wrote:

On 5/28/13 10:48 PM, "Bernhard Eversberg" <[log in to unmask]> wrote:

>Consistency is not hugely important for purely descriptive data...

>Consistency is of utmost importance for access-related data.

Agreed. But we nonetheless seem to have focused too much on consistency of
descriptive data (for example, "ill." in collation statements) and yet not
enough in access-related data (for example, we are unable to consistently
determine when a URL will take the user to the full item).

And as the table that my colleague Ralph LeVan provided earlier
demonstrates, our data is horribly inconsistent in the aggregate.

Here is but a beginning list of the problems we face in trying to be
consistent:

1) Rules that are inexact or difficult to understand.
2) An unclear understanding, or an imperfect use (whether deliberate or
inadvertent), of those rules.
3) Typographical errors.
4) Data acceptance systems (either single record or batch) that fail to
validate appropriate elements.
5) Violation of rules for local purposes (for example, putting data in a
different element so it will display in a particular system; or adding
HTML markup to elements for local display purposes).
6) etc.

I'd like to assert that these problems are in our past, but I clearly
cannot. Let's take the 264 field for example[1]. Recently created, these
fields are now pouring into WorldCat (in Jan. we found 56,706 such fields
and in April we found 158,019 -- nearly three times as many). Meanwhile,
the rules seem fairly specific about what one should do if the place of
publication is not apparent[2]: put "[Place of publication not identified]
:" in the $a. Not any of these:

[Place of publication not identified :
[place of publication not identified] :
Place of publication not identified :
[Place of publication unknown] :
[Place of publication not given] :
Unknown place of publication :
[place of publication not indicated] :
[Place of publication not known] :
Unknow place of publication :
No place of publication :
Place of publication unknown :

All of which (and more) already occur[3], and more still as they continue
to pour in.
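
As a rough illustration of the kind of validation item 4 above says is missing, here is a small Python sketch that flags 264 $a values that look like an attempt at the placeholder but do not match the prescribed form. The function name `check_264a` and the matching heuristic are assumptions made for this example, not an existing tool or rule.

```python
import re

# The prescribed form quoted above, including the trailing colon.
PRESCRIBED = "[Place of publication not identified] :"

# Heuristic (an assumption): any value that mentions "place of publication"
# is treated as an attempt at the placeholder, not a real place name.
PLACEHOLDER_HINT = re.compile(r"place of publication", re.IGNORECASE)


def check_264a(value: str) -> str:
    """Classify a 264 $a value as 'ok', 'variant', or 'place'."""
    text = value.strip()
    if text == PRESCRIBED:
        return "ok"
    if PLACEHOLDER_HINT.search(text):
        return "variant"
    return "place"


for sample in [
    "[Place of publication not identified] :",
    "[place of publication not identified] :",
    "Unknow place of publication :",
    "New York :",
]:
    print(f"{check_264a(sample):8} {sample}")
```

A check like this could run when a single record is saved or during batch loading. It only catches variants that still contain the phrase "place of publication", so other spellings would need a broader rule.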

So I guess my point is this: we all need to own this problem and work
against the forces of inconsistency outlined above and others that may
occur to you. These will include a wide variety of techniques that must
encompass the entire library metadata ecosystem -- from the individual
cataloger to the massive aggregators like my employer.

Roy Tennant
OCLC Research

P.S. And please don't get me started on that colon. One rant per day is
quite enough.

[3] and for additional
amusement, see all the ways "New York" has already been entered here: