I think there is a kernel of truth worth examining in James' statements. 

The traditional catalog entry is not metadata or even data but rather a metadocument -- a document (the catalog record) about a document (the resource). The great conceptual stumbling block for many of us (and I count myself in that number), as humans accustomed to interpreting documents and metadocuments, is to visualize a data architecture that disaggregates the components of our traditional metadocuments into discrete data units that machines can not only transfer and store, but also interpret, manipulate, and reconstitute back into metadocuments. Further, the resulting metadocuments need not be identical to what was input; they can be profiled to render output in different languages (via linked vocabulary registries), with less information (say, for a mobile display), or with more information (by drawing on data from external sources).
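
To make that concrete for myself, here is a minimal sketch in Python of what I have in mind. Everything in it is invented for illustration: the resource identifier, the property names, the label table standing in for a linked vocabulary registry, and the two display profiles. It is not how any actual system or standard does this, only one way the disaggregate-then-reconstitute idea might look.

```python
# Hypothetical sketch: all identifiers, properties, labels, and values are invented.

# A catalog record disaggregated into discrete (resource, property, value) statements.
STATEMENTS = [
    ("res1", "title", "An Example Title"),
    ("res1", "creator", "Doe, Jane"),
    ("res1", "date", "2015"),
]

# Stand-in for a linked vocabulary registry: display labels per language.
LABELS = {
    "en": {"title": "Title", "creator": "Author", "date": "Date", "subject": "Subject"},
    "fr": {"title": "Titre", "creator": "Auteur", "date": "Date", "subject": "Sujet"},
}

# Output profiles: which properties appear, and in what order.
PROFILES = {
    "full": ["title", "creator", "date", "subject"],
    "mobile": ["title", "creator"],
}


def render(resource, statements, profile="full", lang="en", external=()):
    """Reconstitute a display record (a metadocument) from discrete statements.

    `external` holds extra statements drawn from some outside source.
    """
    pool = list(statements) + list(external)
    lines = []
    for prop in PROFILES[profile]:
        for subject, p, value in pool:
            if subject == resource and p == prop:
                lines.append(f"{LABELS[lang][prop]}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Less information, in another language (say, for a mobile display).
    print(render("res1", STATEMENTS, profile="mobile", lang="fr"))
    # More information, by drawing on a statement from an external source.
    extra = [("res1", "subject", "Cataloging")]
    print(render("res1", STATEMENTS, profile="full", lang="en", external=extra))
```

The same three statements yield a short French display and a fuller English one, and neither output need match the record as originally keyed.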

I would go so far as to say that the Anglo-American community is particularly hampered in this transition, since English syntax relies on position to determine the function of the words we read (i.e., the data we take in) -- "Dog bites man" is different from "Man bites dog" -- whereas inflected languages rely on declension and conjugation as markup to determine the corresponding word functions. Consequently, we have a stronger tendency to need to "read" our data in documentary form, as presented in the metadocuments that make up our catalog records, rather than drawing on it solely as data.

John Myers, Catalog & Metadata Librarian
Schaffer Library, Union College
Schenectady NY 12308

518-388-6623

On Sat, Mar 7, 2015 at 2:20 PM, James Weinheimer wrote:

[snip]
A catalog does *not* contain data in the normal IT sense of the word. That is probably another strange idea, but it is nevertheless a fact. It contains information (data) that will *help* you find the information you want. In other words, it contains *directional* information to what you want, but it does not contain the information itself. Let's see how this works in reality.

[snip]
I think this shows how the "data" in the library catalog is fundamentally different from the "data" in other kinds of databases. And it also illustrates how the normal tools used for "data mining" and "data extraction" that work fairly well in other venues are more or less doomed to failure when applied to library catalogs. They contain a different kind of data.