I think that John’s analysis of the difficulty of holding in mind and reasoning about document content – and statements about documents thus distributed – is correct. It is precisely why in our “Graph Theory A Ivory Towers” paper and in a number of online slideshows** Barbara Tillett and I focused on developing an implementation-independent schematic representation of Cultural Heritage resources and the things we say about them.

Carefully shaped graphical thinking tools (which involve a bit more than constructing mere visualizations) are all over the place in the sciences, with documented success in cases where workers must achieve and reason with a high-level view of a complex, potentially dynamic, process.

**  E.g., slides 81-113 of

I think there is a kernel of truth worth examining in James' statements. 

The traditional catalog entry is not metadata or even data but rather is a metadocument -- a document (the catalog record) about a document (the resource). The great conceptual stumbling block for many of us (and I count myself in that number), as humans accustomed to interpreting documents and metadocuments, is to visualize a data architecture that disaggregates our traditional metadocument components into discrete data units. Machines can then not just transfer and store those units, but interpret them, manipulate them, and reconstitute them back into metadocuments. Further, these resulting metadocuments need not be the same as what was input, but can be profiled to render output in different languages (via linked vocabulary registries), or with less information (say for a mobile display), or more information (by drawing on data from external sources).
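To make the disaggregation idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the identifier "work:1", the property names, the tiny LABELS table standing in for a linked vocabulary registry, and the PROFILES table standing in for display profiles. It is not any real catalog format or library API, just one way the same discrete statements could be reassembled into different metadocuments.

```python
from collections import defaultdict

# Hypothetical data: a catalog record broken into discrete
# (subject, property, value) statements instead of one block of text.
STATEMENTS = [
    ("work:1", "title", "Moby Dick"),
    ("work:1", "creator", "Melville, Herman"),
    ("work:1", "date", "1851"),
    ("work:1", "subject", "Whaling"),
]

# Hypothetical statements drawn from an external source.
EXTERNAL = [
    ("work:1", "subject", "Sea stories"),
]

# Stand-in for a linked vocabulary registry: display labels per language.
LABELS = {
    "en": {"title": "Title", "creator": "Author", "date": "Date", "subject": "Subject"},
    "fr": {"title": "Titre", "creator": "Auteur", "date": "Date", "subject": "Sujet"},
}

# Output profiles: which properties to show, and in what order.
PROFILES = {
    "full": ["title", "creator", "date", "subject"],
    "mobile": ["title", "creator"],
}


def render(subject, statements, profile="full", lang="en"):
    """Reconstitute a display record (a metadocument) from discrete statements."""
    values = defaultdict(list)
    for s, prop, value in statements:
        if s == subject:
            values[prop].append(value)
    lines = []
    for prop in PROFILES[profile]:
        for value in values.get(prop, []):
            lines.append(f"{LABELS[lang][prop]}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("work:1", STATEMENTS))                     # full English display
    print(render("work:1", STATEMENTS, "mobile", "fr"))     # shorter French display
    print(render("work:1", STATEMENTS + EXTERNAL, "full"))  # enriched display
```

The point of the sketch is only that the stored units carry no fixed layout: the "record" a person reads is produced at display time, so the same statements can yield a shorter, a translated, or an enriched metadocument.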

I would go so far as to say that the Anglo-American community is particularly hampered in this transition, since our linguistic syntax relies on position to determine the function of the words we read (i.e. the data we take in) -- "Dog bites man" is different from "Man bites dog" -- whereas inflected languages rely on declension and conjugation as markup to determine the corresponding word functions. Consequently, we have a stronger tendency to need to "read" our data in documentary form, as presented in the metadocuments that make up our catalog records, rather than drawing on it solely as data.

A catalog does *not* contain data in the normal IT sense of the word. That is probably another strange idea, but it is nevertheless a fact. It contains information (data) that will *help* you find the information you want. In other words, it contains *directional* information that points you to what you want, but it does not contain the information itself. Let's see how this works in reality.

I think this shows how the "data" in the library catalog is fundamentally different from the "data" in other kinds of databases. And it also illustrates how the normal tools used for "data mining" and "data extraction" that work fairly well in other venues are more or less doomed to failure when applied to library catalogs. They contain a different kind of data.