Hal is right. Over the last decade the WorldCat database has changed
greatly in nature due to the batch loading of millions of records from
The 2009 OCLC annual report has a chart on page 12 that shows that
in 1998 WorldCat had 39 million records; in 2009 it had 139 million --
most of which came from batch loading. The 2012 report has a figure of
237 million records. That's about a 6x growth in less than 15 years.
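
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. The three record counts are the ones quoted above from the annual
reports; the function names and the compound-rate calculation are my own
illustration, not something the reports provide.

```python
# Record counts in millions, as quoted above from the OCLC annual reports.
records_millions = {1998: 39, 2009: 139, 2012: 237}


def growth_factor(counts, start, end):
    """Return how many times larger the count is in `end` than in `start`."""
    return counts[end] / counts[start]


def annual_growth_rate(counts, start, end):
    """Return the implied compound annual growth rate as a fraction."""
    years = end - start
    return growth_factor(counts, start, end) ** (1 / years) - 1


overall = growth_factor(records_millions, 1998, 2012)
rate = annual_growth_rate(records_millions, 1998, 2012)

print(f"1998-2012: {overall:.1f}x over {2012 - 1998} years")
print(f"Implied compound growth: {rate:.1%} per year")
```

The ratio 237/39 is a little over 6, which is where the "about 6x" comes
from. The compound rate assumes smooth growth, which batch loading
certainly was not, so treat it as a rough summary only.
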
OCLC has 22.5K member libraries (which I read as being libraries that do
their cataloging on OCLC), but over 74K "participating libraries."
Another annual report gives the actual figure of member records v.
non-member records, and member records are in the minority. [citation
needed] The 2012 report gives good stats on numbers of records batch
loaded, and it's quite impressive - hundreds of millions.
As we saw with the list of subject heading terms that Roy produced
(which I don't have a link to, sorry), many of the terms ("geschichte"
was a notable one) come from data that is from outside of the
That said, the OCLC WorldCat database is a good measure of the
bibliographic universe beyond AACR and MARC, although the
"MARC-ification" of the data may mask some of the qualities of the
I highly recommend looking at the annual reports for good data about
OCLC's growth and contents. There are stats on record numbers by
 annual reports are listed on this page:
On 3/8/13 9:20 PM, Hal Cain wrote:
> On Fri, 8 Mar 2013 16:12:48 -0500, Simon Spero wrote:
>> Field 245 shows some other curiosities:
>> Subfield 245 $k has 469,891 occurrences, but only 427,311 holdings; this
>> suggests that there are records included in the counts which have zero
>> holdings. These might be worth filtering out.
> Something else that probably has little impact on the totals for the whole
> database, but which should be taken into account if investigation is
> segmented by publication date: my experience suggests that there is a
> sizeable number of duplicate records for pre-AACR cataloguing (that is, more
> or less, pre-1970 publications) -- and I would say the period till 1980 and
> the onset of AACR2 also includes more duplicates than later. I ascribe this
> to far less uniformity in cataloguing practice before AACR, and many such
> records having been converted retrospectively with little review, and loaded
> in bulk. In addition, many foreign records, and British Library files, and
> the like, totally non-AACR/AACR2 and without any subject access, have been
> loaded in recent years. The WorldCat database is quite a mixture; beware of
> relying too heavily on any simple tabulations.
> Hal Cain
> Melbourne, Australia
http://kcoyle.net