After hitting Send, I realized I wanted to say a little more.
The data that our records are intended to record is bibliographic data. When we share our data with other services through linked data, the purpose is to share bibliographic relationships and attributes. Isaac Asimov wrote The Stars Like Dust. In 1980, Ballantine Books published an edition of The Hobbit with 310 pages. When we publish our records through linked data, the added value that we contribute is these bibliographic relationships and attributes. When you link the entity Isaac Asimov in our databases with the entity Isaac Asimov in a database of scientists, you combine the information in two specialized resources. Each side provides value: the specialized knowledge of different experts. Our database does not have to hold all of that information itself. Bibliographic identification and description is still the main purpose of RDA, and should still be the focus of our cataloging activities, including NACO authority work.
Chris Baer asks:
But limiting RDA fields to traditional cataloging goals renders such records relatively useless for other purposes. They are really still just about “resources” or resource production, no matter how much you might think that they are about “entities.” As biography or history, they are at best half-baked, and probably superficially researched. I have tried to give examples of how superficial research can result in confusing and conflating entities and getting relationships wrong. If so, what are such authority records actually good for? Traditional resource retrieval, probably, but not as a data set for other disciplines, as some people seemed to think they might be.
Apologies for clipping only this section, but I think this is the heart of your argument.
You are correct that our implementation of RDA does not produce records with comprehensive information about the entity. It was never intended to. RDA is indeed intended to allow our bibliographic data to be used, through linked data, as a data set for other disciplines, but it is not intended to be a _comprehensive_ data set. It is intended to be _supplementary_ to other data sets, providing data relevant for bibliographic identification and linking their data and identities to ours. RDA data does not have to be complete in order to meet this purpose. RDA data by itself is not intended for the kinds of statistical searches you suggest. If you wanted a statistical search on, say, authors who are astronomers, you would link our data with another database of people that records occupations, or perhaps specifically a database of astronomers. The purpose is not to provide all data on an entity, but to allow our data to be combined with data from other sources.
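To make the astronomer example concrete, here is a minimal Python sketch of the kind of combination described above, assuming that both datasets share a common identifier for each person. Everything in it is hypothetical: the identifiers, the second author and her work, the external dataset, and the function name `authors_with_occupation`. Real linked data would use URIs and RDF rather than Python dictionaries; this only illustrates the division of labor.

```python
# Hypothetical local authority data: shared identifier -> bibliographic facts.
# The identifiers are invented placeholders, not real LCCNs or URIs.
library_records = {
    "id:person-1": {"name": "Asimov, Isaac", "works": ["The Stars Like Dust"]},
    "id:person-2": {"name": "Doe, Jane", "works": ["Hypothetical Star Atlas"]},
}

# Hypothetical external dataset maintained by another community of experts,
# keyed by the same identifiers. It holds data our records do not need to carry.
external_people = {
    "id:person-1": {"occupations": ["writer"]},
    "id:person-2": {"occupations": ["astronomer"]},
    "id:person-3": {"occupations": ["astronomer"]},  # no match in our data
}


def authors_with_occupation(occupation):
    """Join the two sources on the shared identifier and return (name, works)
    for each author whose external record lists the given occupation."""
    results = []
    for ident, record in library_records.items():
        extra = external_people.get(ident)  # None if the other side lacks this person
        if extra and occupation in extra["occupations"]:
            results.append((record["name"], record["works"]))
    return results


if __name__ == "__main__":
    print(authors_with_occupation("astronomer"))
```

The occupation data lives only in the external source; the library side contributes names and works, and the shared identifier does the joining. Neither side has to be complete for the combination to be useful.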
It is perfectly adequate to limit the data in our records to that which is easily found and can help with identification. With more information, it may be easier to identify matching entities in other databases. But we do not need to be completist and go out of our way to seek information for our records. We do not need to have 3xx fields in every record to accomplish our goal.
But doesn’t this “levels of best practice” scheme compromise the whole system? For a database, e.g., the Social Security Death Index, to work properly, don’t all the data elements have to be entered at the same level of detail? This is certainly true of quantitative databases. When I constructed a statistical database of 19th-century plank roads in New York State, the geographic unit (county), the quantity (miles per county), and the time frame (every five years) had to be entirely consistent, or the data could not be statistically compared against other county-level time series. If identifying all academic economists, automobile industry executives, or Archivists of the United States is left to individual institutional whim, of what use are the results?
And doesn’t this then call into question exactly what RDA is supposed to do, and whether it is in fact doable at all, given staff, time, and budgetary constraints? If people can only afford to enter dates, doesn’t it make as much sense, if not more, to add the dates to the 1xx, where the patron can actually see them, and where they also appear in the BIB record for the patron to see? How is the 046 expected to operate in a future search mechanism? I do a lot of date-limited searching in Ancestry.com, but if you don’t have at least one known date to anchor the search, you are basically taking potluck and getting lots of irrelevant results. If most institutions cannot afford the time to complete the 3xx fields (and I have pointed out before that each extra data element is also a potential source of errors of one kind or another), then just how useful will haphazard and incomplete 3xx data be?
The “Moderate” level, as practiced at the British Library, seems to stick to the standard library cataloger’s objective: just enough information to distinguish one book from another and to retrieve it from a storage unit. “Useful” and “expedient to record” are by nature relative. For one institution, that might mean recording data on faculty; for us, it means complex corporate and elite family trees, where we have most of the parts, while others would be lucky to have one. The “use” of differentiating and retrieving library books is quite different from that of providing an interpretive framework that lets a researcher understand the context of unpublished documents, which are usually not self-explanatory in themselves. That can only be done through research, not transcription.
Example: Last week we received a set of pocket diaries. The grandchildren to whom they were handed down seemed to have lost much of the information on the creator. It required research to establish that he was the great-grandnephew of Simon Bolivar the Liberator and the son of Peru’s Consul-General in the U.S., who married an American. That information goes a long way toward explaining why he stayed in New York and became the Latin American sales representative of the International Bank Note Company, and why his sales trips to Latin America are the main content of the diaries. The abrupt end of the diaries at the close of 1904 is explained by his early death 13 months later. Simply put, this is not cataloging. For one thing, the emphasis is on real-world entities and not their bibliographic shadows. For us, the amount of research meets the “useful” and “expedient to record” test, because otherwise the diaries become mere curiosities, but to catalogers it probably seems excessive.
But limiting RDA fields to traditional cataloging goals renders such records relatively useless for other purposes. They are really still just about “resources” or resource production, no matter how much you might think that they are about “entities.” As biography or history, they are at best half-baked, and probably superficially researched. I have tried to give examples of how superficial research can result in confusing and conflating entities and getting relationships wrong. If so, what are such authority records actually good for? Traditional resource retrieval, probably, but not as a data set for other disciplines, as some people seemed to think they might be. Another case of overreach? Furthermore, most institutions seem to lack the time and money to reach even the “Moderate” level. We certainly don’t have the time or staff to create even pre-RDA authorities for all of our tens or hundreds of thousands of non-NAF names, much less basic RDA ones, which take at least three or four times as long. We haven’t had time to create MARC records to replace our old printed guides (which, furthermore, hold more information than can be crammed into a MARC record), much less whatever is coming next. But it hasn’t hobbled our operation, which is that of a professional service.
By the way, I think the “sky’s the limit” examples represent what some people were hoping to have: a fuller description of the entity and not simply a bibliographic reference. That is the way one would have to go to describe an entity accurately. I bet a lot more of these were created during early implementation than now. no2011152077 doesn’t look all that excessive to me, and dealing with people who play musical chairs among the top jobs in government, academia, the corporate world, private think tanks, and elite law firms is more difficult than dealing with a person who gets tenure and churns out publications in a single discipline or sub-discipline for 40 years.
At the same time, for practical reasons, it is all too much. Is a retreat from grandiosity and overwrought expectations underway?
Manuscripts & Archives
Hagley Museum and Library
Reading the definition of “best practice” in Wikipedia, I see that different best practices are applicable to different institutions. A wide variety of best practices for the optional RDA fields have already been developed, both formally and informally. I would like to summarize three levels.
1. Minimalist. The fields are optional, with the possible exception of the 046. (The date is a core element for personal names, and the current NACO training materials state that the 046 is required, but since RDA is not written in terms of tags, one could argue that the date in the 1XX and/or 670 would meet the core requirement.) A strict minimalist approach would forbid use of the fields, to save time in both cataloging and training, and be done with it. At Duke, we are a little removed from this. We say that the fields will not be covered in training, but catalogers who are independent in NACO may use them. This is a fuzzy line, because the fields are actually emphasized in the latest version of the training materials. But I, as NACO coordinator and trainer, use only the 046 and do not train in any of the others. As Mary Charles Lasater pointed out, the new fields can significantly increase training time for NACO, which was already substantial.
2. Moderate. This is well described in the British Library Guide to Name Authority Records:
BL practice: A balance must be struck between fullness and efficiency when deciding what to record at the element level. Record only those elements that are readily ascertainable, useful, and expedient to record:
Readily ascertainable: only do the amount of research needed to identify the entity, and to create and justify unique authorised and variant access points. Only include in 046/3XX fields appropriate data that has been discovered in the course of this research. Do not do extra research in order to complete additional 046/3XX fields.
Useful: be selective in recording data in 046/3XX fields. For example, only record significant dates in 3XX |s and |t. In 373, only record institutions with which a person has a significant connection. In 372 and 374, only record significant fields of activity and occupations. Any of these fields may be omitted if useful data is not readily ascertainable.
Expedient to record: only search the LCSH file briefly, for suitable terms. If a specific term is not available, use a broader term. If no term is readily ascertainable in a quick search, omit the field. Make full use of Aleph short keys and drop down menus to insert elements into the authority record.
3. The sky’s the limit. I have a couple of examples of ARs with optional fields that go beyond the purposes of authority control as outlined by Richard Moore when he started this thread: n 2010043877 and no2011152077.
Discussion thus far indicates that a library’s choice of level must be more an attempt to forecast the future than a reflection of current needs. Of course, the resources available also come into play. This is the main reason that Duke has taken the minimalist approach.
Mary Charles Lasater mentions:
"A few ‘best practices’ might be useful. "
With which I wholeheartedly agree.
Locally, I have established a few "best practices." For example, we put extra effort into authority records for certain categories, such as entities affiliated with the University and special collections. Hence, I would include our institution in a 373 field for faculty authors. We have a special collection of women composers, so I include a 374 field for them, in the hope that someday (soon?) this field will enrich searching.
But, of course, more widely agreed upon best practices would make a lot more sense. We are all about cooperation, aren't we?