Ed and everyone, this is a response to the first question.
It’s interesting that near the end of the FRBR-LRM document, the authors admit that there are difficulties modelling the serial work in FRBR. As they say on page 67, “the ‘commonality of content’ that defines a serial resides in both the publisher’s and the editor’s intention to convey the feeling to end-users that all individual issues do belong to an identifiable whole, and in the collection of editorial concepts (a title, an overall topic, a recognizable layout, a regular frequency, etc.) that will help convey that feeling.” They also go on to say the same can be said of monographs (the 6th edition of Darwin’s On the Origin of Species does not contain quite the same concepts as the first edition).
What this suggests to me is that maybe we need to realize all Works are aspirational in a way. They come not from what some person had in mind in the past, but from what the person intended. That has a necessary reference to the future. As the intention is fleshed out in Expressions and Manifestations, the content often changes.
To me, this suggests that the primary cataloging object has to remain the Manifestation. Though I realize that if we are not treating it as a Manifestation of a Work, it’s not really a Manifestation but an edition. I think I agree with Ed’s suggestion that Work and Expression records only need to be created when they are helpful.
I remember reading some years ago some documentation by VTLS that talked about how FRBR is particularly helpful for bound withs, something I encounter quite a bit as a special collections cataloger. I believe they said that if you have Work records, you only need to link them to your bibliographic record for the bound volume, and that will take care of the author and subject information. You don’t have to, particularly, agonize about how to describe the subject content of the whole volume, especially when the subjects of the bound works vary quite a bit. That certainly does seem to show the value of Work records. But once again, it seems like it’s mostly a matter of how to make cataloging work more economical, not really an insight into the real nature of information resources.
What do you think?
UAB Lister Hill Library
Thanks a lot for taking the time to answer my questions!
The following response is to all interested forum readers as well.
I was not questioning the methodology used in the research, given the enormity of OCLC’s database and how difficult it must have been to set parameters to cover all bibs. I was wondering, and I still am, however, if some data sectors, in spite of their inherent limitations (records coded as minimum level, less-than-full, record-unexamined, etc.), may have been inadvertently thrown in, undifferentiated from the top-level cataloged resources (coded DLC, pcc, etc.), as compelling evidence.
“Should a Work description be generated for every cataloged resource?” Well, in my mind, this question should trigger another, more specifically defined question: does “every cataloged resource” mean “every fully cataloged resource that meets the current LC/PCC standard”? If so, a huge number of cataloged resources in OCLC should be re-examined individually by catalogers (no, not by software programs) and perhaps must be re-cataloged. Failing that, one can only wonder at, if not laugh at, the notion that current or future resource description models would magically take care of such subpar products, cost savings and all, and, through data migration from MARC to [name a model here], integrate them into the world of linked data. If data are bad (or non-existent), how good could linked data be?
To illustrate, let me offer a set of stats of my own. A quick search on William Faulkner in our local database yielded 338 name/title entries. Out of these, only 117 are under some sort of authority control (clusters of undifferentiated editions, translations, compilations, etc.). The remaining 216, an alarming 63%, all need authority work and/or need to be re-cataloged properly to today’s RDA/LC/PCC standard. Here are a few examples.
Name/titles for works or expressions:
Go down, Moses, and other stories (OCoLC26958382)
Stallion road : a screen play (OCoLC2032183)
Big woods (OCoLC283897)
Le hameau (The hamlet) (OCoLC4282473)
Schall und Wahn : Roman (OCoLC7300092)
Name/titles as subjects:
Faulkner, William, ‡d 1897-1962. ‡t Fable
Faulkner, William, ‡d 1897-1962. ‡t Hamlet
Faulkner, William, ‡d 1897-1962. ‡t Intruder in the dust
Faulkner, William, ‡d 1897-1962. ‡t Knight's gambit.
Faulkner, William, ‡d 1897-1962. ‡t Sanctuary
I am also curious to know if back then (70s, 80s, 90s) it was an option to establish a name/title authority record when used as a subject in 600. And when establishing it, one had to justify it in 670 like this (n 79128370):
100 1 Faulkner, William, ‡d 1897-1962. ‡t Sound and fury
670 Kałuża, I. The functioning of sentence structure in the stream-of-consciousness technique of William Faulkner's "The sound and the fury," 1979 (subj.)
All in all, I believe that every cataloged resource must have a work description, just as behind every single author there must be a distinct name authority record.
Your comments are most welcome!
Great question. Thanks!
The statistic about singletons in WorldCat's Work clusters is definitely startling. But the clustering algorithm is tuned to be conservative. In many cases, singletons represent records that can't be assigned with confidence to a bigger cluster just yet. Perhaps they can be in a future iteration, though.
To answer your question in more detail, I talked with my OCLC colleague Jenny Toves. She is the principal architect of OCLC's FRBR clustering algorithms.
According to Jenny, two distinct kinds of records are counted among the singletons.
Those records that describe genuinely unique items:
These records could, in theory, be described as FRBR Items in OCLC's Linked Data markup accessible from WorldCat.org using the modeling assumptions outlined in our contribution to the PCC Work draft white paper.
Those records containing noisy, sparse, degraded, or highly variable data:
These records may or may not describe unique items, but do not generate matches with other records. The matching algorithm operates on raw MARC records and looks at authors, titles, and physical formats.
Authors that are authority-controlled can be matched with higher confidence than string-only data. You're right to point out that aggregates are problematic. The author-title match has some fuzziness to accommodate reasonable variability, but it would err if it clustered Death in Venice with Death in Venice and other stories. It's continually being updated to ensure that its output approximates human judgment.
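To make the conservative behavior concrete, here is a minimal sketch in Python. It is not OCLC's algorithm: it assumes records are plain dictionaries with hypothetical `id`, `author`, and `title` fields, it ignores physical format, and it stands in for the fuzzy match with a simple normalized exact match (case, punctuation, and spacing are ignored, nothing more). Under those assumptions, variant transcriptions of Death in Venice cluster together, while Death in Venice and other stories stays a singleton.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def cluster_records(records):
    """Group record ids by a normalized (author, title) key.

    This is a deliberately conservative stand-in for a fuzzy match:
    titles must be identical after normalization, so an aggregate with
    a longer title never joins the cluster of the single work.
    """
    clusters = defaultdict(list)
    for rec in records:
        key = (normalize(rec["author"]), normalize(rec["title"]))
        clusters[key].append(rec["id"])
    return dict(clusters)


def singletons(clusters):
    """Return the ids of records that ended up alone in their cluster."""
    return [ids[0] for ids in clusters.values() if len(ids) == 1]


# Hypothetical sample records; the ids are invented for illustration.
RECORDS = [
    {"id": "r1", "author": "Mann, Thomas", "title": "Death in Venice"},
    {"id": "r2", "author": "Mann, Thomas", "title": "Death in Venice."},
    {"id": "r3", "author": "Mann, Thomas", "title": "Death in Venice and other stories"},
    {"id": "r4", "author": "MANN, THOMAS", "title": "death in  Venice"},
]

if __name__ == "__main__":
    result = cluster_records(RECORDS)
    for key, ids in result.items():
        print(key, ids)
    print("singletons:", singletons(result))
```

A real matcher would presumably relax the exact-title test in controlled ways. The point of the sketch is only that a strict key trades missed matches (more singletons) for fewer false merges.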
As for RDA, we don't see any impact yet. The WorldCat catalog contains MARC records with a mixture of RDA and other encodings.
I hope this helps.
Senior Research Scientist
OCLC Membership and Research
“With some 50% of resources and 77% of work clusters in WorldCat identified as singletons ….”
It would be interesting to see how “singletons” are defined in the OCLC report.
1) Do they refer to bibliographic entities based on (non-conflicting) manifested work titles (i.e., 100/110/111+ title portion of 245 or, the title portion of a single 245)?
2) Are compilations included in this category (i.e., a trilogy, an exhibition catalog (with artist(s) in 700s), an aggregate of several novels either by a single author or by several authors, etc.)?
One would assume that most of these 50-70% resources were cataloged under AACR2. Would it be more helpful and instructive, for the sake of comparison and analysis, to compile a similar set of stats drawn from OCLC based on cataloging data after RDA was implemented, especially in the last two years?
The PCC SCS/LDAC Task Group on the Work Entity has been charged with producing a white paper to give a high-level outline of the issues surrounding the identification of work entities (PCC Vision, Mission, and Strategic Directions, 2015-2017, action 3.3). To that end, we are soliciting PCC community feedback on a number of questions relating to the Work entity. These questions, along with some background, will be presented one at a time over the next several weeks. Our goal is to trigger a lively and thoughtful discussion that will help us in our deliberations. There are no right or wrong answers to these questions. We are most interested in hearing well-thought-out arguments on either side.
The first question is this:
Should a Work description be generated for every cataloged resource?
As a top-down hierarchical model, FRBR requires a Work entity for every cataloged resource, functioning as the main clustering mechanism for relating Expressions and Manifestations, even when there is only a single Manifestation of a single Expression of a single Work (historically, the most common condition). However, in a more fluid and egalitarian Linked Data graph model, a Work description--analogous to a Work authority record--is not needed for every resource, even though elements of the FRBR Work entity, such as relationships with Creators and Subjects, will be part of the resource description. By its very nature, a graph model is intended to be flexible and therefore less accommodating to the hierarchical structure of FRBR than, e.g., XML. The linked data graph model allows relationships to be made among resources without a superimposed structure and enables related resource clusters to emerge more organically from the data.
Experiments by OCLC with legacy bibliographic data make a compelling data-based argument for generating FRBR Work entity graphs only when they are warranted. With some 50% of resources and 77% of work clusters in WorldCat identified as singletons, and in a community that is always seeking workflow economies, generating a work graph for every resource seems neither economical nor scalable in an RDF environment. From that perspective, it may be advisable to institute a best practice to create work graphs only when the resource in hand is not a singleton. However, whether this would be a decision made by the cataloger or by an automated background process would need to be determined. In addition, the actual stored data might be different from the catalog view. What is needed today is to explore the cataloging workflow and the user services that the catalog should provide as a way to provide a set of goals for the technical development.
In the FRBR conceptual model, attributes that are predictably shared among all the Manifestations of a Work, such as Creators and Subjects, belong to the Work description rather than to the several Manifestation descriptions. A best practice that does not require creation of a Work for singletons would involve a change to current cataloging practice. If there were no requirement to create a Work description for every resource, then the properties common to multiple Manifestations that are currently included in a Work description would have to be modeled for inclusion in the Manifestation description as well.
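As a rough illustration of what such a best practice could look like as an automated background process, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary layout, the `instance_of` link name, the identifiers, and the placeholder subject strings are hypothetical and do not reflect any actual OCLC, PCC, or BIBFRAME modeling. A Work description is created only when a cluster has more than one Manifestation; for a singleton, the Creator and Subjects are carried on the Manifestation description itself.

```python
def describe(clusters, shared_attributes):
    """Build Work and Manifestation descriptions from clustered resources.

    clusters: work key -> list of manifestation ids
    shared_attributes: work key -> dict of attributes common to the cluster
    Returns a (works, manifestations) pair of dictionaries.
    """
    works = {}
    manifestations = {}
    for key, members in clusters.items():
        attrs = shared_attributes[key]
        if len(members) > 1:
            # Non-singleton: shared attributes live once, on the Work.
            works[key] = dict(attrs)
            for member in members:
                manifestations[member] = {"instance_of": key}
        else:
            # Singleton: no Work is created, so the Manifestation
            # carries the Creator and Subjects directly.
            manifestations[members[0]] = dict(attrs)
    return works, manifestations


# Hypothetical data; ids and subject strings are placeholders.
CLUSTERS = {
    "sound-and-fury": ["m1", "m2"],
    "big-woods": ["m3"],
}
SHARED = {
    "sound-and-fury": {
        "creator": "Faulkner, William, 1897-1962",
        "subjects": ["placeholder subject A"],
    },
    "big-woods": {
        "creator": "Faulkner, William, 1897-1962",
        "subjects": ["placeholder subject B"],
    },
}

if __name__ == "__main__":
    works, manifestations = describe(CLUSTERS, SHARED)
    print("Works:", works)
    print("Manifestations:", manifestations)
```

One consequence shows up right away in the sketch: when a second Manifestation of a former singleton is cataloged, the shared attributes have to be moved from the Manifestation up to a newly created Work, which is part of the workflow question raised above.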
We look forward to a lively discussion.
Chair, PCC SCS/LDAC Task Group on the Work Entity