Thanks for the many interesting discussions and points made in the recent exchanges. Reading them, however, made me realize that things may not be as clear as we like to think as we work on the vocabulary and transformation, in a very exposed way, on bibframe.org. So the following is a short description of the model we are basically using to transform MARC data. It does not go into the details or nuances of the transformations, but I hope it conveys the general picture. We will post this description on the Bibframe web site and, since the model changes as we learn more and the conversation progresses, we will update it from time to time. Again, it is intended to give the general view, not the detailed one.
Overview of current MARC to BIBFRAME model used by LC
Bibframe.org is used as a site where the Library of Congress can post material related to the BIBFRAME initiative. There are two key items there: a draft vocabulary for BIBFRAME, and a transformation service with downloadable transformation software. The two parts (vocabulary and transformations) are separate, although we have been trying to bring them into sync and hope to keep them that way if possible. The purpose of the VOCAB is to help develop a BIBFRAME ontology. The transformations are built to take MARC data and recraft it into BIBFRAME data. This exercise is useful because MARC gives us a large fund of data with which to experiment, and because ultimately we will need to "somehow" move that vast number of MARC records into the new environment. It is recognized, however, that going forward data will be created directly in a BIBFRAME-related pattern, so this conversion will not be central to the new environment.
Using MARC data as a source for experimentation brings its own joys and problems, however, since the data were created under different rules and different local conventions. As a result, the transformation is complex in some cases.
The current vocabulary is strongly RDA-flavored, as element names were largely aligned with RDA rather than MARC. Some names, however, are more general than RDA or reflect other conventions. The vocabulary will and must evolve, especially as other rule models are integrated, but RDA proved an excellent place to start.
BIBFRAME Model in VOCAB
The data embodied in MARC records is broad, rich, and detailed. The main kinds of MARC data that the transformation treats are bibliographic, authority, and holdings. The following describes how the data is currently being reshaped into the BIBFRAME model, which is organized around BIBFRAME Works, Instances, Authorities, and Annotations. Note that, for convenience, the word "record" is used here for the cluster of information, or graph, concerning one concept.
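To make the four-way split concrete, here is a minimal sketch of the "record" clusters and the links between them. The class and property names (instance_of, annotates, subjects) are hypothetical illustrations chosen for readability, not terms from the draft vocabulary; the point is only which cluster points at which.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified shapes for the four BIBFRAME "record" clusters.
# Property names are illustrative only, not the draft vocabulary's terms.

@dataclass
class Authority:
    id: str
    label: str                       # e.g. a name or subject heading string

@dataclass
class Work:
    id: str
    title: str                       # uniform or "derived uniform" title
    subjects: list = field(default_factory=list)   # Authority ids

@dataclass
class Instance:
    id: str
    title: str                       # "on piece" title
    instance_of: str                 # id of the Work this Instance realizes

@dataclass
class Annotation:
    id: str
    annotates: str                   # id of the Instance being annotated
    body: dict = field(default_factory=dict)

subject = Authority("a1", "Whales")
work = Work("w1", "Moby Dick", subjects=[subject.id])
instance = Instance("i1", "Moby-Dick, or, The whale", instance_of=work.id)
holding = Annotation("n1", annotates=instance.id, body={"location": "Main stacks"})
```

Links run from the more specific cluster to the more general one: an Annotation points at an Instance, an Instance at its Work, and a Work at its subject Authorities.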
The MARC Authority records for names and subjects are treated as BIBFRAME Authorities. LC had previously converted its MARC Authority records for names and subjects to RDF and made them available in various markups (MADS, MARC, SKOS, etc.) and syntaxes (RDF, XML, etc.) through the LC Linked Data Service. The current transformation references these records and has not established a separate BIBFRAME vocabulary for them. The exact nature of a BIBFRAME Authority is a topic now under discussion, so the above is a simplification of the concept.
The traditional MARC Authority records for titles and name/titles have been converted into BIBFRAME Work records, as they are conceptually data about works. They therefore form the nucleus of the BIBFRAME Work dataset. Since title and name/title MARC Authority records were traditionally made for expressions as well as works, the nuclear Work dataset contains records for both RDA Works and RDA Expressions. When the relationship between a work and an expression can be determined programmatically, properties indicating the RDA Work to RDA Expression relationship have been added to these BIBFRAME Work records.
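One way such a relationship might be made programmatically is sketched below, under stated assumptions: an authority heading is a list of (subfield code, value) pairs; $l (language), $o (arranged statement for music), and $s (version) are taken to mark an expression-level heading; and an expression is linked only when stripping those subfields leaves a heading that matches exactly one work-level record. This is an illustrative heuristic, not LC's actual matching logic.

```python
# Assumption: these subfields mark an expression-level title heading:
# $l language, $o arranged statement for music, $s version.
EXPRESSION_CODES = {"l", "o", "s"}

def is_expression(subfields):
    return any(code in EXPRESSION_CODES for code, _ in subfields)

def work_key(subfields):
    """Heading string with the expression-level subfields removed."""
    return " ".join(v.strip(" .") for c, v in subfields if c not in EXPRESSION_CODES)

def link_expressions(authorities):
    """authorities: dict of record id -> list of (code, value) pairs.
    Returns (expression_id, work_id) pairs where the match is unambiguous."""
    works = {}
    for rid, subs in authorities.items():
        if not is_expression(subs):
            works.setdefault(work_key(subs), []).append(rid)
    links = []
    for rid, subs in authorities.items():
        if is_expression(subs):
            candidates = works.get(work_key(subs), [])
            if len(candidates) == 1:      # link only when exactly one work matches
                links.append((rid, candidates[0]))
    return links

auths = {
    "n1": [("a", "Beowulf.")],
    "n2": [("a", "Beowulf."), ("l", "English.")],
    "n3": [("a", "Gilgamesh."), ("l", "German.")],   # no work-level record to link to
}
print(link_expressions(auths))   # [('n2', 'n1')]
```

Records like n3, whose work-level heading is absent, simply stay unlinked, which matches the text's caveat that links are added only when they can be made programmatically.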
The nuclear set of BIBFRAME Work records is then augmented with the subject information found in the MARC Bibliographic records that carry those work titles in their 130/240 uniform title fields. In addition, class numbers (but not call numbers) are carried over as additional subject information, and the content type is derived from MARC Leader positions and the 336 fields. Since MARC has several places where content type may be recorded, a BIBFRAME Work record may carry several content type indications. If the heading string of a new BIBFRAME Work record contains certain elements (e.g., language or "arr." subfields), the programs identify the record as an RDA Expression record, so several additional elements native to RDA Expressions are also included in the BIBFRAME Work record, in addition to the links indicated above.
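The sketch below shows why one Work can end up with several content types, and how an expression-level heading might be detected. It assumes a record is a Leader string plus a list of (tag, [(code, value), ...]) data fields. The Leader/06-to-content-type table is a small illustrative subset, not LC's mapping, and treating $l or $o in a 130/240 as the expression signal is an assumption modeled on the "language or arr." example above.

```python
# Illustrative subset of Leader/06 (type of record) codes mapped to RDA
# content-type terms. This is an assumption, NOT the mapping LC uses.
LEADER06_CONTENT = {
    "a": "text",
    "t": "text",
    "c": "notated music",
    "e": "cartographic image",
    "j": "performed music",
    "k": "still image",
}

def content_types(leader, fields):
    """Collect content types from Leader/06 and every 336 $a, keeping each
    distinct value once, in first-seen order; several may survive."""
    found = []
    from_leader = LEADER06_CONTENT.get(leader[6])
    if from_leader:
        found.append(from_leader)
    for tag, subfields in fields:
        if tag == "336":
            for code, value in subfields:
                if code == "a" and value not in found:
                    found.append(value)
    return found

def looks_like_expression(fields):
    """Treat a 130/240 carrying $l (language) or $o (arranged statement)
    as an expression-level heading (hypothetical heuristic)."""
    return any(tag in ("130", "240") and code in ("l", "o")
               for tag, subfields in fields for code, _ in subfields)

leader = "00000njm a2200000 i 4500"           # Leader/06 = "j" (musical sound recording)
fields = [
    ("240", [("a", "Symphonies,"), ("n", "no. 9,"), ("o", "arr.")]),
    ("336", [("a", "performed music"), ("b", "prm"), ("2", "rdacontent")]),
    ("336", [("a", "text"), ("b", "txt"), ("2", "rdacontent")]),
]
print(content_types(leader, fields))          # ['performed music', 'text']
print(looks_like_expression(fields))          # True
```

Duplicates between the Leader and the 336 are collapsed here, but genuinely different indications (a recording with a printed text, say) are all kept.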
So, basically, these BIBFRAME Work records should contain the key elements of RDA Works and RDA Expressions. They will also contain title variations and title-related attributes brought over from the MARC Authority record for the title or name/title.
But the BIBFRAME Work records created from the MARC Authority dataset cover only a small percentage of the works that a bibliographic dataset typically contains. It has been estimated that title and name/title authority records are made for only 5-20% of the MARC Bibliographic records in a library collection. The rest of the MARC Bibliographic records are generally for items published only once, and in traditional cataloging no title or name/title authority records were made for them.
Thus the next step is to create BIBFRAME Work records for the remaining 80-95% of the MARC Bibliographic records. The 245 title of each such item is taken as the BIBFRAME Work "derived uniform" title, the subjects and class numbers are included in the BIBFRAME Work record, and other work-related information is gathered to complete the BIBFRAME Work record for the item. These items have no expression records; if they did, a title or name/title authority would have been created for them and they would already have been processed.
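A minimal sketch of this step follows, assuming a bibliographic record is a list of (tag, [(code, value), ...]) data fields and that the output Work is a plain dict with hypothetical key names. The derived title is built from 245 $a and $b, subjects from 6XX headings, and only the classification portion of the 050 ($a, not the $b item number) is kept, in line with "class numbers (but not call numbers)".

```python
def derived_work(record_id, fields):
    """Build a Work dict for a bib record that has no 130/240.
    Key names and the exact title/subject rules are hypothetical."""
    if {tag for tag, _ in fields} & {"130", "240"}:
        return None                          # handled via the uniform-title path
    work = {"id": f"work-{record_id}", "derivedUniformTitle": None,
            "subjects": [], "classification": []}
    for tag, subfields in fields:
        if tag == "245":
            # Title proper ($a) plus remainder of title ($b), minus ISBD punctuation.
            parts = [v for c, v in subfields if c in ("a", "b")]
            work["derivedUniformTitle"] = " ".join(parts).rstrip(" /:;.")
        elif tag.startswith("6"):
            # Subject heading: alphabetic subfields joined as a string.
            work["subjects"].append(
                " -- ".join(v.rstrip(".") for c, v in subfields if c.isalpha()))
        elif tag == "050":
            # Classification number only; the item number ($b) is left out.
            work["classification"] += [v for c, v in subfields if c == "a"]
    return work

fields = [
    ("050", [("a", "QL737.C4"), ("b", "S65 2010")]),
    ("245", [("a", "Whales :"), ("b", "a natural history /"), ("c", "by A. Author.")]),
    ("650", [("a", "Whales."), ("x", "Behavior.")]),
]
print(derived_work("b1", fields))
```

The early return for records with a 130/240 reflects the split described above: those titles were already handled through the authority-based Work records.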
Most of the relationships recorded in MARC Bibliographic records in the 700-730 fields, the 760-788 fields, and a few notes are work-to-work relationships, and they are included in the BIBFRAME Work records.
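As a rough illustration of which fields feed those relationships, the sketch below selects 700-730 fields that carry a title ($t) and any 760-788 linking entry. It assumes numeric-tag data fields in the (tag, [(code, value), ...]) form used above; notes are ignored, and the output dict shape is hypothetical.

```python
def related_work_fields(fields):
    """Pick out fields assumed to describe another work:
    700-730 with a $t, and any 760-788 linking entry."""
    related = []
    for tag, subfields in fields:
        n = int(tag)
        codes = {c for c, _ in subfields}
        if (700 <= n <= 730 and "t" in codes) or 760 <= n <= 788:
            title = next((v for c, v in subfields if c == "t"), None)
            related.append({"tag": tag, "title": title})
    return related

fields = [
    ("700", [("a", "Smith, Jane.")]),                        # plain name: not a work
    ("700", [("a", "Doe, John."), ("t", "Collected poems.")]),
    ("776", [("i", "Online version:"), ("t", "Whales")]),
]
print(related_work_fields(fields))
```

A 700 without $t names a person only, so it is left to the Authority side rather than producing a related work.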
At this point the BIBFRAME Work dataset contains a complete set of RDA Work records for the whole MARC Bibliographic dataset plus RDA Expression records for those items for which an expression record is needed.
The BIBFRAME Instance dataset is generated from the MARC Bibliographic records. Every MARC Bibliographic record yields one or more BIBFRAME Instance records. The program uses several clues to generate more than one Instance record from a single MARC Bibliographic record in certain situations, e.g., some cases where multiple ISBNs are present in the record, or where there is an indication that a microform and a print version, or an electronic and a print version, are represented on the same bibliographic record. Instance records are then generated for all of the identified Instances and include the elements generally identified for RDA Manifestations, such as physical description information, provider information, etc. They carry "on piece" titles (from the 245) but link to the BIBFRAME Work dataset for the "uniform" titles, the subjects, and other RDA Work or Expression information.
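The actual clues are not spelled out here, so the following is only a hypothetical heuristic in the same spirit. It groups the ISBNs in 020 fields by the carrier their qualifier suggests (in $q, or in parentheses after the number in $a) and treats each group as one Instance. The keyword lists are assumptions, and a record with no 020 still yields a single Instance.

```python
import re

# Hypothetical carrier keywords looked for in ISBN qualifiers.
CARRIER_HINTS = {
    "electronic": ("ebook", "e-book", "electronic", "online", "pdf"),
    "microform": ("microfiche", "microfilm", "microform"),
}

def carrier_of(qualifier):
    q = qualifier.lower()
    for carrier, hints in CARRIER_HINTS.items():
        if any(h in q for h in hints):
            return carrier
    return "print"                            # default when no hint is found

def split_instances(fields):
    """Group 020 ISBNs by suggested carrier; each group becomes one Instance."""
    groups = {}
    for tag, subfields in fields:
        if tag != "020":
            continue
        isbn = next((v for c, v in subfields if c == "a"), None)
        if isbn is None:
            continue
        quals = [v for c, v in subfields if c == "q"]
        m = re.search(r"\((.*?)\)", isbn)     # e.g. "9780000000002 (pbk.)"
        if m:
            quals.append(m.group(1))
        groups.setdefault(carrier_of(" ".join(quals)), []).append(isbn.split()[0])
    return groups or {"print": []}

fields = [
    ("020", [("a", "9780000000001"), ("q", "hardcover")]),
    ("020", [("a", "9780000000002 (pbk.)")]),
    ("020", [("a", "9780000000003"), ("q", "ebook")]),
]
print(split_instances(fields))
```

Hardcover and paperback ISBNs stay together here because both are print; whether such bindings deserve separate Instances is exactly the kind of judgment a real algorithm has to make.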
BIBFRAME Annotations and Item/Holding Information
The current plan is to treat item/holdings information as BIBFRAME Annotations, because the model indicates that information related to one's own held item, and not to other representations of the instance, is considered an annotation that one attaches to an Instance. This is still being debated as BIBFRAME Annotations become better understood. For now, however, we are attaching a few data points concerning acquisition, access, and holdings (including, finally, the call number or shelf location) directly to the BIBFRAME Instance record.
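The two options differ only in where the holdings data lives. The sketch below builds the Annotation form from a MARC holdings 852 field, using $b (sublocation), $h (classification part), and $i (item part). The dict shape and the "annotates" key are hypothetical; under the interim approach the same key/value pairs would simply be merged into the Instance record instead.

```python
def holdings_annotation(instance_id, field_852, n=1):
    """Turn an 852 holdings field (list of (code, value) pairs) into an
    Annotation-like dict pointing at its Instance. Shape is hypothetical."""
    def get(code):
        return " ".join(v for c, v in field_852 if c == code)

    # Call number = classification part ($h) + item part ($i).
    call_number = " ".join(p for p in (get("h"), get("i")) if p)
    return {
        "id": f"{instance_id}-holding-{n}",
        "annotates": instance_id,
        "location": get("b") or None,
        "callNumber": call_number or None,
    }

field_852 = [("a", "DLC"), ("b", "Main Reading Room"), ("h", "QL737.C4"), ("i", "S65 2010")]
print(holdings_annotation("i1", field_852))
```

Keeping the holdings in a separate record that points at the Instance is what lets other institutions share the Instance without inheriting one library's shelf locations.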
We are also working with several other elements in the current MARC record that might be considered BIBFRAME Annotations, including book reviews, physical condition, preservation plans, cover art, and publisher-produced descriptions of the content. This is an area to watch and debate as we work with the model.
In summary, all who deal with bibliographic data that was encoded into the MARC format from various rule and card bases, and who have seen, and even participated in, the addition of elements to MARC that make it both rich and complex, know that these translations are not exact, nor will they ever be. They can, however, help define the road forward, so that new descriptions that go straight into systems supporting BIBFRAME for exchange can have advanced functionality based on the Framework's emphasis on relationships. The following is a very general outline of how we are attempting to reposition MARC data in the BIBFRAME model while also accommodating RDA and other cataloging conventions:
- name and subject attributes
- work title for every work (uniform and "derived uniform")
- subject access elements (vocabularies like LCSH and MeSH)
- subject access elements (classification part of call numbers)
- content type
- language and other expression information
- manifestation title
- description information
- provider information
- holdings/item Annotation
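For readers who find a lookup table easier to scan, the outline can be restated as a mapping from each element to the BIBFRAME cluster the preceding paragraphs place it in. The grouping is read from the text above; the dictionary form and the helper name home_of are only an illustrative convenience.

```python
# Outline elements grouped by the BIBFRAME cluster the text assigns them to.
OUTLINE = {
    "Authority": ["name and subject attributes"],
    "Work": [
        "work title (uniform and derived uniform)",
        "subject access (LCSH, MeSH)",
        "subject access (classification)",
        "content type",
        "language and other expression information",
    ],
    "Instance": ["manifestation title", "description information", "provider information"],
    "Annotation": ["holdings/item information"],
}

def home_of(element):
    """Return the BIBFRAME cluster an outline element is assigned to."""
    for cls, elements in OUTLINE.items():
        if element in elements:
            return cls
    raise KeyError(element)

print(home_of("content type"))   # Work
```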
What you may see when transforming MARC records
Single MARC Bibliographic record submitted for transformation:
Transformation is essentially field by field. Names and subjects become BIBFRAME Authorities that do not presently point to ID.loc.gov but instead contain the text strings. BIBFRAME Work records are created from the data in the MARC 130, 240, or 245 fields. Data in the MARC record that describes another work (primarily in the 700-730 fields with a $t and the 76X-78X fields) is converted to separate "stub" Work records, and relationship properties relate the works. Thus the result may be one BIBFRAME Work record (when no relationships are present) or several.
BIBFRAME Instance records are created and linked to the primary BIBFRAME Work record. Multiple Instance records are created if the submitted MARC record represents more than one carrier (according to our algorithm). Instance records are created for the stub Work records only when there is enough information in the MARC record to make a "stub" Instance.
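What counts as "enough information" is not defined here, so the sketch below uses a hypothetical threshold: a stub Instance is made only for a 76X-78X linking entry that carries manifestation-level data in $d (place, publisher, and date), $x (ISSN), or $z (ISBN). The test is restricted to 760-788 because in 700-730 fields $d means dates associated with a name. Record shapes and id patterns are illustrative.

```python
# Hypothetical threshold: linking-entry subfields that describe a manifestation.
MANIFESTATION_CODES = {"d", "x", "z"}

def stub_records(tag, subfields, primary_work_id, n):
    """Return (stub_work, relationship, stub_instance_or_None) for one field."""
    work = {"id": f"{primary_work_id}-rel{n}",
            "title": next((v for c, v in subfields if c == "t"), None)}
    relation = {"from": primary_work_id, "to": work["id"], "field": tag}
    instance = None
    codes = {c for c, _ in subfields}
    if 760 <= int(tag) <= 788 and MANIFESTATION_CODES & codes:
        instance = {"instanceOf": work["id"],
                    **{c: v for c, v in subfields if c in MANIFESTATION_CODES}}
    return work, relation, instance

w, r, i = stub_records(
    "776",
    [("i", "Online version:"), ("t", "Whales"),
     ("d", "New York : Example Press, 2010"), ("z", "9780000000003")],
    "w1", 1)
print(i)

w2, r2, i2 = stub_records("700", [("a", "Doe, John,"), ("d", "1900-1980."), ("t", "Poems.")], "w1", 2)
print(i2)   # None
```

Every related field still yields a stub Work and a relationship; only the Instance is conditional, matching the behavior described above.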
Set of MARC Bibliographic records submitted for transformation:
Each record is processed as above, but matching and deduplication of the works in the new dataset is not done yet. The result is a dataset that the experimenter may deduplicate, load, add URIs to, etc.
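Since deduplication is left to the experimenter, here is one deliberately crude starting point, not an LC method: works are keyed on a normalized (creator, title) pair with case, punctuation, and diacritics removed, the first work per key survives, and every work id is mapped to its survivor so that links can be rewritten. The "creator" and "title" keys are assumptions about how the experimenter has flattened the Work records.

```python
import unicodedata

def norm(s):
    """Lowercase, strip diacritics, and collapse punctuation to spaces."""
    s = unicodedata.normalize("NFKD", s or "")
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    return " ".join("".join(ch if ch.isalnum() else " " for ch in s.lower()).split())

def match_key(work):
    return (norm(work.get("creator")), norm(work.get("title")))

def dedup_works(works):
    """Keep the first work per key; map every work id to the surviving id."""
    survivors, same_as = {}, {}
    for w in works:
        keeper = survivors.setdefault(match_key(w), w)
        same_as[w["id"]] = keeper["id"]
    return list(survivors.values()), same_as

works = [
    {"id": "w1", "title": "Moby Dick.", "creator": "Melville, Herman"},
    {"id": "w2", "title": "Moby-Dick", "creator": "Melville, Herman,"},
    {"id": "w3", "title": "Typee", "creator": "Melville, Herman"},
]
kept, same_as = dedup_works(works)
print([w["id"] for w in kept])   # ['w1', 'w3']
print(same_as)                   # {'w1': 'w1', 'w2': 'w1', 'w3': 'w3'}
```

A key this simple will both miss matches (variant title forms) and over-merge (different works sharing a title), so it is best treated as a way to generate candidates for review.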
Sally H. McCallum
Chief, Network Development and Standards Office
Library of Congress, 101 Independence Ave., SE
Washington, DC 20540 USA
Tel. 1-202-707-5119 -- Fax 1-202-707-0115