BIBFRAME Archives

BIBFRAME@LISTSERV.LOC.GOV


BIBFRAME January 2012

Subject:

Thoughts on provenance

From:

Kelley McGrath <[log in to unmask]>

Reply-To:

Bibliographic Framework Transition Initiative Forum <[log in to unmask]>

Date:

Sun, 8 Jan 2012 16:59:27 -0800

Content-Type:

text/plain


 Since one of the things named graphs are supposed to help us with is 
 recording provenance, I thought I'd follow up my last post by sharing 
 some thoughts on provenance, too. I sometimes see provenance discussed 
 in terms of the provider of the data, e.g., the URL domain in linked 
 data. This is useful so far as it goes, but I am more interested in 
 provenance in terms of what justifies the data values given. Suppose 
 OCLC did release all of WorldCat as linked data. It's all very well to 
 know that some piece of information came from WorldCat, but frankly, the 
 quality of records varies tremendously within WC, from very, very good 
 to too minimal to be useful to totally wrong. So knowing that something 
 came from WC only gets you so far.

 There are various kinds of provenance built into existing library 
 cataloging, although these methods were built for the analog era and 
 tend to not be machine actionable. IMO they are also less adequate the 
 farther you get from the original model of books with title pages. You 
 could think of these mechanisms as the library world's equivalent of 
 Wikipedia's demand for citations. They provide a way that someone else 
 coming along later can reproduce what the first cataloger did to come to 
 their conclusions--a trust but verify approach.

 One of the underpinnings of cooperative cataloging is that if another 
 cataloger (or user) comes along and looks at the bibliographic record 
 you've created, you've put in enough information in a way that the 
 second person can tell whether or not s/he has the same item (the FRBR 
 identify task). One of the reasons catalogers are made so uncomfortable 
 by ultra brief vendor records (the infamous level 3 records in OCLC 
 WorldCat) is that they violate this community norm in the extreme. In 
 some of these vendor records, nothing is right except the identifier 
 (usually ISBN); the title, creator, format, and publication information 
 are all different from what's on the item.

 When creating a bibliographic record describing a book, CD or whatever, 
 the cataloging rules tell you to base the description on the item in 
 hand. This means that the item itself is considered the most 
 authoritative source for the information in the body of the description. 
 However, there might be more than one possible source for a given piece 
 of information in the item. For example, there may be more than one form 
 of title in different places on the item (title page, cover, running 
 title, etc.). The approach of existing cataloging rules is to give a 
 hierarchy of sources (chief and prescribed sources in AACR2, preferred 
 sources in RDA). In AACR2, if you take the title from the chief 
 source (say the title page), you don't have to say what you did and 
 everyone assumes that's where the title came from (as an aside, let me 
 say that I have come to hate implicit data like this). If you take the 
 main title from somewhere else, you are supposed to say so in a note 
 (e.g., "Cover title."). The source that you use for other basic citation 
 information is then assumed to be the same as that for the title or, for 
 some data elements, one of a list of other possible options. If you take 
 citation data from somewhere else, you bracket it, but the source of 
 data for everything outside the title is generally not noted.

 Leaving aside the lack of machine-friendliness, this also doesn't work 
 very well for a lot of non-book media. If someone took the title from 
 the title page of a book, you can assume that they took the rest of 
 their basic descriptive info from there, too, or from looking at the 
 item itself (such as the number of pages). If you have a DVD video, even 
 if someone takes a title from the title frames (chief source), you can't 
 make any assumptions about where they got the rest of their data. Much 
 of the publication info (publisher, date, series) is usually best taken 
 from the disc label or container. Beyond the basic descriptive info, did 
 the cataloger take the soundtrack, subtitle or caption options from the 
 container (known to be wrong on occasion), the disc menu (also known to 
 be wrong occasionally) or from listening to or looking at the tracks 
 (which only works if you recognize the language)?

 Also, if a cataloger takes a DVD title from somewhere other than the 
 title frames, such as the disc label, it could mean one of two 
 significantly different things:

 1) there is no title on the title frames

 2) there is a title on the title frames, but the cataloger didn't look 
 at it due to economic, technological, or other limitations.

 It would also be useful in some circumstances to record contradictory 
 information in combination with sources. One that comes up with DVDs 
 more often than you would think is the case where the packaging makes no 
 mention of closed captions, but if you pop the DVD in a player, it is 
 captioned.

 Right now, a cataloger would just make the usual note:
   Closed-captioned

 If you could say:
   Container/Packaging: no closed captions
   Validated in player: closed captions

 then someone whose copy of this DVD says nothing externally about 
 captions would know that they probably do have a captioned DVD, whereas 
 in the current system they're likely to think they have a different 
 version. It would also be good to be able to mark which statement is 
 thought to be true.
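
 A minimal sketch of how such source-qualified statements might be 
 modeled, in Python. The class name, the field names and the "preferred" 
 flag are hypothetical illustrations, not part of any existing 
 cataloging standard; the two statements are the ones from the DVD 
 example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourcedStatement:
    """One assertion about an item, tied to where it was observed."""
    element: str             # what is being described
    value: bool              # what that source says
    source: str              # hypothetical label for the source consulted
    preferred: bool = False  # marks the statement thought to be true


def sources_disagree(statements: List[SourcedStatement], element: str) -> bool:
    """True when the recorded sources give different values for an element."""
    values = {s.value for s in statements if s.element == element}
    return len(values) > 1


def preferred_value(statements: List[SourcedStatement], element: str) -> Optional[bool]:
    """Return the value marked as preferred, or None if nothing is marked."""
    for s in statements:
        if s.element == element and s.preferred:
            return s.value
    return None


dvd = [
    SourcedStatement("closed captions", False, "container/packaging"),
    SourcedStatement("closed captions", True, "validated in player", preferred=True),
]

print(sources_disagree(dvd, "closed captions"))
print(preferred_value(dvd, "closed captions"))
```

 Keeping both statements, rather than overwriting one with the other, 
 is what lets a second cataloger match on the packaging while still 
 trusting the validated value.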

 When OLAC was working on our initial investigation of using FRBR to 
 improve access to moving images (see 
 http://www.olacinc.org/drupal/?q=node/27, particularly part 3a), we 
 thought that it was best to allow for element-specific provenance 
 without requiring it. We were focusing on FRBR works where the 
 item-in-hand is not necessarily the authoritative source so provenance 
 is clearly important. On the other hand, recording provenance at a 
 granular level makes for additional work. Allowing everything to have 
 a value of unspecified/unknown for provenance let us have 
 granularity when possible while accommodating legacy data and data from 
 providers who choose not to provide that level of granularity.

 We also played around with a value for inferred/guessed for those 
 situations where the evidence clearly seems to be pointing at something, 
 but there isn't enough solid evidence to make a strong assertion.

 Clearly identifying elements of unknown or unreliable provenance makes 
 it easy for those who care to update the information, while others can 
 use the information as is.
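
 As a rough sketch of that idea in Python: every element carries a 
 provenance status that defaults to unspecified/unknown, and a small 
 helper lists the elements someone might want to revisit. The names 
 ProvenanceStatus, Element and needs_review are hypothetical, the 
 "stated" label is my own shorthand for "a source is recorded", and the 
 sample record is only illustrative.

```python
from enum import Enum
from typing import List, NamedTuple


class ProvenanceStatus(Enum):
    STATED = "stated"                    # a source is recorded (hypothetical label)
    INFERRED = "inferred/guessed"        # evidence points this way, not conclusive
    UNSPECIFIED = "unspecified/unknown"  # legacy data or provider gave no source


class Element(NamedTuple):
    name: str
    value: str
    # Defaulting to UNSPECIFIED keeps provenance optional for every element.
    status: ProvenanceStatus = ProvenanceStatus.UNSPECIFIED


def needs_review(elements: List[Element]) -> List[str]:
    """Names of elements whose provenance is inferred or unspecified."""
    return [e.name for e in elements if e.status is not ProvenanceStatus.STATED]


# Illustrative sample data only.
record = [
    Element("title", "Citizen Kane", ProvenanceStatus.STATED),
    Element("original release year", "1941", ProvenanceStatus.INFERRED),
    Element("director", "Orson Welles"),  # no provenance supplied
]

print(needs_review(record))
```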

 Right now we have provenance in bib records in the following forms that 
 I can think of:
 * Presence or absence of 500 source of title note in conjunction with 
 format of item for the source of citation/transcribed parts of record
 * 040 field lists institutional codes of libraries that have edited a 
 record at the record level (so you know the last institution that 
 touched a record but you don't know what they did)

 What I might wish for is something more granular and 
 machine-actionable, such as

 Three optional machine-comprehensible provenance elements attached to 
 every data element:
 1) source of the data
 2) the institution entering the data
 3) date the data was input

 Perhaps something like the following for title proper:
 TitleProper: Citizen Kane
 DataSource: title frame
 DataInst: OrU
 DataDate: 2012-01-04

 Even if catalogers only did this for the source of data for title proper, 
 it would give us as much information as we have today, but in a 
 machine-friendly form (although it might be hard to come up with a long 
 enough list of data sources). The editing institution and date could 
 presumably be generated automatically.
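
 A minimal Python sketch of the three optional elements, assuming a 
 simple in-memory model. The class names and the stamp helper are 
 hypothetical; the field names follow the DataSource, DataInst and 
 DataDate labels in the example above. The cataloger supplies only the 
 source, and the institution and date are filled in by the system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Provenance:
    # All three parts are optional, so legacy data can leave them empty.
    data_source: Optional[str] = None  # e.g. "title frame"
    data_inst: Optional[str] = None    # code of the institution entering the data
    data_date: Optional[date] = None   # date the data was input


@dataclass
class DataElement:
    name: str
    value: str
    provenance: Provenance = field(default_factory=Provenance)


def stamp(element: DataElement, inst: str, when: Optional[date] = None) -> DataElement:
    """Fill in institution and input date without touching the recorded source."""
    element.provenance.data_inst = inst
    element.provenance.data_date = when or date.today()
    return element


title = DataElement("TitleProper", "Citizen Kane", Provenance(data_source="title frame"))
stamp(title, "OrU", date(2012, 1, 4))
print(title.provenance)

# A legacy element with no provenance at all is still valid.
legacy = DataElement("TitleProper", "Citizen Kane")
print(legacy.provenance.data_source)
```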

 RDA is making a practical move away from identifying the sources of 
 data even as clearly as in AACR2.  In AACR2, if you include descriptive 
 citation data that was taken from somewhere other than your 
 selected/allowed sources, you bracket it. So basically AACR2 has a 
 binary partition into data from the chief or prescribed source and data 
 not from there. Except for title proper, RDA generally allows data from 
 other sources to be silently interpolated. This does suggest that we 
 could use a way to represent provenance for data elements that 
 contain data from more than one source.
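
 To illustrate the binary partition that AACR2 brackets give us, here 
 is a rough Python sketch that splits a transcribed statement into 
 segments and flags each one as bracketed or not. The function name and 
 the sample statement are invented for illustration, and the sketch 
 assumes that square brackets are used only for interpolated data and 
 are never nested, which is a simplification of real records.

```python
import re
from typing import List, Tuple


def split_by_brackets(text: str) -> List[Tuple[str, bool]]:
    """Split text into (segment, from_prescribed_source) pairs.

    Bracketed segments are flagged False (taken from elsewhere);
    everything else is flagged True.
    """
    segments = []
    # The capturing group keeps the bracketed pieces in the split result.
    for part in re.split(r"(\[[^\]]*\])", text):
        if not part.strip():
            continue
        if part.startswith("[") and part.endswith("]"):
            segments.append((part[1:-1].strip(), False))
        else:
            segments.append((part.strip(), True))
    return segments


# Invented publication statement, for illustration only.
for segment, prescribed in split_by_brackets("[New York] : Hypothetical Films, [2005]"):
    print(prescribed, segment)
```

 This recovers only the yes/no distinction; it cannot say which other 
 source the bracketed data came from, which is the gap a per-segment 
 provenance element would fill.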

 RDA retains the notion of a "source of title" note, but in a way that 
 ironically undermines its usefulness. In RDA:

 1. As in AACR2 (at least for books and moving images), the note is only 
 given when the data is taken from somewhere other than the source at the top 
 of the hierarchy for preferred sources for a format

 2. The note is optional

 Given these two possibilities, if there's no note, how will anyone ever 
 know which situation applies? IMO, it would be much better to give an 
 option for a positive source of title note or element across the board 
 and allow those who value that information to always record it 
 explicitly.


 What about authority records? Authority records are records for things 
 other than bibliographic entities (people, corporate bodies, subjects) 
 or for some bibliographic entities (usually those other than FRBR 
 manifestations, which are described by bibliographic records). If 
 anything, provenance is even more important for authority records. 
 Although the related item-in-hand is taken into account when 
 constructing these, it is commonly necessary to consult external 
 information. If we start creating separate records for FRBR works and 
 expressions, these will be more like authority records in that they 
 can't necessarily rely on an item and will often have to justify their 
 data with citations for external sources.

 Like provenance in bibliographic records, provenance in current 
 authority records is largely recorded in free text (albeit structured) 
 notes.

 The most common one is 670 (source data found), which includes a 
 citation plus usually the data found and the location where the data was 
 found within the cited material.

 670 $a Its Guide to manuscripts in the Bentley Historical Library, 
 1976: $b t.p. (Bentley Historical Library, Michigan Historical 
 Collections, Univ. of Mich.)

 For internet resources, the date when the site was looked at is 
 included:

 670 $a Internet Movie Database, Feb. 6, 2003: $b (b. 30 July 1961 in 
 Augusta, Ga.; sometimes credited as: Laurence Fishburne III, Lawrence 
 Fishburne III; changed his name from Larry to Laurence in his films in 
 1991)

 This is important for things like birth dates in IMDb, which can be 
 something of a moving target.

 There is a parallel field 675 for sources where data was not found. 
 This is generally only used for prominent sources where a reasonable 
 person would be expected to look. Standardized abbreviations for 
 well-known sources are often used. For a composer, you might have:

 675  $a New Grove; $a Thompson, 10th ed.
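
 The 670 and 675 notes shown above are structured enough to be split 
 into subfields mechanically, even though the content of each subfield 
 is still free text. A minimal Python sketch, assuming the one-line 
 display form used in this message (tag, then "$" plus a one-character 
 subfield code) and assuming no literal "$" inside the values; 
 parse_field is a hypothetical helper, not a function of any MARC 
 library.

```python
import re
from typing import List, Tuple


def parse_field(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a display-form field into its tag and (code, value) subfields."""
    tag, _, rest = line.strip().partition(" ")
    subfields = [
        (code, value.strip())
        for code, value in re.findall(r"\$(\w)\s*([^$]*)", rest)
    ]
    return tag, subfields


found = parse_field(
    "670 $a Its Guide to manuscripts in the Bentley Historical Library, "
    "1976: $b t.p. (Bentley Historical Library, Michigan Historical "
    "Collections, Univ. of Mich.)"
)
not_found = parse_field("675  $a New Grove; $a Thompson, 10th ed.")

print(found)
print(not_found)
```

 The split gets you as far as "citation" versus "data found"; pulling 
 the date consulted or the location out of the text would still need 
 pattern matching on free text.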

 There is now also a $v in some authority record fields for a source of 
 information for a specific data element. I think this is supposed to 
 contain a citation, although I wasn't able to find an example in the 
 documentation. This is more granular, but it's not clear to me if it's 
 more machine-interpretable.

 In summary, I think we do need better provenance information and that 
 it should be

 * more granular
 * machine-interpretable
 * optional
 * capable of recording alternate viewpoints and reconciling these 
 viewpoints by identifying preferred data
 * capable of recording a history of edits (which I didn't talk about 
 above but which I think would be useful)

 Kelley
 
 Kelley McGrath
 University of Oregon
 [log in to unmask]
