BIBFRAME  January 2012

Subject: Thoughts on provenance
From: Kelley McGrath
Reply-To: Bibliographic Framework Transition Initiative Forum
Date: Sun, 8 Jan 2012 16:59:27 -0800


 Since one of the things named graphs are supposed to help us with is
 recording provenance, I thought I'd follow up my last post by sharing
 some thoughts on provenance, too. I sometimes see provenance discussed
 in terms of the provider of the data, e.g., the URL domain in linked
 data. This is useful so far as it goes, but I am more interested in
 provenance in terms of what justifies the data values given. Suppose
 OCLC did release all of WorldCat as linked data. It's all very well to
 know that some piece of information came from WorldCat, but frankly, the
 quality of records varies tremendously within WC, from very, very good
 to too minimal to be useful to totally wrong. So knowing that something
 came from WC only gets you so far.

 There are various kinds of provenance built into existing library
 cataloging, although these methods were built for the analog era and
 tend to not be machine actionable. IMO they are also less adequate the
 farther you get from the original model of books with title pages. You
 could think of these mechanisms as the library world's equivalent of
 Wikipedia's demand for citations. They provide a way that someone else
 coming along later can reproduce what the first cataloger did to come to
 their conclusions--a trust but verify approach.

 One of the underpinnings of cooperative cataloging is that if another
 cataloger (or user) comes along and looks at the bibliographic record
 you've created, you've put in enough information in a way that the
 second person can tell whether or not s/he has the same item (the FRBR
 identify task). One of the reasons catalogers are made so uncomfortable
 by ultra brief vendor records (the infamous level 3 records in OCLC
 WorldCat) is that they violate this community norm in the extreme. In
 some of these vendor records, nothing is right except the identifier
 (usually ISBN); the title, creator, format, and publication information
 are all different from what's on the item.

 When creating a bibliographic record describing a book, CD or whatever,
 the cataloging rules tell you to base the description on the item in
 hand. This means that the item itself is considered the most
 authoritative source for the information in the body of the description.
 However, there might be more than one possible source for a given piece
 of information in the item. For example, there may be more than one form
 of title in different places on the item (title page, cover, running
 title, etc.). The approach of existing cataloging rules is to give a
 hierarchy of sources (in AACR2 chief and prescribed sources and in RDA
 preferred sources). In AACR2, if you take the title from the chief
 source (say the title page), you don't have to say what you did and
 everyone assumes that's where the title came from (as an aside, let me
 say that I have come to hate implicit data like this). If you take the
 main title from somewhere else, you are supposed to say so in a note
 (e.g., "Cover title."). The source that you use for other basic citation
 information is then assumed to be the same as that for the title or, for
 some data elements, one of a list of other possible options. If you take
 citation data from somewhere else, you bracket it, but the source of
 data for everything outside the title is generally not noted.

 Leaving aside the lack of machine-friendliness, this also doesn't work
 very well for a lot of non-book media. If someone took the title from
 the title page of a book, you can assume that they took the rest of
 their basic descriptive info from there, too, or from looking at the
 item itself (such as the number of pages). If you have a DVD video, even
 if someone takes a title from the title frames (chief source), you can't
 make any assumptions about where they got the rest of their data. Much
 of the publication info (publisher, date, series) is usually best taken
 from the disc label or container. Beyond the basic descriptive info, did
 the cataloger take the soundtrack, subtitle or caption options from the
 container (known to be wrong on occasion), the disc menu (also known to
 be wrong occasionally) or from listening to or looking at the tracks
 (which only works if you recognize the language)?

 Also, if a cataloger takes a DVD title from somewhere other than the
 title frames, such as the disc label, it could mean one of two
 significantly different things:

 1) there is no title on the title frames

 2) there is a title on the title frames, but the cataloger didn't look
 at it due to economic, technological or other limitations.

 It would also be useful in some circumstances to record contradictory
 information in combination with sources. One that comes up with DVDs
 more often than you would think is the case where the packaging makes no
 mention of closed captions, but if you pop the DVD in a player, it is
 captioned.

 Right now, a cataloger would just make the usual note:
   Closed-captioned

 If you could instead say:
   Container/Packaging: no closed captions
   Validated in player: closed captions

 then someone whose copy of this DVD says nothing externally about
 captions would know that they probably do have a captioned DVD, whereas
 in the current system they are likely to think they have a different
 version. It would also be good to be able to mark which statement is
 thought to be the true one.
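
 As a rough sketch of the idea above (not a proposal for any actual
 schema), contradictory statements could each carry their own source,
 with one of them flagged as preferred. The class and field names here
 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourcedStatement:
    """One claim about an element, tied to where the claim came from."""
    element: str             # what is being described (hypothetical label)
    value: bool              # the claim this source makes
    source: str              # where the claim was found
    preferred: bool = False  # True if thought to be the true statement


def sources_disagree(statements: List[SourcedStatement], element: str) -> bool:
    """True when different sources give different values for an element."""
    values = {s.value for s in statements if s.element == element}
    return len(values) > 1


def preferred_value(statements: List[SourcedStatement], element: str) -> Optional[bool]:
    """Value of the statement marked as preferred, or None if none is marked."""
    for s in statements:
        if s.element == element and s.preferred:
            return s.value
    return None


# The DVD example from the text: silent packaging, captions seen in a player.
dvd = [
    SourcedStatement("closed captions", False, "container/packaging"),
    SourcedStatement("closed captions", True, "validated in player", preferred=True),
]

print(sources_disagree(dvd, "closed captions"))
print(preferred_value(dvd, "closed captions"))
```

 Keeping both statements, rather than only the winner, is what would let
 someone holding the uncaptioned-looking package recognize their copy.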

 When OLAC was working on our initial investigation of using FRBR to
 improve access to moving images (see
 http://www.olacinc.org/drupal/?q=node/27, particularly part 3a), we
 thought that it was best to allow for element-specific provenance
 without requiring it. We were focusing on FRBR works where the
 item-in-hand is not necessarily the authoritative source so provenance
 is clearly important. On the other hand, recording provenance at a
 granular level makes for additional work. Allowing every element to
 have a provenance value of unspecified/unknown gave us granularity
 where possible while still accommodating legacy data and data from
 providers who choose not to provide that level of granularity.

 We also played around with a value for inferred/guessed for those
 situations where the evidence clearly seems to be pointing at something,
 but there isn't enough solid evidence to make a strong assertion.

 When elements of unknown or unreliable provenance are clearly
 identified, it is easy for those who care to update the information,
 while others can use the information as is.
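
 A minimal sketch of how such values might be used, assuming a record
 is just a mapping from element names to a value plus a provenance
 status. The enum members, the record layout and the sample elements are
 all made up for illustration and are not taken from the OLAC work.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Provenance(Enum):
    STATED = "stated"                    # backed by an identified source
    INFERRED = "inferred/guessed"        # evidence points this way, not solid
    UNSPECIFIED = "unspecified/unknown"  # legacy or non-granular data


Record = Dict[str, Tuple[str, Provenance]]


def from_legacy(values: Dict[str, str]) -> Record:
    """Wrap plain legacy values, defaulting every element to UNSPECIFIED."""
    return {name: (value, Provenance.UNSPECIFIED) for name, value in values.items()}


def needs_review(record: Record) -> List[str]:
    """Names of elements whose provenance is inferred or unspecified."""
    return sorted(
        name for name, (_value, prov) in record.items()
        if prov is not Provenance.STATED
    )


work: Record = {
    "title": ("Citizen Kane", Provenance.STATED),
    "director": ("Orson Welles", Provenance.INFERRED),
    "original language": ("English", Provenance.UNSPECIFIED),
}

print(needs_review(work))
print(needs_review(from_legacy({"title": "Citizen Kane"})))
```

 Those who care about a particular element can work through the review
 list; everyone else can ignore the status and use the values as is.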

 Right now we have provenance in bib records in the following forms that
 I can think of:
 * Presence or absence of 500 source of title note in conjunction with
 format of item for the source of citation/transcribed parts of record
 * 040 field lists institutional codes of libraries that have edited a
 record at the record level (so you know the last institution that
 touched a record but you don't know what they did)

 What I might wish for is something more granular and
 machine-actionable, such as three optional machine-comprehensible
 provenance elements attached to every data element:
 1) source of the data
 2) the institution entering the data
 3) date the data was input

 Perhaps something like the following for title proper:
 TitleProper: Citizen Kane
 DataSource: title frame
 DataInst: OrU
 DataDate: 2012-01-04

 Even if catalogers only did this for the source of data for title
 proper, it would give us as much information as we have today, but in a
 machine-friendly form (although it might be hard to come up with a long
 enough list of data sources). The editing institution and date could
 presumably be generated automatically.
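
 To make the idea a little more concrete, here is a sketch of a data
 element with the three optional provenance slots, where the
 institution and date are stamped on by software rather than typed by
 the cataloger. The DataElement class and the stamp and render helpers
 are hypothetical; only the four labels come from the example above.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    name: str
    value: str
    data_source: Optional[str] = None  # 1) source of the data
    data_inst: Optional[str] = None    # 2) institution entering the data
    data_date: Optional[date] = None   # 3) date the data was input


def stamp(element: DataElement, inst: str, when: Optional[date] = None) -> DataElement:
    """Return a copy with institution and date filled in automatically."""
    return replace(element, data_inst=inst, data_date=when or date.today())


def render(element: DataElement) -> str:
    """Print the element in the label/value style used in the example."""
    lines = [f"{element.name}: {element.value}"]
    if element.data_source is not None:
        lines.append(f"DataSource: {element.data_source}")
    if element.data_inst is not None:
        lines.append(f"DataInst: {element.data_inst}")
    if element.data_date is not None:
        lines.append(f"DataDate: {element.data_date.isoformat()}")
    return "\n".join(lines)


# The cataloger supplies only the value and where it was found.
title = DataElement("TitleProper", "Citizen Kane", data_source="title frame")
stamped = stamp(title, "OrU", date(2012, 1, 4))
print(render(stamped))

# Legacy data with no provenance at all is still a valid element.
print(render(DataElement("TitleProper", "Citizen Kane")))
```

 Because all three slots default to empty, existing data would not have
 to be touched before it could be loaded alongside newer records.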

 RDA is making a practical move away from identifying the sources of
 data even as clearly as in AACR2. In AACR2, if you include descriptive
 citation data that was taken from somewhere other than your
 selected/allowed sources, you bracket it. So basically AACR2 has a
 binary partition into data from the chief or prescribed source and data
 not from there. Except for title proper, RDA generally allows data from
 other sources to be silently interpolated. This does suggest that we
 could use a way to represent provenance for data elements that
 contain data from more than one source.

 RDA retains the notion of a "source of title" note, but in a way that
 ironically undermines its usefulness. In RDA:

 1. As in AACR2 (at least for books and moving images), the note is only
 given when the data is taken from somewhere other than the source at
 the top of the hierarchy of preferred sources for a format

 2. The note is optional

 Given these two rules, if there's no note, how will anyone ever know
 whether the title came from the preferred source or the cataloger
 simply chose not to record where it came from? IMO, it would be much
 better to give an option for a positive source of title note or element
 across the board and allow those who value that information to always
 record it explicitly.
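
 The ambiguity can be spelled out in a few lines. This is only an
 illustration of the reasoning above, with a made-up function name and
 made-up labels; it is not a statement of what either rule set says
 beyond what is described in this message.

```python
from typing import Optional, Set

PREFERRED = "preferred source"
UNRECORDED = "some other source, not recorded"


def possible_sources(note: Optional[str], note_required: bool) -> Set[str]:
    """What a later reader can conclude about where the title came from.

    note_required=True models a rule where a note must be given whenever
    the title is not from the preferred source; False models a rule where
    that note is optional.
    """
    if note is not None:
        return {note}          # an explicit note always settles it
    if note_required:
        return {PREFERRED}     # silence can only mean the preferred source
    return {PREFERRED, UNRECORDED}  # silence no longer tells us anything


print(possible_sources(None, note_required=True))
print(possible_sources(None, note_required=False))
print(possible_sources("Cover title.", note_required=False))
```

 Only the explicit note gives a single answer under both rules, which
 is the argument for allowing a positive source of title element.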


 What about authority records? Authority records are records for things
 other than bibliographic entities (people, corporate bodies, subjects)
 or for some bibliographic entities (usually those other than FRBR
 manifestations, which are described by bibliographic records). If
 anything, provenance is even more important for authority records.
 Although the related item-in-hand is taken into account when
 constructing these, it is commonly necessary to consult external
 information. If we start creating separate records for FRBR works and
 expressions, these will be more like authority records in that they
 can't necessarily rely on an item and will often have to justify their
 data with citations for external sources.

 Like provenance in bibliographic records, provenance in current
 authority records is largely recorded in free text (albeit structured)
 notes.

 The most common one is 670 (source data found), which includes a
 citation plus usually the data found and the location where the data was
 found within the cited material.

 670 $a Its Guide to manuscripts in the Bentley Historical Library,
 1976: $b t.p. (Bentley Historical Library, Michigan Historical
 Collections, Univ. of Mich.)

 For internet resources, the date when the site was looked at is
 included:

 670 $a Internet Movie Database, Feb. 6, 2003: $b (b. 30 July 1961 in
 Augusta, Ga.; sometimes credited as: Laurence Fishburne III, Lawrence
 Fishburne III; changed his name from Larry to Laurence in his films in
 1991)

 This is important for things like birth dates in IMDb, which can be
 something of a moving target.

 There is a parallel field 675 for sources where data was not found.
 This is generally only used for prominent sources where a reasonable
 person would be expected to look. Standardized abbreviations for
 well-known sources are often used. For a composer, you might have:

 675 $a New Grove; $a Thompson, 10th ed.
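
 To show how far the structure in these notes goes, here is a small
 sketch that splits a field written in the display convention used
 above (tag, then "$" plus a subfield code) into its subfields. It
 assumes that plain-text convention, not the binary MARC format, and the
 function name is my own. Note that everything inside each subfield is
 still free text.

```python
import re
from typing import List, Tuple


def parse_field(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'TAG $a text $b text' into the tag and (code, text) pairs.

    Assumes a literal '$' followed by a letter or digit always starts a
    subfield, so a dollar amount such as '$5' in the text would be misread.
    """
    tag, _, rest = line.strip().partition(" ")
    # With a capturing group, re.split returns
    # [text before first code, code1, text1, code2, text2, ...].
    parts = re.split(r"\$([a-z0-9])", rest)
    subfields = [
        (parts[i], parts[i + 1].strip())
        for i in range(1, len(parts) - 1, 2)
    ]
    return tag, subfields


found = parse_field(
    "670 $a Internet Movie Database, Feb. 6, 2003: "
    "$b (b. 30 July 1961 in Augusta, Ga.)"
)
not_found = parse_field("675 $a New Grove; $a Thompson, 10th ed.")

print(found)
print(not_found)
```

 A program can tell which citation goes with which data this way, but
 the date consulted, the location within the source and the facts found
 all remain buried in the text of $a and $b.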

 There is now also a $v in some authority record fields for a source of
 information for a specific data element. I think this is supposed to
 contain a citation, although I wasn't able to find an example in the
 documentation. This is more granular, but it's not clear to me if it's
 more machine-interpretable.

 In summary, I think we do need better provenance information and that
 it should be

 * more granular
 * machine-interpretable
 * optional
 * capable of recording alternate viewpoints and reconciling these
 viewpoints by identifying preferred data
 * capable of recording a history of edits (which I didn't talk about
 above but which I think would be useful)

 Kelley
 
 Kelley McGrath
 University of Oregon
