I'm not sure that we can judge provenance on end-user needs of the
usual stripe. I see provenance as a need for systems so that we don't
produce a gawd-awful mess that doesn't serve users at all.
The main provenance need that I am aware of is that of managing
updates to and interaction with data. Given that the world we are
moving into will be more interactive than today's catalog, that the
division between 'cataloger' and 'user' will be less strict because
users will be able to do things with data, even adding or changing,
then you need provenance as a way to identify library data on the Web.
Changes in this environment will be of the form of adding another
version of something, or linking some previously unlinked things
together. Without provenance/versioning this results in
gobbledygook. In the Semantic Web sense, provenance goes hand-in-hand with
versioning, and you do want to be able to know if you have the latest
version of something. Provenance is what will make library data
library data, and user data user data, and Roy's data Roy's data.
Or maybe we aren't talking about the same thing?
Quoting Roy Tennant <[log in to unmask]>:
> Diane asks the question "We want to do this well, don't we?" My reply would
> be we should want to do it as well as is required to support real end-user
> needs that are important to support. This is because we will clearly lack
> the level of resourcing we enjoyed for much of the 80s and 90s, and even
> into the 2000s. We must choose well where to put our resources or we will
> regret it. Lacking any context, any cataloger will want to describe a
> resource to within an inch of its life. But that isn't what we can afford to
> do.
> So I'm suggesting we need to provide the end-user use cases where knowing
> "where it came from, when it was last updated, how it was created (human or
> machine?)" is important and then we can go from there. This can be something
> along the lines of "without that information I can't provide the user with a
> display from which they can make intelligent decisions about the resource
> because of X and Y". But there must be something to justify all the work
> besides our deep-seated (and laudable) desire to do things "well".
> On 1/11/12 12:18 PM, "Diane Hillmann" <[log in to unmask]> wrote:
>> I've sure been there, too, wishing there were good ways to figure out who did
>> what in a MARC record!
>> I certainly disagree with Roy very strongly--provenance is one of the things
>> we're really REALLY going to need as we move to an environment where we'll be
>> managing data at the statement level collected from many places. This was the
>> sort of thing I learned to do when I was working in the NSDL project, and for
>> this librarian, it was a completely different way of looking at data
>> (but pretty nifty, too).
>> What I learned from that experience is that, when you're going to be doing
>> something with this data (not just displaying it to people looking at
>> catalogs), you need to know where it came from, when it was last updated, how
>> it was created (human or machine?), etc. Management of data at the statement
>> level (which for those of you attending ALA Midwinter, I'll be talking about
>> at the Cataloging Norms IG, at 10:30-noon on Saturday) isn't rocket science,
>> but it is quite different from the closed world of library data, and
>> definitely requires provenance information to do well.
>> We want to do this well, don't we?
>> On Wed, Jan 11, 2012 at 2:35 PM, Kevin M Randall
>> <[log in to unmask]> wrote:
>>> Roy Tennant wrote:
>>>> In all of my 37 years working in libraries I've never
>>>> encountered a situation where it was necessary to know where the title
>>>> came from to do useful work with bibliographic data. In what situations is
>>>> it necessary, and why?
>>> Okay, it looks like we've got two different meanings of "provenance" going on
>>> in this thread. I think Kelley McGrath started out talking about
>>> "provenance" meaning WHO CREATED the metadata. Because some of the message
>>> talked about sources of data on the resource, this got morphed into a
>>> discussion also about WHERE THE DATA APPEARED ON THE RESOURCE.
>>> That being said, I think that *both* things are useful. I would consider
>>> myself quite blessed if I were able to say that I've never needed to have
>>> this information through my entire career. If we're talking about creator of
>>> the metadata, that would be very, very useful in so many situations. In a
>>> MARC record, when there is more than one institution identified in the 040
>>> field, there are many times I have needed to know, for example, which library
>>> changed the serial from active to ceased, or which library added a note or
>>> added entry--at the very least, so I could contact that library and determine
>>> if something I have in hand is really the same thing as what the other
>>> cataloger saw. And if we're talking about where on the resource the data
>>> appears, that is also helpful, especially with resources having the same or
>>> similar titles, and/or bearing multiple publisher/issuing body names.
>>> And in regard to the idea that we should "carry forward only what can be
>>> justified by real requirements from real users", I would certainly hope that
>>> we keep in mind that people who create, manipulate, and manage metadata ARE
>>> "real users"!
>>> Kevin M. Randall
>>> Principal Serials Cataloger
>>> Bibliographic Services Dept.
>>> Northwestern University Library
>>> 1970 Campus Drive
>>> Evanston, IL 60208-2300
>>> email: [log in to unmask]
>>> phone: (847) 491-2939
>>> fax: (847) 491-4345
[log in to unmask] http://kcoyle.net