On 09 January 2012 at 00:59 Kelley McGrath <[log in to unmask]> wrote:

>  In summary, I think we do need better provenance information and that
>  it should be
>
>  *more granular

 

Ideally, at the level of the single metadata statement (MARC tag/subfield, RDF triple, etc.), to support pick-and-mix selection from multiple sources in a global information environment.

>  *machine-interpretable

 

Only machines can operate fast enough to pick-and-mix on a global scale. One implication is that the (meta)data must therefore be interoperable at that scale. One suggestion is that this can be supported by the existing infrastructure of the Web - i.e. that the RDF/linked data approach is appropriate for library metadata.

>  *optional

 

In the sense that metadata without provenance retains some utility, and does not diminish the utility of related metadata. But professionally-generated memory-institution metadata will be in the minority, competing with and complementing user- and machine-generated metadata about the same entities of interest. Any provenance information will be useful; as much of it as possible should be recorded and published.

>  *capable of recording alternate viewpoints and reconciling these
>  viewpoints by identifying preferred data

 

Data "preferred" by some cataloguing rule, presumably.  And a specific version of that rule; e.g. AACR editions. And perhaps a specific interpretation or option of the rule. I think we need to extend provenance information to include this. FRAD makes a start with the entity Rules, defined as "A set of instructions relating to the formulation and/or recording of controlled access points (authorized forms, variant forms or references, etc.)." This would need to be extended to cover the formulation and recording of all metadata statements. DCMI's Architecture Forum ( http://dublincore.org/groups/architecture/ ) is actively discussing development of the Dublin Core Abstract Model on which DC application profiles are based and which include accommodation for metadata content rules. 

>  *capable of recording a history of edits (which I didn't talk about
>  above but which I think would be useful)
>

 

This is a fundamental issue. In the RDF linked data community, one school of thought is that a triple is never changed or deleted. Instead, the triple is deprecated and, if appropriate, a new triple is created. Deprecation requires something like a "named graph" approach: a bare triple cannot carry a status property, so the triple is named and the status attached to that name. It would also be useful to know who set the status to deprecated, and when. This reinforces the need to supply provenance information at the statement/triple level; much current copy-cataloguing results in edits to at most one or two statements.
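
A minimal sketch of how that might look, again in Python with rdflib: the original statement is never deleted, a replacement is added as a new named graph, and the deprecation status, agent and date are attached to the old graph's name. The ex: properties (ex:status, ex:deprecatedBy, etc.) are illustrative placeholders, not an agreed vocabulary.

    # Sketch only: deprecate a statement by describing its named graph rather than deleting it.
    # Assumes rdflib; the ex: properties and URIs are illustrative, not an agreed vocabulary.
    from rdflib import Dataset, Literal, Namespace
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/")

    ds = Dataset()

    # The original statement stays in place, in its own named graph.
    old = ds.graph(EX["statement/1"])
    old.add((EX.book1, EX.title, Literal("The exemple title")))  # contains a typo

    # A corrected statement is added as a new named graph; nothing is edited or deleted.
    new = ds.graph(EX["statement/2"])
    new.add((EX.book1, EX.title, Literal("The example title")))

    # The old graph is marked deprecated, recording who set the status and when.
    prov = ds.graph(EX["provenance"])
    prov.add((EX["statement/1"], EX.status, Literal("deprecated")))
    prov.add((EX["statement/1"], EX.deprecatedBy, EX.cataloguerA))
    prov.add((EX["statement/1"], EX.deprecatedOn, Literal("2012-01-09", datatype=XSD.date)))
    prov.add((EX["statement/1"], EX.replacedBy, EX["statement/2"]))

    print(ds.serialize(format="trig"))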

 

So in more general terms, there is nothing to "edit". There is just a set of statements about a specific entity. Those statements may be contradictory, or say the same thing in different ways. The statements may have a professional, non-professional/user, or machine provenance, but that is not necessarily the cause of contradictions - professionals may change their minds (the "edit" scenario), disagree on content rule interpretation, use contradictory content rules, make mistakes, etc.

 

I think we need to consider this very seriously. We have a lot of latent provenance information for our metadata: at the collection (i.e. catalogue/finding-aid) level, including interaction with union catalogues and local institutional cataloguing practices (this last visibly disappearing as baby-boomers retire); and at the record level where systems store more detailed version history external to the record itself. It would be to our professional detriment (contradictions notwithstanding) if this information is lost or remains hidden. We need to supply all the provenance information we've got to promote our data in likely future information environments.

 

Cheers

 

Gordon