On 09 January 2012 at 00:59 Kelley McGrath <[log in to unmask]> wrote:

>  In summary, I think we do need better provenance information and that
>  it should be
>
>  *more granular 
Ideally, at the level of the single metadata statement (MARC tag/subfield, RDF
triple, etc.), to support pick-and-mix selection from multiple sources in a
global information environment.
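As a rough sketch in TriG syntax of what statement-level provenance might look
like: the single descriptive statement below sits in its own named graph, so a
consumer can accept or reject it independently of the rest of the record. All
the URIs are illustrative placeholders, and the W3C PROV vocabulary is used
only by way of example.

  @prefix dct:  <http://purl.org/dc/terms/> .
  @prefix prov: <http://www.w3.org/ns/prov#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix ex:   <http://example.org/> .

  # One descriptive statement, in its own named graph
  ex:stmt1 {
    ex:work42 dct:title "Moby Dick" .
  }

  # Provenance attached to that one statement, not to a whole record
  ex:stmt1 prov:wasAttributedTo ex:catalogerA ;
           prov:generatedAtTime "2012-01-09T00:59:00Z"^^xsd:dateTime .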

>  *machine-interpretable 
Only machines can operate fast enough to pick-and-mix on a global scale. One
implication is that the (meta)data should be interoperable at global scale. One
suggestion is that this can be supported by the existing infrastructure of the
Web - i.e. that the RDF/linked data approach is appropriate for library
metadata. 

>  *optional 
In the sense that metadata without provenance retains some utility, and does not
diminish the utility of related metadata. But professionally-generated memory
institution metadata will be in a minority, competing with/complementing user-
and machine-generated metadata about the same entities of interest. Any
provenance information will be useful, so as much of it as possible should be
recorded and published.

>  *capable of recording alternate viewpoints and reconciling these
>  viewpoints by identifying preferred data 
Data "preferred" by some cataloguing rule, presumably.  And a specific version
of that rule; e.g. AACR editions. And perhaps a specific interpretation or
option of the rule. I think we need to extend provenance information to include
this. FRAD makes a start with the entity Rules, defined as "A set of
instructions relating to the formulation and/or recording of controlled access
points (authorized forms, variant forms or references, etc.)." This would need
to be extended to cover the formulation and recording of all metadata
statements. DCMI's Architecture Forum
(http://dublincore.org/groups/architecture/) is actively discussing development
of the Dublin Core Abstract Model, on which DC application profiles are based
and which includes accommodation for metadata content rules.
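To make the idea concrete, the sketch below (TriG again, with placeholder URIs)
attaches a rule identifier to a single access-point statement via
dct:conformsTo; a real identifier would need to pin down the rule, its edition
and, ideally, the local interpretation or option applied.
ex:authorizedAccessPoint and ex:AACR2-22.1A are made-up terms.

  @prefix dct: <http://purl.org/dc/terms/> .
  @prefix ex:  <http://example.org/> .

  # One access-point statement in its own named graph
  ex:stmt2 {
    ex:person7 ex:authorizedAccessPoint "Melville, Herman, 1819-1891" .
  }

  # The rule (edition/interpretation) under which the value was formulated
  ex:stmt2 dct:conformsTo ex:AACR2-22.1A ;
           dct:creator    ex:catalogerA .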

>  *capable of recording a history of edits (which I didn't talk about
>  above but which I think would be useful)
> 
This is a fundamental issue. In the RDF linked data community, one school of
thought is that a triple is never changed or deleted. Instead, the triple is
deprecated and, if appropriate, a new triple created. Deprecation requires a
"named graph" approach: the triple must be named so that a status property can
be attached to it. It would also be useful to know who set the status to
deprecated, and when that was done. This reinforces the need to supply
provenance information at the statement/triple level; much current
copy-cataloguing results in edits to at most one or two statements.
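A sketch of that deprecation pattern, reusing the placeholder statement from
the earlier TriG example: the superseded statement keeps its named graph but
gains a status, together with who set it and when, while the replacement
statement carries its own provenance. ex:status and ex:deprecated are made-up
terms; the PROV properties are used only by way of illustration.

  @prefix dct:  <http://purl.org/dc/terms/> .
  @prefix prov: <http://www.w3.org/ns/prov#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix ex:   <http://example.org/> .

  ex:stmt1 { ex:work42 dct:title "Moby Dick" . }
  ex:stmt3 { ex:work42 dct:title "Moby-Dick; or, The Whale" . }

  # The old statement is deprecated, never deleted
  ex:stmt1 ex:status ex:deprecated ;
           prov:invalidatedAtTime "2012-01-09T01:15:00Z"^^xsd:dateTime ;
           dct:contributor ex:catalogerB .   # who set the status

  # The replacement carries its own provenance and points back to the old one
  ex:stmt3 prov:wasAttributedTo ex:catalogerB ;
           prov:wasRevisionOf   ex:stmt1 .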
 
So in more general terms, there is nothing to "edit". There is just a set of
statements about a specific entity. Those statements may be contradictory, or
say the same thing in different ways. The statements may have a professional,
non-professional/user, or machine provenance, but that is not necessarily the
cause of contradictions - professionals may change their minds (the "edit"
scenario), disagree on content rule interpretation, use contradictory content
rules, make mistakes, etc.
 
I think we need to consider this very seriously. We have a lot of latent
provenance information for our metadata: at the collection (i.e.
catalogue/finding-aid) level, including interaction with union catalogues and
local institutional cataloguing practices (this last visibly disappearing as
baby-boomers retire); and at the record level where systems store more detailed
version history external to the record itself. It would be to our professional
detriment (contradictions notwithstanding) if this information is lost or
remains hidden. We need to supply all the provenance information we've got to
promote our data in likely future information environments.
 
Cheers
 
Gordon