The PREMIS Editorial Committee received a request to include an optional "certainty attribute" in the Data Dictionary to indicate the degree of certainty that the value provided for a particular semantic unit is correct.  Although the requester thought it might be of use for all semantic units, the specific use case was in reference to format:

===================================================================================================
I generate PREMIS documents from FITS (http://code.google.com/p/fits/). 
FITS normalises and consolidates the output from various technical
metadata extraction tools. File formats are where there is the most
difficulty.

If multiple tools agree on format for a given file, and no tools
disagree, then it would be useful to indicate in PREMIS that there is a
high degree of certainty that this file format has been correctly
identified.

If only one tool is able to identify a file format, then there is a
lower degree of certainty. Both this situation and the one above will
produce a single PREMIS format element, but they have very different
degrees of certainty.

If there is disagreement amongst the tools as to the correct format for
a file, then there will be multiple PREMIS format elements. If all tools
but one have identified one format, and one tool another format, again,
it would be helpful to retain this information.
=====================================================================================================
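To make the three situations in the request concrete, here is a minimal sketch of how a consolidation step might classify agreement among tools. It is an illustration only, not how FITS or PREMIS works: the function name, the tool names, and the labels "high", "lower", "disagreement" and "unknown" are all hypothetical, and the input is assumed to be a simple mapping from tool name to the format that tool reported (or None if it could not identify the file).

```python
from collections import Counter


def classify_format_agreement(tool_results):
    """Summarize how well identification tools agree on a file's format.

    tool_results: dict mapping a (hypothetical) tool name to the format it
    reported, or None when the tool could not identify the file.
    """
    # Count only the tools that actually reported a format.
    votes = Counter(fmt for fmt in tool_results.values() if fmt is not None)

    if not votes:
        level = "unknown"        # no tool identified the file
    elif len(votes) > 1:
        level = "disagreement"   # tools reported different formats
    elif sum(votes.values()) > 1:
        level = "high"           # several tools agree, none disagree
    else:
        level = "lower"          # only one tool identified the format

    # Formats ordered by number of supporting tools, most supported first,
    # so the per-format counts are retained rather than discarded.
    formats = [fmt for fmt, _ in votes.most_common()]
    return {"level": level, "formats": formats, "votes": dict(votes)}


if __name__ == "__main__":
    print(classify_format_agreement({"tool_a": "PDF", "tool_b": "PDF", "tool_c": None}))
    print(classify_format_agreement({"tool_a": "PDF", "tool_b": None}))
    print(classify_format_agreement({"tool_a": "PDF", "tool_b": "PDF", "tool_c": "TIFF"}))
```

In the disagreement case the vote counts are kept per format, which is the information the requester would like to retain alongside the multiple PREMIS format elements.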

After discussion among EC members and staff at their institutions, the general concern was that there are too many ways to express degrees of certainty.  A repository could use certainty information consistently internally, but it would then be local business information and not core preservation metadata.  For certainty information to be generally interoperable, a single vocabulary for degrees of certainty would be required, and this would be very difficult to devise.  So, recognizing the importance of certainty information about file formats but concerned about interoperability, the EC is considering adding a certainty element that pertains to format only, with only two values:

yes, this format is certain
no, there is uncertainty about the format

I am sending this note to see what PREMIS Implementers think of this idea.  Would this limited certainty information be useful, or would you prefer to see more complex and/or more generally appropriate certainty information allowed?

Please reply to the list, not to me directly.  I'd love to get a discussion going about this.

Priscilla