The PREMIS Editorial Committee received a request to include an optional 
"certainty attribute" in the Data Dictionary to indicate the degree of 
certainty that the value provided for a particular semantic unit is 
correct.  Although the requester thought it might be of use for all 
semantic units, the specific use case was in reference to format:

 ===================================================================================================

I generate PREMIS documents from FITS (http://code.google.com/p/fits/).
FITS normalises and consolidates the output from various technical
metadata extraction tools. File formats are where there is the most
difficulty.

If multiple tools agree on format for a given file, and no tools
disagree, then it would be useful to indicate in PREMIS that there is a
high degree of certainty that this file format has been correctly
identified.

If only one tool is able to identify a file format, then there is a
lower degree of certainty. Both this situation and the one above will
produce a single PREMIS format element, but they have very different
degrees of certainty.

If there is disagreement amongst the tools as to the correct format for
a file, then there will be multiple PREMIS format elements. If all tools
but one have identified one format, and one tool another format, again,
it would be helpful to retain this information.

 =====================================================================================================
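
As a rough illustration of the three situations described in the request above, the sketch below groups per-tool format identifications for one file and reports whether the tools agree, whether only one tool answered, or whether they conflict. It is only a sketch under stated assumptions: the function name, the tool names, and the status labels ("agreed", "single-tool", "conflict", "unidentified") are hypothetical and are not part of FITS or of the PREMIS Data Dictionary.

```python
from collections import Counter


def summarise_format_agreement(tool_formats):
    """Summarise how far the tools agree on the format of one file.

    tool_formats maps a (hypothetical) tool name to the format that tool
    reported, or None if the tool could not identify the file.
    """
    # Count how many tools reported each format, ignoring tools with no answer.
    votes = Counter(fmt for fmt in tool_formats.values() if fmt is not None)

    if not votes:
        status = "unidentified"   # no tool identified the file
    elif len(votes) > 1:
        status = "conflict"       # tools disagree: several format elements
    elif sum(votes.values()) > 1:
        status = "agreed"         # several tools agree and none disagree
    else:
        status = "single-tool"    # only one tool identified the format

    # Keep the per-format counts so the "all but one" case is not lost.
    return {"status": status, "formats": dict(votes)}


if __name__ == "__main__":
    print(summarise_format_agreement({"tool_a": "PDF", "tool_b": "PDF"}))
    print(summarise_format_agreement({"tool_a": "PDF", "tool_b": None}))
    print(summarise_format_agreement(
        {"tool_a": "PDF", "tool_b": "PDF", "tool_c": "TIFF"}))
```

The first two calls would each lead to a single PREMIS format element, yet they report different statuses. The third keeps the vote counts, so the "all tools but one" case from the request remains visible.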

After discussion among EC members and staff at their institutions, the 
general concern was that there are too many ways that degrees of 
certainty can be expressed.  A repository could use certainty 
information consistently internally, but this would then be local 
business information and not core preservation metadata.  For certainty 
information to be generally interoperable, use of a single vocabulary 
for degrees of certainty would be required, and this would be very 
difficult to devise.  So, recognizing the importance of certainty 
information about file formats, but concerned about interoperability, the 
EC is considering whether to add a certainty element that pertains to 
format only, with only two values:

/yes, this format is certain/
/no, there is uncertainty about the format/

I am sending this note to see what PREMIS Implementers think of this 
idea.  Would this limited certainty information be useful, or would 
you prefer to see more complex and/or more generally appropriate 
certainty information allowed?

Please reply to the list, not to me directly.  I'd love to get a 
discussion going about this.

Priscilla