The PREMIS Editorial Committee received a request to include an
optional "certainty attribute" in the Data Dictionary to indicate
the degree of certainty that the value provided for a particular
semantic unit is correct. Although the requester thought it might
be of use for all semantic units, the specific use case was in
reference to format:
I generate PREMIS documents from FITS (http://code.google.com/p/fits/).
FITS normalises and consolidates the output from various technical
metadata extraction tools. File formats are where there is the most
If multiple tools agree on format for a given file, and no tools
disagree, then it would be useful to indicate in PREMIS that there is a
high degree of certainty that this file format has been correctly
identified.
If only one tool is able to identify a file format, then there is a
lower degree of certainty. Both this situation and the one above will
produce a single PREMIS format element, but they have very different
degrees of certainty.
If there is disagreement amongst the tools as to the correct format for
a file, then there will be multiple PREMIS format elements. If all tools
but one have identified one format, and one tool another format, again,
it would be helpful to retain this information.
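
To make the three cases in the request concrete, here is a minimal sketch
in Python. It is not part of FITS or PREMIS: the function name, the input
shape (a mapping from tool name to reported format), and the level labels
"high", "lower", "conflict" and "unidentified" are all assumptions made
for illustration only.

```python
from collections import Counter


def summarize_format_votes(tool_reports):
    """Classify agreement among identification tools (hypothetical helper).

    tool_reports maps a tool name to the format it reported, or None if
    the tool could not identify the file.
    Returns (level, votes), where votes maps each reported format to the
    number of tools that reported it.
    """
    # Count only the tools that actually identified a format.
    votes = Counter(fmt for fmt in tool_reports.values() if fmt is not None)

    if not votes:
        level = "unidentified"   # no tool identified the file
    elif len(votes) > 1:
        level = "conflict"       # tools disagree: several format elements
    elif sum(votes.values()) > 1:
        level = "high"           # several tools agree and none disagree
    else:
        level = "lower"          # only one tool identified the format
    return level, dict(votes)


# Illustrative tool names and formats, not real FITS output.
print(summarize_format_votes({"tool_a": "PDF", "tool_b": "PDF"}))
print(summarize_format_votes({"tool_a": "PDF", "tool_b": None}))
print(summarize_format_votes(
    {"tool_a": "PDF", "tool_b": "PDF", "tool_c": "PostScript"}))
```

In the conflict case the returned counts keep the information the
requester asks to retain, namely how many tools reported each format.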
After discussion among EC members and staff at their institutions,
the general concern was that there are too many ways that degrees of
certainty can be expressed. A repository could use certainty
information consistently internally, but this would then be local
business information and not core preservation metadata. For
certainty information to be generally interoperable, use of a single
vocabulary for degrees of certainty would be required, and this
would be very difficult to devise. So, recognizing the importance
of certainty information about file formats and concerned about
interoperability, the EC is considering adding a certainty
element that pertains to format only, with only two values:
yes, this format is certain
no, there is uncertainty about the format
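
As a rough illustration of how a repository might arrive at such a
two-valued flag, here is a hedged sketch in Python. The proposal does not
say how "yes" or "no" would be decided; the rule below (all reporting
tools agree, and at least min_agreeing_tools of them reported) and every
name in it are assumptions, not anything defined by PREMIS.

```python
def binary_format_certainty(formats_reported, min_agreeing_tools=2):
    """Collapse tool results into "yes" or "no" (hypothetical rule).

    formats_reported holds one format name per tool that identified the
    file. min_agreeing_tools is an assumed local policy threshold.
    """
    distinct = set(formats_reported)
    # "yes" only if every reporting tool named the same format and
    # enough tools reported to satisfy the local threshold.
    if len(distinct) == 1 and len(formats_reported) >= min_agreeing_tools:
        return "yes"
    return "no"


print(binary_format_certainty(["PDF", "PDF"]))
print(binary_format_certainty(["PDF"]))
print(binary_format_certainty(["PDF", "PostScript"]))
```

Whether a single-tool identification should count as "yes" is exactly the
kind of local policy choice the threshold parameter stands in for.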
I am sending this note to see what PREMIS Implementers think
of this idea. Would this limited certainty information be useful,
or would you prefer to see more complex and/or more generally
appropriate certainty information allowed?
Please reply to the list, not to me directly. I'd love to get a
discussion going about this.