New Year Greetings from New Zealand,
We have been working on a data model that will allow us to retain some technical provenance data inside our object-level PREMIS records, and have built a proposal model for discussion.
The aim is to create a data object that records any required 'changes' made to files, and that can be used to systematically return objects to their pre-change state if required. Changes are made with reference to our pre-conditioning policy (see attached), which describes the constraints on authorised changes, namely that they are (i) lossless and (ii) revertible. Primary considerations are built around recording authorised changes through fixity entries and pre/post-change values. We expect to be able to make these changes via automated tools in the future.
In brief, we have taken the structure used to represent a PREMIS EventNote, changed the data labels accordingly and added a number of Provenance Note 'types', with some supporting data for each note type.
The Provenance Note types we have identified so far include:
Filename Change - Data includes: pre-change name, post-change name
File Extension Change - Data includes: type of change (added|removed|changed)
Simple Bitstream Change - Data includes: fixity (pre-change|post-change), change type (add|remove|replace), change location offset, change value, change order
Complex Bitstream Change - Data includes: fixity (pre-change|post-change), change type (add|remove|replace), change location offsets (change start, change end), change value, change order
Unstructured Note - Data includes a free text element and fixity (pre-change|post-change) if required.
Filename Change - used if a file is submitted with a name that contains characters outside the UTF-8 Basic Latin range, or Windows reserved characters. Example: a file submitted as jåy.gattuso.txt could be changed to jaygattuso.txt at the point of ingest, with the original file name preserved in the Prov note.
File Extension Change - used if a file is submitted without an extension (and a suitable one can be used) or if the submitted extension is incorrect. Example: file submitted as 'paper1' could be changed to 'paper1.pdf' where pdf was identified as the correct file type.
Simple Bitstream Change - used if a simple change is required to a bitstream, at the beginning or end of the file only. Example: data found before a legitimate BOF marker, or after an EOF marker, can be stripped but retained in the Prov note. Supports more than one change through the change order element.
Complex Bitstream Change - used if a complex change is required to a bitstream. Example: incorrect file metadata can be changed and the change recorded. A DateTime (UTF-8 decoded) found as '12-12-2000 14:12NZST' at starting offset 0x270 is changed to '2000-12-12T14:12:20+12:00' (as per format standard). The original data is retained in the Prov note. Supports more than one change through the change order element.
Unstructured Note - used if an unstructured note is required, where there is some known technical data that would be useful to retain with the file. Example: "this file seems to crash app version 2.3, but is OK with app version 3".
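To make the bitstream note types more concrete, here is a minimal Python sketch of how an ordered set of changes might be recorded and later reverted. It is an illustration only, not our data dictionary or demonstration script: the class name BitstreamChange, its field names, the helper functions and the choice of SHA-256 for fixity are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass


def sha256(data: bytes) -> str:
    """Fixity value for a bitstream (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class BitstreamChange:
    """Hypothetical provenance note for one bitstream change."""
    order: int          # position of this change in the sequence of changes
    change_type: str    # "add", "remove" or "replace"
    start: int          # byte offset at which the change begins
    old_value: bytes    # bytes present before the change (empty for "add")
    new_value: bytes    # bytes present after the change (empty for "remove")
    pre_fixity: str     # fixity of the whole bitstream before the change
    post_fixity: str    # fixity of the whole bitstream after the change


def apply_change(data, order, change_type, start, old_value=b"", new_value=b""):
    """Apply one change and return (changed_bytes, provenance_note)."""
    end = start + len(old_value)
    if data[start:end] != old_value:
        raise ValueError("bytes at the given offset do not match old_value")
    changed = data[:start] + new_value + data[end:]
    note = BitstreamChange(order, change_type, start, old_value, new_value,
                           sha256(data), sha256(changed))
    return changed, note


def revert_changes(data, notes):
    """Undo the recorded changes in reverse order, checking fixity at each step."""
    for note in sorted(notes, key=lambda n: n.order, reverse=True):
        if sha256(data) != note.post_fixity:
            raise ValueError(f"post-change fixity mismatch at change {note.order}")
        end = note.start + len(note.new_value)
        data = data[:note.start] + note.old_value + data[end:]
        if sha256(data) != note.pre_fixity:
            raise ValueError(f"pre-change fixity mismatch at change {note.order}")
    return data


# Usage: strip two stray leading bytes, then rewrite a date, then revert both.
original = b"xx" + b"BOF 12-12-2000 EOF"
step1, note1 = apply_change(original, 1, "remove", 0, old_value=b"xx")
step2, note2 = apply_change(step1, 2, "replace", 4,
                            old_value=b"12-12-2000", new_value=b"2000-12-12")
restored = revert_changes(step2, [note1, note2])
```

Storing both the old and new values alongside the pre- and post-change fixity is what allows each step to be verified as it is undone, and the change order determines the sequence in which the changes are reversed.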
There is perhaps another element that might be useful, similar to the find/replace feature in text applications. Example: all instances of www.someURL.com have been changed to /someURL.com/, to be used when converting absolute links in objects to relative links. We have not explored this in much detail. For complex documents (such as web pages) this might be better completed as a preservation action and recorded accordingly. However, it may be useful in XML documents to change location pointers to XSD schemas from external to internal sources.
We have a full data dictionary and data model, more complete examples, an example implementation in XML, and a demonstration script in Python that both cleans and 'de-cleans' jpg files with data found after the end of image (EOI) marker, whilst demonstrating the use of fixity as an auditing/authenticity tool. Please contact me directly if these documents are of interest to you.
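For readers who want a feel for the clean/'de-clean' idea before requesting the documents, the following is a rough Python sketch and not the demonstration script itself. It assumes the last occurrence of the EOI marker (0xFFD9) is the true end of image, which only holds if the trailing data does not itself contain those bytes. The function names, the dictionary keys and the use of SHA-256 are hypothetical.

```python
import hashlib

EOI = b"\xff\xd9"  # JPEG end of image marker


def fixity(data: bytes) -> str:
    """SHA-256 is an assumption here; any agreed fixity algorithm would do."""
    return hashlib.sha256(data).hexdigest()


def strip_trailing_data(jpeg: bytes):
    """Remove any bytes after the last EOI marker and return (cleaned, note).

    Simplification: the last EOI found is treated as the real end of image.
    """
    idx = jpeg.rfind(EOI)
    if idx == -1:
        raise ValueError("no EOI marker found")
    cut = idx + len(EOI)
    cleaned, trailing = jpeg[:cut], jpeg[cut:]
    note = {
        "change_type": "remove",
        "offset": cut,               # where the removed bytes started
        "removed_value": trailing,   # retained so the change can be reverted
        "pre_fixity": fixity(jpeg),
        "post_fixity": fixity(cleaned),
    }
    return cleaned, note


def restore_trailing_data(cleaned: bytes, note: dict) -> bytes:
    """Re-attach the removed bytes, checking fixity before and after."""
    if fixity(cleaned) != note["post_fixity"]:
        raise ValueError("cleaned file does not match the recorded post-change fixity")
    restored = cleaned[:note["offset"]] + note["removed_value"]
    if fixity(restored) != note["pre_fixity"]:
        raise ValueError("restored file does not match the recorded pre-change fixity")
    return restored


# Usage with a toy byte string standing in for a real JPEG file.
submitted = b"\xff\xd8" + b"imagedata" + b"\xff\xd9" + b"extra bytes"
cleaned, note = strip_trailing_data(submitted)
restored = restore_trailing_data(cleaned, note)
```

The two fixity checks are the auditing step: the first confirms the file being 'de-cleaned' is the one the note describes, and the second confirms the result is bit-identical to what was originally submitted.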
We are interested to hear from you if:
1) You are aware that you currently make changes of the above types prior to ingest into a permanent repository.
2) You already have a recording method that allows you to capture this type of data.
3) You think this record either does or does not belong in a PREMIS record.
4) You have suggestions for other types of technical changes that would fit in the above approach.
5) You have any comments or questions about this approach.
Jay Gattuso | Digital Preservation Analyst | Preservation, Research and Consultancy
National Library of New Zealand | Te Puna Mātauranga o Aotearoa
PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064