Hi All,

Continuing the series of responses to documenting large sequences in PREMIS.

Original: In our PREMIS implementation meetings in the IFI, we haven't yet discussed Events and Agents in as much detail as Objects. I am still curious about the best way to link agents and events to objects. Some events will include: Creation, Message Digest Calculation, Fixity, Deletion, Compression etc. It would seem most convenient for the linkingObjectIdentifier to link to the Representation. There are some occasions where an event relates only to either the image sequence or the separate WAV, but not both. In this case, it would appear best to link on a File level, rather than a Representation level. Transcoding only the image sequence, but not the WAV, would also appear to require file-level documentation, so would this require one event with linking identifiers to each file (anywhere from 500 to 150,000 files)? I'm not sure how else to do it. Multiple fixity checks over time would generate a massive amount of documentation if recorded on a file level.

Follow up: Are there recommended practices for relating events to representation objects when only one file (or 100,000 files) might have been the target of an event? Is it OK to document this in eventDetail?

This is a best practice rather than a data dictionary issue. There is nothing in the data dictionary to argue against using eventDetail for events that are noted at the Representation level but have only run across some of the files in that representation. It was noted, however, in the EC call today that this is a matter of understanding the risks in taking this approach. It's a question of making sure you're comfortable that you know what information you are not recording or that must be inferred.
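To make the trade-off concrete, here is a rough Python sketch comparing the two linking approaches. It is only an illustration under stated assumptions: the dictionary keys borrow PREMIS semantic unit names but this is not a schema-valid PREMIS serialisation, and the function names, identifiers and file counts are hypothetical.

```python
def representation_level_event(event_type, representation_id, affected, total, scope_note):
    """One event linked to the Representation; the partial scope lives in eventDetail."""
    return {
        "eventType": event_type,
        # Free text: which files were touched has to be read or inferred from here.
        "eventDetail": f"Applied to {affected} of {total} files in the representation: {scope_note}",
        "linkingObjectIdentifier": [representation_id],
    }


def file_level_event(event_type, file_ids):
    """One event with an explicit link to every affected File."""
    return {
        "eventType": event_type,
        "eventDetail": "",
        "linkingObjectIdentifier": list(file_ids),
    }


# Hypothetical representation: 150,000 image frames plus one WAV.
frames = [f"frame_{n:06d}.dpx" for n in range(1, 150_001)]

by_representation = representation_level_event(
    "transcoding", "rep-0001", len(frames), len(frames) + 1,
    "image sequence only, WAV not transcoded",
)
by_file = file_level_event("transcoding", frames)

print(len(by_representation["linkingObjectIdentifier"]))  # 1
print(len(by_file["linkingObjectIdentifier"]))            # 150000
```

The first form keeps the metadata small but leaves the exact list of affected files to be inferred from the free-text note; the second is explicit but carries one link per file.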

Taking one of your examples, the fixity information: at NLNZ, when we rerun fixity (that is, when we run a process across items in the preservation repository that checks the existing fixity values against newly generated ones), we do not record this as an event. We see it as a process that is run, and we note the time and parameters for that process elsewhere in the system (logs and external documentation). The only time an event is generated is when a mismatch occurs (that is, the new value does not match the stored value). This then flags that file and sends it to a workbench to be assessed and for a solution to be implemented. We do the same with virus checking. In this way, we don't increase the size of our metadata by storing what we view as redundant metadata that we can infer through documentation of the process rather than a record of individual events.
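The pattern described above could be sketched roughly as follows in Python. This is not NLNZ's actual implementation: the function name, the `read_file` callable, the record layouts and the sample data are all assumptions for illustration, and the event keys only borrow PREMIS semantic unit names.

```python
import hashlib
from datetime import datetime, timezone


def run_fixity_check(stored_digests, read_file, algorithm="sha256"):
    """Recompute digests; return one run-level log entry plus events for mismatches only.

    stored_digests: dict mapping a file identifier to its stored hex digest.
    read_file: callable returning the bytes for a file identifier (a hypothetical
               stand-in for however the repository reads content).
    """
    started = datetime.now(timezone.utc).isoformat()
    mismatch_events = []
    for file_id, expected in stored_digests.items():
        actual = hashlib.new(algorithm, read_file(file_id)).hexdigest()
        if actual != expected:
            # Only a failed check becomes an event linked to the file.
            mismatch_events.append({
                "eventType": "fixity check",
                "eventDateTime": datetime.now(timezone.utc).isoformat(),
                "eventOutcome": "fail",
                "eventDetail": f"{algorithm} mismatch: stored {expected}, computed {actual}",
                "linkingObjectIdentifier": file_id,
            })
    # The run itself is noted once, outside the per-object metadata.
    run_log = {
        "process": "fixity check",
        "algorithm": algorithm,
        "started": started,
        "filesChecked": len(stored_digests),
        "mismatches": len(mismatch_events),
    }
    return run_log, mismatch_events


# Hypothetical data: two files, one of which is damaged after its digest was stored.
files = {"frame_000001.dpx": b"image data", "audio.wav": b"audio data"}
stored = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
files["audio.wav"] = b"corrupted"

run_log, events = run_fixity_check(stored, files.__getitem__)
print(run_log["filesChecked"], run_log["mismatches"])  # 2 1
print(events[0]["linkingObjectIdentifier"])            # audio.wav
```

Successful checks leave no per-file record here; that they happened is inferred from the run log, which is the risk being accepted in exchange for smaller metadata.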

Again, more suggestions/comments encouraged.


Peter McKinney | Digital Preservation Policy Analyst | Information and Knowledge Services
National Library of New Zealand Te Puna Mātauranga o Aotearoa
Direct Dial: +64 4 462 3931 | Extn: 3931
Cnr Molesworth and Aitken Streets | PO Box 1467, Wellington 6140 |

I work on Mondays, Wednesdays and Thursdays.

The National Library is part of the Department of Internal Affairs