Hello,
Some updates since my last email. We are finding PREMIS incredibly
helpful for documenting preservation metadata. It is also a very useful
guide to what should be documented and how objects link to one
another.
I'd like to describe our current use case, followed by some
observations on the questions from my initial post.
Firstly, here are two basic events involved in making a digital
representation of a film:
Event 1. 35mm combined optical print is scanned to 16-bit TIFF,
overscanned to include the combined optical track and perforations.
Event 2. Using AEO-Light, a PCM/WAV file is extracted from the optical
track in the TIFFs. Some scanners, such as the Blackmagic Cintel, use
DaVinci Resolve to perform this step.
As our preferred post-production software only accepts DPX, the
following basic events are involved in creating a restored version:
Event A. The TIFFs created in Event 1 are transcoded losslessly to a
new 16-bit DPX sequence via ffmpeg. AVID only allows a lossy import
of the TIFFs, so working with DPX is preferable.
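For illustration, the Event A transcode could be driven from a Python
script roughly like this. It is a minimal sketch, not our actual
command: the helper name `tiff_to_dpx_command`, the `scan_%06d`
filename pattern and the start number are hypothetical, and
`rgb48le` (16 bits per RGB channel) is assumed to match what the
scanner writes. The function only builds the argument list, so it can
be logged alongside the PREMIS event before being passed to
`subprocess.run`.

```python
import shlex
from pathlib import Path


def tiff_to_dpx_command(tiff_dir, dpx_dir, start_number=0, pattern="scan_%06d"):
    """Build (but do not run) an ffmpeg command that rewraps a 16-bit TIFF
    sequence as a 16-bit DPX sequence (Event A).

    `pattern` and `start_number` are hypothetical defaults; set them to
    match the filenames the scanner actually produces.
    """
    src = Path(tiff_dir) / f"{pattern}.tiff"
    dst = Path(dpx_dir) / f"{pattern}.dpx"
    return [
        "ffmpeg",
        "-start_number", str(start_number),  # first frame number of the input sequence
        "-i", str(src),
        "-pix_fmt", "rgb48le",               # 16 bits per channel, so no bit depth is discarded
        "-start_number", str(start_number),  # keep the same frame numbering on output
        str(dst),
    ]


if __name__ == "__main__":
    cmd = tiff_to_dpx_command("capture/tiff", "restore/dpx", start_number=86400)
    print(shlex.join(cmd))
    # To run it for real: subprocess.run(cmd, check=True)
```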
Event B. The PCM/WAV file created in Event 2 is restored in
RX5/ProTools, creating a new PCM/WAV file.
Event C. Colour correction and cropping occurs in AVID/Baselight.
Event D. A new DPX sequence and a separate WAV are exported, and the
16-bit DPX sequence created in Event A is deleted.
Event E. In the future, the DPX+WAV may be converted into a single
Matroska file containing an FFV1 image stream and the audio.
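If Event E goes ahead, the conversion might look something like the
sketch below, which muxes the DPX sequence and the WAV into one
Matroska file with FFV1 video and the PCM audio copied untouched. The
frame rate, start number and helper name are assumptions; an image
sequence carries no frame rate of its own, so it has to be stated
explicitly.

```python
import shlex


def dpx_wav_to_mkv_command(dpx_pattern, wav_path, mkv_path, framerate=24, start_number=0):
    """Build (but do not run) an ffmpeg command that muxes a DPX sequence
    and a WAV into one Matroska file: FFV1 video plus untouched PCM audio.

    `framerate` and `start_number` are assumptions for illustration.
    """
    return [
        "ffmpeg",
        "-framerate", str(framerate),        # image sequences have no inherent frame rate
        "-start_number", str(start_number),
        "-i", dpx_pattern,
        "-i", wav_path,
        "-map", "0:v", "-map", "1:a",        # picture from input 0, sound from input 1
        "-c:v", "ffv1", "-level", "3",       # FFV1 version 3
        "-slicecrc", "1",                    # per-slice CRCs for later error detection
        "-c:a", "copy",                      # copy the PCM samples without re-encoding
        mkv_path,
    ]


if __name__ == "__main__":
    print(shlex.join(dpx_wav_to_mkv_command("graded/scan_%06d.dpx", "restored.wav", "film.mkv")))
```

Whether FFV1 stores 16-bit RGB bit-exactly depends on the pixel
formats supported by the FFmpeg build in use, so comparing `-f
framemd5` output from the DPX sequence and the Matroska file (both
decoded to the same pixel format) would be a sensible check before
anything is deleted.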
It's an awkward workflow, but our hands are tied. Our 12-bit scanner
only offers 10-bit DPX output, so to retain all 12 bits we need to use
the 16-bit TIFF option.
So we create two Representations of the one Intellectual Entity that
we intend to preserve:
Representation 1 (Events 1 and 2) = untouched TIFFs straight from the
scanner, plus the separate AEO-Light WAV file; these may be converted
to FFV1/Matroska.
Representation 2 (Events A to E) = graded/corrected DPX sequence plus
a separate corrected PCM/WAV file; these may be converted to
FFV1/Matroska.
Here are some updates on each of the four questions in my previous
email, along with three new questions:
1. After looking more closely at the definition of a Representation in
the PREMIS Data Dictionary, there seems to be no question that the
Representation must comprise both the TIFFs and the WAV, as the audio
and image are both required for a complete rendition of the
Intellectual Entity. In my current work-in-progress PREMIS generation
script, there is a single Representation object followed by individual
File objects for each TIFF and for the WAV.
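For anyone following along, the structure described in point 1 (one
Representation object, then one File object per TIFF and one for the
WAV) can be sketched with Python's standard `xml.etree.ElementTree`.
This is a simplified illustration rather than the script itself: the
identifier type "local" and the identifier values are hypothetical,
and schema-valid File objects would also need objectCharacteristics.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def p(tag):
    """Qualify a tag name with the PREMIS v3 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def add_object(parent, xsi_type, id_type, id_value):
    """Append a premis:object of the given xsi:type with one objectIdentifier."""
    obj = ET.SubElement(parent, p("object"), {f"{{{XSI_NS}}}type": xsi_type})
    ident = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(ident, p("objectIdentifierType")).text = id_type
    ET.SubElement(ident, p("objectIdentifierValue")).text = id_value
    return obj


def build_premis(rep_id, file_ids, id_type="local"):
    """One Representation object followed by one File object per TIFF/WAV.

    Schema-valid File objects also need objectCharacteristics (fixity,
    size, format); they are left out to keep the sketch short.
    """
    root = ET.Element(p("premis"), {"version": "3.0"})
    add_object(root, "premis:representation", id_type, rep_id)
    for file_id in file_ids:
        add_object(root, "premis:file", id_type, file_id)
    return root


if __name__ == "__main__":
    doc = build_premis("rep-0001", ["tiff-000000", "tiff-000001", "wav-0001"])
    print(ET.tostring(doc, encoding="unicode")[:200])
```

With 150,000 frames this simply loops 150,000 times, so the document
gets large, but the structure stays flat and predictable.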
2. This issue still seems relevant to me. Some objectCharacteristics
information for a Representation would be valuable to have,
especially the overall file size. The note on fixity information with
regard to Representations on page 59 is interesting. It says that this
information should be recorded at the file level, as it relates to
individual files. However, storage information is applicable to
Representations in PREMIS. Should it not also be said that storage
relates to files, rather than Representations?
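Since objectCharacteristics is not available at Representation level,
one workaround is to derive the overall figure from the file-level
sizes that are recorded anyway. The sketch below assumes every file of
the Representation sits in one directory; the function name and the
extension list are hypothetical, and where the total ends up (for
example, our own database record) is left open.

```python
import tempfile
from pathlib import Path


def sequence_sizes(directory, extensions=(".tif", ".tiff", ".dpx", ".wav")):
    """Return per-file sizes in bytes (for file-level objectCharacteristics)
    and their sum (the overall size of the Representation)."""
    sizes = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            sizes[path.name] = path.stat().st_size
    return sizes, sum(sizes.values())


if __name__ == "__main__":
    # Small self-contained demo with dummy files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for i, n in enumerate((10, 20, 30)):
            (Path(tmp) / f"scan_{i:06d}.tiff").write_bytes(b"\0" * n)
        (Path(tmp) / "notes.txt").write_text("ignored")  # wrong extension, skipped
        per_file, total = sequence_sizes(tmp)
        print(len(per_file), total)  # 3 60
```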
3. In our PREMIS implementation meetings at the IFI, we haven't yet
discussed Events and Agents in as much detail as Objects. I am still
curious about the best way to link Agents and Events to Objects. Some
events will include: Creation, Message Digest Calculation, Fixity,
Deletion, Compression, etc. It would seem most convenient for the
linkingObjectIdentifier to link to the Representation. On some
occasions, however, an event relates only to the image sequence or
only to the separate WAV, but not both. In that case, it would appear
best to link at the File level rather than the Representation level.
Transcoding only the image sequence, but not the WAV, would also
appear to require file-level documentation, so would this require one
event with linking identifiers to each file (anywhere from 500 to
150,000 files)? I'm not sure how else to do it. Multiple fixity checks
over time would generate a massive amount of documentation if recorded
at the file level.
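To make the trade-off in point 3 concrete, here is a sketch of a
single PREMIS event that can be linked either to the Representation
alone or to a list of File objects. Identifiers, dates and the
identifier type "local" are hypothetical, and the eventType strings
should be checked against the LOC eventType vocabulary.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def p(tag):
    """Qualify a tag name with the PREMIS v3 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(event_id, event_type, date_time, linked_ids, outcome=None, id_type="local"):
    """premis:event linked to one or more objects via linkingObjectIdentifier.

    Passing only the Representation's identifier gives one link per event;
    passing file identifiers gives one link per file.
    """
    event = ET.Element(p("event"))
    eid = ET.SubElement(event, p("eventIdentifier"))
    ET.SubElement(eid, p("eventIdentifierType")).text = id_type
    ET.SubElement(eid, p("eventIdentifierValue")).text = event_id
    ET.SubElement(event, p("eventType")).text = event_type
    ET.SubElement(event, p("eventDateTime")).text = date_time
    if outcome is not None:
        info = ET.SubElement(event, p("eventOutcomeInformation"))
        ET.SubElement(info, p("eventOutcome")).text = outcome
    # In the PREMIS 3 schema, linkingObjectIdentifier follows
    # eventOutcomeInformation (and linkingAgentIdentifier, omitted here).
    for obj_id in linked_ids:
        link = ET.SubElement(event, p("linkingObjectIdentifier"))
        ET.SubElement(link, p("linkingObjectIdentifierType")).text = id_type
        ET.SubElement(link, p("linkingObjectIdentifierValue")).text = obj_id
    return event


# Representation-level link: one identifier, however many files.
fixity = build_event("ev-0001", "fixity check", "2016-05-01T10:00:00", ["rep-0001"], outcome="pass")
# File-level link for an image-only event: one identifier per frame.
frames = [f"tiff-{i:06d}" for i in range(500)]
image_only = build_event("ev-0002", "creation", "2016-05-02T09:00:00", frames)
```

The file-level variant grows linearly with the number of frames, which
is exactly the bulk that repeated fixity checks would multiply.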
4. I think that some sort of fixityExtension could be helpful, but I
currently just record checksums for each item in the 1.5.2 fixity
semantic unit and keep a separate manifest file. Perhaps this is
already covered by eventOutcomeDetailExtension.
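For reference, the manifest and the per-file fixity values can come
from the same pass over the files using only the standard library.
This is a sketch: MD5 and the md5sum-style `<digest>  <relative path>`
line format are assumptions rather than anything PREMIS prescribes,
and `write_manifest` is a hypothetical helper. The returned dictionary
holds the same digests, ready for each File object's fixity unit.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large frames don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(directory, manifest_path):
    """Write an md5sum-style manifest and return {relative path: digest}."""
    directory = Path(directory)
    manifest_path = Path(manifest_path)
    checksums = {}
    for path in sorted(f for f in directory.rglob("*") if f.is_file()):
        if path.resolve() == manifest_path.resolve():
            continue  # never checksum the manifest itself
        checksums[path.relative_to(directory).as_posix()] = md5_of(path)
    manifest_path.write_text("".join(f"{d}  {name}\n" for name, d in checksums.items()))
    return checksums


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "optical.wav").write_bytes(b"abc")
        print(write_manifest(tmp, Path(tmp) / "manifest.md5"))
```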
5. I have a new question with regard to the PREMIS v3 documentation.
There is a very useful map on page 9 displaying the relationships that
objects can have with each other. In that map, there is no arrow
pointing from Representation to File. However, in the example for
relationshipSubType (1.13.2) on page 120, it looks like a
Representation can have a 'has root' relationship with the first file
in a sequence. I would assume that the first file could have a
reciprocal 'is root of' relationship. Am I correct in thinking that
there is a contradiction, or is that map just a visualisation tool
showing some of the possible relationships?
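The two relationships in point 5 would look roughly like this when
generated. The 'has root' subtype follows the page 120 example; the
reciprocal 'is root of' is my own assumption, as noted above, and the
identifiers and identifier type are hypothetical.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def p(tag):
    """Qualify a tag name with the PREMIS v3 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_relationship(rel_type, sub_type, related_id, id_type="local"):
    """A premis:relationship pointing at one related object."""
    rel = ET.Element(p("relationship"))
    ET.SubElement(rel, p("relationshipType")).text = rel_type
    ET.SubElement(rel, p("relationshipSubType")).text = sub_type
    related = ET.SubElement(rel, p("relatedObjectIdentifier"))
    ET.SubElement(related, p("relatedObjectIdentifierType")).text = id_type
    ET.SubElement(related, p("relatedObjectIdentifierValue")).text = related_id
    return rel


# On the Representation: points at the first frame of the sequence.
rep_to_root = build_relationship("structural", "has root", "tiff-000000")
# On the first File object: the assumed reciprocal, pointing back.
root_to_rep = build_relationship("structural", "is root of", "rep-0001")

if __name__ == "__main__":
    print(ET.tostring(rep_to_root, encoding="unicode"))
```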
6. Another question with regard to documenting complex process
histories. In our case, if we want to document every step, we will
need to record information about objects that will not actually be
preserved. For example, the graded/restored file that ultimately gets
sent to preservation storage has been through several events that
create and then delete intermediate objects. Going back to the
workflow described above for restoring the captured TIFFs, the
objects created in Events A and D are deleted, and their derivatives
make their way to preservation storage instead. I think it makes sense
to retain information about these deleted objects in our database. The
preserved objects can link back to them, so that we have an unbroken
record of the process history from film to restored DPX/FFV1. Does
this make sense, or is there some other way to document this?
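The idea in point 6 can be sketched as a chain of database records,
deleted or not, each pointing at the object it was derived from, so
that the history can be walked back from whatever is preserved. The
`ObjectRecord` class, its fields and the identifiers are hypothetical
and simply mirror Events 1, A, D and E above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ObjectRecord:
    """A hypothetical database row for any object, preserved or not."""
    object_id: str
    description: str
    derived_from: Optional[str] = None      # identifier of the source object
    created_by_event: Optional[str] = None  # e.g. "Event A"
    deleted: bool = False


def process_history(records: Dict[str, ObjectRecord], object_id: str) -> List[ObjectRecord]:
    """Follow derived_from links back from an object to its original source."""
    chain, seen = [], set()
    current = records.get(object_id)
    while current is not None and current.object_id not in seen:  # guard against loops
        seen.add(current.object_id)
        chain.append(current)
        current = records.get(current.derived_from) if current.derived_from else None
    return chain


records = {r.object_id: r for r in [
    ObjectRecord("tiff-seq", "16-bit TIFF scan", created_by_event="Event 1"),
    ObjectRecord("dpx-16", "16-bit DPX transcode", "tiff-seq", "Event A", deleted=True),
    ObjectRecord("dpx-graded", "graded DPX export", "dpx-16", "Event D", deleted=True),
    ObjectRecord("ffv1-mkv", "FFV1/Matroska", "dpx-graded", "Event E"),
]}

if __name__ == "__main__":
    for rec in process_history(records, "ffv1-mkv"):
        print(rec.created_by_event, rec.object_id, "(deleted)" if rec.deleted else "")
```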
7. A question on relationships: the WAV is created from the TIFF
sequence, so it has a structural relationship to the TIFFs, and
possibly a derivation relationship. It is difficult to map the
structural relationship of the WAV to the TIFFs using the recommended
LOC relationshipSubType values. In a sense, they are siblings, as both
are ultimately derived from the same source film. But the WAV is
actually created from the TIFFs, so in this sense it also appears to
have a 'has source' relationship to the TIFFs. There are several ways
to document all this, but I wonder whether some of them end up
contradicting the concept of a Representation as defined by PREMIS. I
know that this is a niche use case, but perhaps there are similar
examples in other fields that could shed some light on our issue?
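One of the options in point 7, sketched for discussion rather than as
a recommendation: record the WAV's origin purely as a derivation, a
'has source' relationship from the WAV's File object to the TIFF
sequence, with relatedEventIdentifier pointing at the AEO-Light
extraction (Event 2). The identifiers are hypothetical, and 'has
source' should be checked against the LOC relationshipSubType
vocabulary.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def p(tag):
    """Qualify a tag name with the PREMIS v3 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def derivation_relationship(source_id, event_id, id_type="local"):
    """'has source' derivation relationship that also names the event
    which produced the object (here, the AEO-Light extraction)."""
    rel = ET.Element(p("relationship"))
    ET.SubElement(rel, p("relationshipType")).text = "derivation"
    ET.SubElement(rel, p("relationshipSubType")).text = "has source"
    obj = ET.SubElement(rel, p("relatedObjectIdentifier"))
    ET.SubElement(obj, p("relatedObjectIdentifierType")).text = id_type
    ET.SubElement(obj, p("relatedObjectIdentifierValue")).text = source_id
    # relatedEventIdentifier comes after relatedObjectIdentifier in PREMIS 3.
    event = ET.SubElement(rel, p("relatedEventIdentifier"))
    ET.SubElement(event, p("relatedEventIdentifierType")).text = id_type
    ET.SubElement(event, p("relatedEventIdentifierValue")).text = event_id
    return rel


# Attached to the WAV's File object.
wav_rel = derivation_relationship("tiff-seq", "ev-aeo-light")
```

Keeping the derivation separate from any structural subtype at least
means the Representation itself (TIFFs plus WAV) does not have to
express how the WAV was made.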
I look forward to discussing all this with the PREMIS community. It
would be great if anyone who is already documenting image sequences
via PREMIS could post to the thread as well.
Best regards,
Kieran O'Leary
IFI Irish Film Archive
P.S. I thought I'd include my initial email below:
>
Hello,
I am investigating/experimenting with PREMIS, and I am trying to
automatically generate XML documents via Python scripts as items pass
through our workflows. I work at the Irish Film Archive, so we
generally handle self-digitised moving image material as well as
born-digital files. I hope that you can help me with some questions.
As we handle very large image sequences (approx 150,000 TIFF/DPX files
per film), I'm curious as to how to document these in PREMIS.
1. My main question is whether these sequences need to be documented
as a Representation object for the whole image sequence, with each
image in the sequence perhaps requiring its own File object. This
would lead to a gigantic XML file, but I see the value in recording
this information at the file level. I notice that something similar
happens in your 'Animal Antics' example in the v3 documentation. Are
there any examples available of an image sequence documented like
this? In our regular database, we would view the whole sequence as one
object/package, with one database record per sequence.
2. I also notice that objectCharacteristics is not applicable to
Representations, so how should I document the overall file size of the
image sequence?
3. As for Events, Environments and Agents, it would seem to make sense
to link all of these to the single Representation object. I'd hate the
thought of having linking identifiers for all 150k files in a
'capture' event, or in multiple fixity check events over time.
Hopefully linking such events to the Representation object is
sufficient?
4. Initially I was wondering how to document fixity, as it makes most
sense to me to include a separate checksum manifest within the
SIP/AIP. Is there a method within PREMIS to point to an external file
like this for fixity, such as a 'fixityExtension'? There does not
appear to be one. I suppose this is only an issue when documenting a
Representation object that contains multiple files, rather than
documenting fixity for a single file.
Any help on one or all of the questions would be greatly appreciated.
Kindest regards,
Kieran O'Leary.