Hi All,

 

I thought it might be best to split up Kieran's email on documenting large image sequences in PREMIS into separate threads, in order to allow all of us to participate in different aspects of the conversation and to help keep track of its different parts. As always, there are no definite answers, particularly when the question is about best practice or implementation, so I would encourage you to add your own experience/thoughts.

 

For each email in the thread, I've taken Kieran's original comment and subsequent follow up comment to give the context (always in RED)

 

Original: I also notice that objectCharacteristics is not applicable to Representations, so I'm not sure how to document the overall file size of the image sequence?

Follow up: This issue still seems relevant to me. It would seem that some objectCharacteristic information for a representation would be valuable to have, especially overall filesize. The note on fixity information with regards to Representations on page 59 is interesting.

 

It says that this information should be recorded on a file level, as the information is relating to individual files. However, storage information is applicable to representations in PREMIS, but should it also be said that the storage relates to files, rather than representations?

[Source: http://listserv.loc.gov/cgi-bin/wa?A2=ind1609&L=pig&T=0&X=72942623E53C1AABF1&Y&P=152]

 

We had a discussion about this in the Editorial Committee meeting today. I'm going to summarise parts of those conversations and also mix in my own preferences and biases.

 

We discussed two questions here: a) are there downsides or opportunities in opening up Representations to have objectCharacteristics, and b) what other options exist for dealing with large sections of information that repeat across files?

 

a)     Why not open up Representations to Object Characteristics?

 

Object Characteristics contains the following semantic units (highest level only shown):

1.5 objectCharacteristics (M, R) [File, Bitstream]
        1.5.1    compositionLevel (O, NR) [File, Bitstream]
        1.5.2    fixity (O, R) [File, Bitstream]
        1.5.3    size (O, NR) [File, Bitstream]
        1.5.4    format (M, R) [File, Bitstream]
        1.5.5    creatingApplication (O, R) [File, Bitstream]
        1.5.6    inhibitors (O, R) [File, Bitstream]
        1.5.7    objectCharacteristicsExtension (O, R) [File, Bitstream]

 

One argument is that allowing these semantic units at the Representation level means that I can note information about the representation as a whole. The other argument is that I can note information about all the files in the representation. These are two different things, and that difference is the reason we would argue against including this information at the representation level: we can't see a clean way to differentiate what those semantic units would be covering.

 

For example, here's one representation with three files with their sizes.

File 1=2000Kb

File 2=2000Kb

File 3=2000Kb

 

Would the representation-level information be noting the sum of all the file sizes in the representation (e.g. 6000Kb), the file size of the final delivered package once it is accessed fully (which may be different from 6000Kb), or indeed one file size, leaving the inference that all files are the same size (e.g. 2000Kb)?
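To make the ambiguity concrete, here is a minimal Python sketch using the three file sizes from the example. The function names are my own, purely for illustration, and the delivered-package size is left out because it cannot be derived from the file sizes alone.

```python
# Sizes in Kb for the three files in the example representation.
file_sizes_kb = {"File 1": 2000, "File 2": 2000, "File 3": 2000}


def total_size(sizes):
    """Interpretation 1: the sum of all file sizes in the representation."""
    return sum(sizes.values())


def common_size(sizes):
    """Interpretation 2: one size shared by every file, or None if they differ."""
    distinct = set(sizes.values())
    return distinct.pop() if len(distinct) == 1 else None


summed = total_size(file_sizes_kb)   # 6000
shared = common_size(file_sizes_kb)  # 2000
print(summed, shared)
```

Both numbers are defensible readings of a single representation-level "size" value, which is exactly the problem: nothing in the value itself tells a reader which one was meant.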

 

As is hinted at on page 59, the representation is not equivalent to the file. In that specific example, the file is not the representation; it forms part of the representation.

 

With this in mind, the general recommendation would be to solve this through implementation, rather than through the model. One solution has already been discussed on the listserv a few years ago [http://listserv.loc.gov/cgi-bin/wa?A2=ind1209&L=pig&T=0&X=541EDA635AF2789FE3&P=430]. It looks to resolve the issue of many files sharing the same information in those implementations that use XML. The proposal would allow groups of data that are repeated/shared to appear once in the XML and be referenced by each file. This idea dropped off the list of things to be done, so I would be interested to see if anyone has any thoughts on it, or suggestions on how else this could be achieved.
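As a rough illustration of the "appear once, referenced by each file" idea, here is a small Python sketch. Please note that the element and attribute names (sharedCharacteristics, characteristicsRef and so on) are invented for this example; they are not taken from the PREMIS schema or from the syntax proposed in the earlier thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, for illustration only;
# none of them come from the PREMIS schema or the listserv proposal.
root = ET.Element("objects")

# The shared information is written once...
shared = ET.SubElement(root, "sharedCharacteristics", {"id": "chars-1"})
ET.SubElement(shared, "size").text = "2000"

# ...and each file points at it instead of repeating it.
for name in ("File 1", "File 2", "File 3"):
    ET.SubElement(root, "file", {"name": name, "characteristicsRef": "chars-1"})


def resolve_size(root, file_name):
    """Follow a file's reference to the shared block and return its size."""
    shared_by_id = {el.get("id"): el for el in root.iter("sharedCharacteristics")}
    for f in root.iter("file"):
        if f.get("name") == file_name:
            return int(shared_by_id[f.get("characteristicsRef")].findtext("size"))
    raise KeyError(file_name)


xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
print(resolve_size(root, "File 2"))
```

The size is stored once however many files there are, and it remains file-level information: each file resolves its own value through the reference, so nothing is asserted about the representation itself.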

 

On the storage question: this was introduced in V3 to cover the new use case of recording physical items. While the text is somewhat ambiguous, I would note that in our (NLNZ) understanding, storage in the digital realm would be at the file level.

 

Anyway, I hope this adds to the conversation and gives others opportunities to kick in with some thoughts (and refutations!)

 

With best wishes,

Pete

 

Peter McKinney | Digital Preservation Policy Analyst | Information and Knowledge Services
National Library of New Zealand Te Puna Mātauranga o Aotearoa
Direct Dial: +64 4 462 3931 | Extn: 3931
Cnr Molesworth and Aitken Streets | PO Box 1467, Wellington 6140 |

http://digitalpreservation.natlib.govt.nz/

 

I work on Mondays, Wednesdays and Thursdays.

 

The National Library is part of the Department of Internal Affairs