Dear all,

A quick follow-up on the original need for this mechanism: it was not only 
about efficient storage of metadata, but also about efficient exchange of 
PREMIS files between different internal systems in a workflow. 
So this is more than just storage.

Any thoughts and comments on this proposal would be highly appreciated.

Best wishes,
Sébastien Peyrard
----- Forwarded by Sébastien PEYRARD/ETS/BnF on 11/09/2012 16:35 -----

Message from: Sébastien PEYRARD/ETS/BnF (DSR/IBN)
                      06/09/2012 18:20

Grouping Semantic Units in PREMIS: a proposal

Dear PREMIS Implementers,

A proposal has been put to the PREMIS Editorial Committee for a "Semantic 
Unit Group" feature.

The use case inspiring the proposal is the large-scale information 
package: a large number of image files, as many as 20,000 (each a 
digitized frame of a 16mm film), where much of the data is common to most 
or all of the files: all of the images have the same file format, 
composition level, environment, significant properties and maybe even 
size; preservationLevel and creatingApplication, as well as 
linkingEventIdentifier, linkingIntellectualEntityIdentifier and 
linkingRightsStatementIdentifier, are identical for many (not necessarily 
all) of the objects.

With the current PREMIS XML schema, the description of every file repeats 
all applicable semantic units. This simple approach works well for smaller 
packages, consisting for example of a few hundred files. For larger 
packages with thousands or even tens of thousands of files, the benefit of 
compression might be worth the added complexity that it imposes.

The proposed approach would allow creation of groups of data - Semantic 
Unit Groups - which would appear once within the package and be referenced 
from within the files as appropriate. It is illustrated by the draft 
schema at 
The schema is offered only to concretely illustrate the desired feature; 
the actual schema changes which would incorporate this feature are open to 
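
To make the idea concrete, here is a minimal Python sketch of the grouping 
principle. It is not the draft schema and not an official PREMIS API: the 
function names, the "groupRef" key and the "group-1" identifier are all 
hypothetical, and objects are modelled as flat dictionaries of semantic unit 
names and values. For simplicity it only groups units shared by every object, 
whereas the proposal also covers units shared by many but not all objects.

```python
from collections import Counter


def group_common_units(objects, group_id="group-1"):
    """Factor semantic units shared by ALL objects into a single group.

    Hypothetical helper for illustration only. Each object is a flat dict
    mapping a semantic unit name to a string value.
    """
    # Count how many objects carry each identical (name, value) pair.
    counts = Counter(pair for obj in objects for pair in obj.items())
    shared = {name: value
              for (name, value), n in counts.items()
              if n == len(objects)}
    groups = {group_id: shared}
    # Each object keeps its own units plus a reference to the group.
    compact = [
        {"groupRef": group_id,  # assumed key name, not from the schema
         **{k: v for k, v in obj.items() if k not in shared}}
        for obj in objects
    ]
    return groups, compact


def expand(obj, groups):
    """Rebuild the full description of one object from its group."""
    own = {k: v for k, v in obj.items() if k != "groupRef"}
    return {**groups[obj["groupRef"]], **own}


# Example: three frames sharing format and composition level.
frames = [
    {"objectIdentifier": f"frame-{i}",
     "formatName": "image/tiff",
     "compositionLevel": "0"}
    for i in range(3)
]
groups, compact = group_common_units(frames)
restored = [expand(c, groups) for c in compact]
```

In this sketch the shared units are stored once in the group, each frame 
keeps only its identifier and a reference, and expanding every frame gives 
back the original descriptions, so nothing is lost by grouping.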

This proposal would not change the Data Model or the Data Dictionary; it 
would only provide a different XML serialization. The proposal would be 
backward compatible: it would not invalidate existing PREMIS instances. 
Semantic Unit Groups would not be mandated, they would be an optional 

The PREMIS EC would like feedback from the PREMIS community on whether 
this feature is useful. There is at least one implementer who has 
expressed interest. However, that implementer seems to be more interested 
in this feature as a means to store PREMIS data (locally) than as an 
exchange mechanism.
The EC would also like to hear from implementers who exchange PREMIS data 
via the PREMIS XML schema: do you exchange large information packages that 
could benefit from the sort of compression described?
The EC also has a question for implementers who do store large information 
packages but do not necessarily exchange them: Is discussion (and possible 
formalization/recommendation e.g. in the XML Schema) of compression 
mechanisms or other internal storage matters useful and appropriate for 
this forum, or is this strictly a private implementation matter?

The EC welcomes feedback on this matter.

Best wishes,

Ray Denenberg 
Library of Congress

Sébastien Peyrard
