Dear all,

A quick follow-up on the original need for this mechanism: it was not only about efficient storage of metadata, but also about efficient exchange of PREMIS files between different internal systems in a workflow. So this is about more than just storage.

Any thoughts and comments on this proposal would be highly appreciated.

Best wishes,
Sébastien Peyrard
----- Forwarded by Sébastien PEYRARD/ETS/BnF on 11/09/2012 16:35 -----

Message from: Sébastien PEYRARD/ETS/BnF (DSR/IBN)
                    06/09/2012 18:20

Grouping Semantic Units in PREMIS: a proposal

Dear PREMIS Implementers,

A proposal has been put to the PREMIS Editorial Committee for a "Semantic Unit Group" feature.

The use case inspiring the proposal is the large-scale information package: a large number of image files, as many as 20,000 (each a digitized frame of a 16mm film), where much of the data is common to most or all of the files: all of the images have the same file format, composition level, environment, significant properties and maybe even size; preservationLevel and creatingApplication, as well as linkingEventIdentifier, linkingIntellectualEntityIdentifier and linkingRightsStatementIdentifier, are identical for many (not necessarily all) of the objects.

With the current PREMIS XML schema, the description of every file contains all applicable semantic units. This simple approach works well for smaller packages, consisting for example of a few hundred files. For larger packages with thousands or even tens of thousands of files, the benefit of compression might be worth the added complexity that it imposes.

The proposed approach would allow creation of groups of data - Semantic Unit Groups - which would appear once within the package and be referenced from within the files as appropriate. It is illustrated by a draft schema, which is offered only to illustrate the desired feature concretely; the actual schema changes which would incorporate this feature are open to discussion.
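To make the idea concrete, here is a minimal Python sketch of the grouping principle. It is not the draft schema and not an official PREMIS mechanism: the function names, the `group_ref` key, the `group-1` identifier and the sample values are all hypothetical, and real semantic units are nested structures rather than flat strings. It only shows data that is common to most files being stored once and referenced, and then expanded back to the full per-file description.

```python
from collections import Counter

GROUP_ID = "group-1"  # hypothetical identifier for the single shared group


def factor_common_units(objects, min_share=0.5):
    """Split per-file records into one shared group plus per-file remainders.

    `objects` maps a file identifier to a dict of semantic unit name -> value.
    A (name, value) pair joins the group when it occurs in more than
    `min_share` of the files; with min_share >= 0.5 a name can only have
    one such value, so the group is unambiguous.
    """
    counts = Counter(pair for units in objects.values() for pair in units.items())
    threshold = len(objects) * min_share
    group = {name: value for (name, value), n in counts.items() if n > threshold}

    compact = {}
    for file_id, units in objects.items():
        # A file references the group only if it agrees with every grouped unit.
        if group and all(units.get(k) == v for k, v in group.items()):
            own = {k: v for k, v in units.items() if k not in group}
            compact[file_id] = {"group_ref": GROUP_ID, "units": own}
        else:
            compact[file_id] = {"group_ref": None, "units": dict(units)}
    return group, compact


def expand(group, compact):
    """Rebuild the full per-file records from the grouped form."""
    return {
        file_id: ({**group, **entry["units"]} if entry["group_ref"] else dict(entry["units"]))
        for file_id, entry in compact.items()
    }


# Illustrative values only: three frames, one of which differs in size.
objects = {
    "frame-0001": {"formatName": "TIFF", "compositionLevel": "0", "size": "1000"},
    "frame-0002": {"formatName": "TIFF", "compositionLevel": "0", "size": "1000"},
    "frame-0003": {"formatName": "TIFF", "compositionLevel": "0", "size": "1024"},
}
group, compact = factor_common_units(objects)
restored = expand(group, compact)
```

In this sketch a file that differs from the group in any unit simply keeps its full description, which mirrors the "referenced from within the files as appropriate" wording; expanding the grouped form gives back the original records unchanged.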

This proposal would not change the data model or the Data Dictionary; it would just provide a different XML serialization. The proposal would be backward compatible: it would not invalidate existing PREMIS instances. Semantic Unit Groups would not be mandated; they would be an optional feature.

The PREMIS EC would like feedback from the PREMIS community on whether this feature is useful. There is at least one implementer who has expressed interest. However, that implementer seems to be more interested in this feature as a means to store PREMIS data (locally) than as an exchange mechanism.
The EC would also like to hear from implementers who exchange PREMIS data via the PREMIS XML schema: do you exchange large information packages that could benefit from the sort of compression described?
The EC also has a question for implementers who store large information packages but do not necessarily exchange them: is discussion (and possible formalization or recommendation, e.g. in the XML schema) of compression mechanisms or other internal storage matters useful and appropriate for this forum, or is this strictly a private implementation matter?

The EC welcomes feedback on this matter.

Best wishes,

Ray Denenberg
Library of Congress

Sébastien Peyrard
