Dear PREMIS Implementers,
A proposal has been put to the PREMIS Editorial Committee
for a "Semantic Unit Group" feature.
The use case inspiring the proposal is the large-scale
information package: a large number of image files, as many as 20,000 (each
a digitized frame of a 16mm film), where much of the data is common to
most or all of the files: all of the images have the same file format,
composition level, environment, significant properties and maybe even size;
preservationLevel and creatingApplication, as well as linkingEventIdentifier,
linkingIntellectualEntityIdentifier and linkingRightsStatementIdentifier,
are identical for many (not necessarily all) of the objects.
With the current PREMIS XML schema, the description of every file contains
all (applicable) semantic units. This simple approach works well for smaller
packages, consisting for example of a few hundred files. For larger packages
with thousands or even tens of thousands of files, the benefit of compression
might be worth the added complexity that compression imposes.
The proposed approach would allow creation of groups of
data - Semantic Unit Groups - which would appear once within the package
and be referenced from within the files as appropriate. It is illustrated
by the draft schema at http://www.loc.gov/standards/premis/v2/semanticUnitProposal.xsd.
The schema is offered only to illustrate the desired feature concretely;
the actual schema changes that would incorporate this feature are open
to discussion.
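To make the idea concrete, here is a minimal Python sketch of the factoring step: values shared by many file records are moved into groups that appear once, and each file record keeps only a reference to its group. This is an illustration only, not the draft schema. The function names, the "groupRef" key and the sample values are assumptions made up for this example, and the dictionaries merely stand in for XML elements.

```python
def factor_into_groups(records, shared_keys):
    """Move the values of shared_keys out of each record into shared groups.

    Returns (group_table, slim_records). "groupRef" is a hypothetical name
    chosen for this sketch; it does not come from the draft schema.
    """
    groups = {}        # signature of shared values -> group id
    slim_records = []
    for record in records:
        # The signature covers only the shared keys actually present.
        signature = tuple(
            (key, record[key]) for key in shared_keys if key in record
        )
        group_id = groups.setdefault(signature, "group-%d" % (len(groups) + 1))
        slim = {k: v for k, v in record.items() if k not in shared_keys}
        slim["groupRef"] = group_id
        slim_records.append(slim)
    group_table = {gid: dict(sig) for sig, gid in groups.items()}
    return group_table, slim_records


def expand_groups(group_table, slim_records):
    """Rebuild the full records, showing that no information is lost."""
    full = []
    for slim in slim_records:
        record = dict(group_table[slim["groupRef"]])
        record.update((k, v) for k, v in slim.items() if k != "groupRef")
        full.append(record)
    return full


# Illustrative data: four frames, three of which share all common values.
frames = [
    {"objectIdentifier": "frame-%05d" % n,
     "formatName": "image/tiff",
     "preservationLevel": "full"}
    for n in range(1, 4)
]
frames.append({"objectIdentifier": "frame-00004",
               "formatName": "image/tiff",
               "preservationLevel": "bit-level"})

group_table, slim = factor_into_groups(
    frames, ["formatName", "preservationLevel"]
)
```

In this sketch the four records reduce to two groups plus four short records, and expand_groups restores the original records exactly, which mirrors the claim that only the serialization changes.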
This proposal would not change the Data model or the Data
Dictionary; it would only provide a different XML serialization. The proposal
would be backward compatible: it would not invalidate existing PREMIS instances.
Semantic Unit Groups would not be mandated; they would be an optional feature.
The PREMIS EC would like feedback from the PREMIS community
on whether this feature is useful. There is at least one implementer who
has expressed interest. However, that implementer seems to be more interested
in this feature as a means to store PREMIS data (locally) than as a means of exchange.
The EC would also like to hear from implementers who exchange
PREMIS data via the PREMIS XML schema: do you exchange large information
packages that could benefit from the sort of compression described?
The EC also has a question for implementers who do store
large information packages but do not necessarily exchange them: is discussion
(and possible formalization/recommendation, e.g. in the XML Schema) of compression
mechanisms or other internal storage matters useful and appropriate for
this forum, or is this strictly a private implementation matter?
The EC welcomes feedback on this matter.
Library of Congress