Dear Ray and Sébastien,
I think this would be a very useful feature. In Archivematica
(https://www.archivematica.org/) we are currently working on ingesting
large email accounts (10,000+ messages) in maildir format, and the
resulting AIP METS file is huge. In addition to some identical object
characteristics for each email message, we have an extensive list of
events, such as ingestion, format identification, fixity check, message
digest calculation, virus check, etc. We would definitely be interested in
using something like a semantic unit group to link multiple email messages
to a single set of events. Extracted email attachments would probably
continue to have their own complete PREMIS entities.
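
To illustrate what we have in mind, here is a rough Python sketch (standard library only) of the deduplication a Semantic Unit Group would allow. Semantic units shared by many objects, such as the list of linked events, are written once in a group, and each message keeps only its own identifier plus a reference to that group. The element names (semanticUnitGroup, semanticUnitGroupRef, groupID, package, object), the flattening of nested semantic units such as formatName, and the event identifiers are hypothetical placeholders. None of them are taken from the draft schema or from Archivematica.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for illustration only: they are NOT taken from
# the draft semanticUnitProposal.xsd or from the PREMIS XML schema.
GROUP_TAG = "semanticUnitGroup"
REF_TAG = "semanticUnitGroupRef"


def factor_shared_units(objects, shared_keys):
    """Move semantic units shared by many objects into deduplicated groups.

    objects: list of dicts mapping a (flattened) semantic unit name to a
             string, or to a tuple of strings for repeatable units.
    shared_keys: names of the semantic units that may be shared.
    Returns (groups, slim_objects): groups maps a group ID to its units;
    each slim object keeps only its own units plus a "groupRef".
    """
    groups = {}         # group ID -> shared units
    id_by_values = {}   # tuple of shared (name, value) pairs -> group ID
    slim_objects = []
    for obj in objects:
        shared = tuple((k, obj[k]) for k in shared_keys if k in obj)
        if shared not in id_by_values:
            group_id = f"group-{len(groups) + 1}"
            id_by_values[shared] = group_id
            groups[group_id] = dict(shared)
        slim = {k: v for k, v in obj.items() if k not in shared_keys}
        slim["groupRef"] = id_by_values[shared]
        slim_objects.append(slim)
    return groups, slim_objects


def _add_units(parent, units):
    """Append one child element per value; tuples become repeated elements."""
    for name, value in units.items():
        values = value if isinstance(value, tuple) else (value,)
        for v in values:
            ET.SubElement(parent, name).text = v


def to_xml(groups, slim_objects):
    """Serialize each group once, then each object with a group reference."""
    root = ET.Element("package")
    for group_id, units in groups.items():
        _add_units(ET.SubElement(root, GROUP_TAG, groupID=group_id), units)
    for obj in slim_objects:
        element = ET.SubElement(root, "object")
        _add_units(element, {k: v for k, v in obj.items() if k != "groupRef"})
        ET.SubElement(element, REF_TAG, groupID=obj["groupRef"])
    return ET.tostring(root, encoding="unicode")


# Usage: 10,000 messages sharing one format and one set of events.
events = ("ingestion", "format-identification", "fixity-check",
          "message-digest-calculation", "virus-check")
messages = [
    {"objectIdentifier": f"msg-{i:05d}",
     "formatName": "message/rfc822",
     "linkingEventIdentifier": events}
    for i in range(10_000)
]
groups, slim = factor_shared_units(messages, ["formatName", "linkingEventIdentifier"])
print(len(groups))  # 1
print(slim[0])      # {'objectIdentifier': 'msg-00000', 'groupRef': 'group-1'}
```

Extracted attachments, which would keep their own complete PREMIS entities, would simply not be passed to factor_shared_units.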
Artefactual Systems Inc.
> Dear PREMIS Implementers,
> A proposal has been put to the PREMIS Editorial Committee for a "Semantic
> Unit Group" feature.
> The use case inspiring the proposal is the large-scale information
> package: a large number of image files, as many as 20,000 (each a
> digitized frame of a 16mm film), where much of the data is common to most
> or all of the files: all of the images have the same file format,
> composition level, environment, significant properties and maybe even
> size; preservationLevel and creatingApplication, as well as
> linkingEventIdentifier, linkingIntellectualEntityIdentifier and
> linkingRightsStatementIdentifier, are identical for many (not necessarily
> all) of the objects.
> With the current PREMIS XML schema, every file contains all (applicable)
> semantic units. This simple approach works well for smaller packages,
> consisting for example of a few hundred files. For larger packages with
> thousands or even tens of thousands of files, the benefits of compression
> may outweigh the added complexity it imposes.
> The proposed approach would allow creation of groups of data - Semantic
> Unit Groups - which would appear once within the package and be referenced
> from within the files as appropriate. It is illustrated by the draft
> schema at http://www.loc.gov/standards/premis/v2/semanticUnitProposal.xsd.
> The schema is offered only to concretely illustrate the desired feature;
> the actual schema changes which would incorporate this feature are open to
> discussion.
> This proposal would not change the Data model or the Data Dictionary, it
> would just provide a different XML serialization. The proposal would be
> backward compatible: it would not invalidate existing PREMIS instances.
> Semantic Unit Groups would not be mandatory; they would be an optional
> feature.
> The PREMIS EC would like feedback from the PREMIS community on whether
> this feature is useful. There is at least one implementer who has
> expressed interest. However, that implementer seems to be more interested
> in this feature as a means to store PREMIS data (locally) than as an
> exchange mechanism.
> The EC would like to also hear from implementers who exchange PREMIS data
> via the PREMIS XML schema: do you exchange large information packages that
> could benefit from the sort of compression described?
> The EC also has a question for implementers who do store large information
> packages but do not necessarily exchange them: Is discussion (and possible
> formalization/recommendation e.g. in the XML Schema) of compression
> mechanisms or other internal storage matters useful and appropriate for
> this forum, or is this strictly a private implementation matter?
> The EC welcomes feedback on this matter.
> Best wishes,
> Ray Denenberg
> Library of Congress
> Sébastien Peyrard
> PREMIS EC