This may not be the best place for this tirade, but in the interests of livening up the discussion, here I go:
NB: these are my personal opinions and not the official view of Archives New Zealand. Also, I am quite sure I've missed something important here and would love to be proven wrong about this.
As I see it, the object of digital preservation is the intellectual entity. That is what is preserved post-migration/normalisation (after all, it can't be the files, as we have replaced them) and what is accessed/preserved/performed through emulation.
The strongest case for having intellectual entities in PREMIS is that practitioners could then identify the relationships between the intellectual entities themselves and the files they rely on: both the so-called "content" files and the other software application files. Including a strong, clear description of the intellectual entity would help digital preservation practitioners to judge or test, post-migration, whether the intellectual entity had been preserved, and/or whether an emulation solution had adequately performed/presented/accessed it.
The significant properties/essential characteristics that are often talked about within the digital preservation community are (surely?!) properties of the intellectual entity, not of any particular file that it may rely upon: for example, formatting, colour, layout, size, information conveyed, interactivity, footnotes, etc. They happen to be presented to the user via a combination of files, including one or more "content" files and many application/software files, but in theory they could be presented to the user using a completely different set of files (e.g. post-migration).
By analysing "content" files it is possible to get an indication of what significant properties an intellectual entity has. For example, a photo that relies on a file with a .jpg extension might reasonably be presumed to have a certain intended size, based on an assumption about the way the information in the "content" file is organised and an assumption about which software application files are to be used in performing the intellectual entity. However, these are only indications/indicators and do not define what the intellectual entity definitely is. The assumptions could be wrong. The file could have been created in a way that was slightly different to the standard it was presumed to adhere to, and the intellectual entity might rely on completely different software application files for its performance. The performance through these different software application files might present a larger photo than was initially presumed, as the application files may "re-interpret" the "content" files when conducting the performance.
There are two implications of this:
1. The significant properties of an intellectual entity rely on more than one file for presentation in any case (one or more "content" files and a number of application files). So they can rarely (if ever) be attributed to any particular file, and have to be documented about the thing that does span multiple files, i.e. the IE.
2. The significant properties may be indicated by properties in any one file (e.g. a "content" file), but they aren't defined by them. The fact that a file contains information indicating that it adheres to a particular formatting standard, or that it is intended to be rendered as an image of a particular size on screen, doesn't mean that is what the creator intended, nor does it mean that the file actually adheres to that formatting standard. It does indicate those things, but both are best ascertained by asking/interrogating the creator and capturing the answers as metadata. So the significant properties are best captured about the IE, because particular files only indicate them; they don't define them.
Another thought is that, in many cases, the setup configuration of a software application cannot be derived from a content file. This means it has to be captured elsewhere, and it should be captured about the IE, as that is the only logical place (I think).
In other words: we need to document preservation metadata about IEs, and PREMIS should have a spot for it.
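To make the argument a bit more concrete, here is a minimal Python sketch under a hypothetical data model (the names `IntellectualEntity`, `FileObject`, `unconfirmed_indicators` and `check_rendering` are my own illustrations, not PREMIS semantic units). Significant properties confirmed with the creator are recorded once on the IE, properties read from individual files are kept only as indicators, and a post-migration or emulated rendering is checked against the IE rather than against any one file.

```python
from dataclasses import dataclass, field


@dataclass
class FileObject:
    """A file the IE relies on: a "content" file or an application file."""
    identifier: str
    role: str  # hypothetical roles: "content" or "application"
    # Properties read from the file itself: indicators only, never definitive.
    indicated_properties: dict = field(default_factory=dict)


@dataclass
class IntellectualEntity:
    identifier: str
    # Significant properties confirmed with the creator, recorded on the IE.
    significant_properties: dict = field(default_factory=dict)
    # Application setup that cannot be derived from any content file.
    environment_config: dict = field(default_factory=dict)
    files: list = field(default_factory=list)


def unconfirmed_indicators(ie):
    """List file-level indicators that disagree with what is recorded on the IE."""
    conflicts = []
    for f in ie.files:
        for name, value in f.indicated_properties.items():
            recorded = ie.significant_properties.get(name)
            if recorded is not None and recorded != value:
                conflicts.append((f.identifier, name, value, recorded))
    return conflicts


def check_rendering(ie, observed):
    """Return the IE's significant properties that a rendering did not reproduce."""
    return sorted(
        name
        for name, expected in ie.significant_properties.items()
        if observed.get(name) != expected
    )


# Hypothetical example: the .jpg suggests 800x600, but the creator said 1024x768.
photo = IntellectualEntity(
    identifier="ie-001",
    significant_properties={"width_px": 1024, "height_px": 768, "colour": "sRGB"},
    environment_config={"viewer_zoom": "100%"},
    files=[FileObject("photo.jpg", "content", {"width_px": 800, "height_px": 600})],
)
```

The point of the design is that the test of "was it preserved?" never consults the files' own claims; those are only used to flag where the file and the IE disagree.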
Ok, so good luck making sense of that. I think I have confused myself now.
Senior Advisor, Digital Continuity
DDI 04 894 6077
The PREMIS Editorial Committee has been discussing whether and how to allow the description of an Intellectual Entity in the PREMIS Data Dictionary. Currently a PREMIS Object can link to an Intellectual Entity, but you cannot use PREMIS semantic units to describe the Intellectual Entity.
PREMIS defines Intellectual Entity as "a set of content that is considered a single intellectual unit for purposes of management and description: for example, a particular book, map, photograph, or database. An Intellectual Entity can include other Intellectual Entities; for example, a Web site can include a Web page; a Web page can include an image. An Intellectual Entity may have one or more digital representations."
The EC has had several requests to consider expanding the Data Dictionary to include description of Intellectual Entities. We identified a number of use cases for doing this, although not all cases are equally strong.
1) A repository may want to represent an Intellectual Entity in order to capture descriptive metadata for it, have business requirements associated with it, show relationships, give high level rights information, or record related events and/or agents.
2) The repository may want to represent a batch of files with similar properties (e.g. environments) in order to avoid repetition of this information. The files would not constitute a representation.
3) The repository is sending a copy of an archived AIP containing multiple representations to another repository (for example, using the TIPR Repository Exchange Format) and wants to describe the package as a whole, as distinct from each representation.
4) The repository may want to describe a complex event such as a web crawl.
5) The repository may want to distinguish intellectual file properties from actual file properties.
6) The repository may want to capture versioning information at the Intellectual Entity level for IEs such as articles or issues.
The EC's modeling showed that the most satisfying way of including Intellectual Entity in the Data Dictionary was to treat it as a fourth type of Object entity, along with Representations, Files and Bitstreams. The advantages to this approach are:
Analysis has shown that nearly all of the semantic units applicable to Representations also seem applicable to Intellectual Entities. Of course, this changes the Data Model and requires a major revision of the Data Dictionary. Version 2.1 of the Data Dictionary is coming out very soon, and will not include any change to Intellectual Entity. If we did add Intellectual Entity as a fourth Object type, it would probably be issued some time in the future as Version 3.0.
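As a rough illustration of what "a fourth type of Object entity" might look like, here is a Python sketch of the proposed model. Everything in it is an assumption for illustration: `ObjectCategory`, `PremisObject`, `ALLOWED_CHILDREN` and `validate` are hypothetical names, not PREMIS schema elements, and the containment rules are a simplification drawn from the definition quoted above (an IE can include other IEs and have representations; a representation is made of files; files can contain bitstreams). It shows one consequence of the approach: because an IE is just another Object, it can carry the same semantic units as a Representation and be checked with the same code.

```python
from dataclasses import dataclass, field
from enum import Enum


class ObjectCategory(Enum):
    # The three existing Object categories plus the proposed fourth one.
    INTELLECTUAL_ENTITY = "intellectual entity"
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"


@dataclass
class PremisObject:
    identifier: str
    category: ObjectCategory
    # Hypothetical stand-in for the shared semantic units (environment, rights, etc.).
    semantic_units: dict = field(default_factory=dict)
    # Identifiers of the objects this one structurally includes.
    includes: list = field(default_factory=list)


# Simplified containment rules assumed for this sketch only.
ALLOWED_CHILDREN = {
    ObjectCategory.INTELLECTUAL_ENTITY: {
        ObjectCategory.INTELLECTUAL_ENTITY,
        ObjectCategory.REPRESENTATION,
    },
    ObjectCategory.REPRESENTATION: {ObjectCategory.FILE},
    ObjectCategory.FILE: {ObjectCategory.BITSTREAM},
    ObjectCategory.BITSTREAM: set(),
}


def validate(objects):
    """Return problems found in the include-relationships among the given objects."""
    by_id = {o.identifier: o for o in objects}
    problems = []
    for o in objects:
        for child_id in o.includes:
            child = by_id.get(child_id)
            if child is None:
                problems.append(f"{o.identifier}: unknown object {child_id}")
            elif child.category not in ALLOWED_CHILDREN[o.category]:
                problems.append(
                    f"{o.identifier} ({o.category.value}) cannot include "
                    f"{child_id} ({child.category.value})"
                )
    return problems
```

For example, a Web site IE including a Web page IE, which has one representation made of one file, would validate with no problems under these rules.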
Before finalizing such a change, we would like to hear any comments the community of PREMIS implementers may have. Do you see use cases for describing Intellectual Entity in PREMIS? Are you comfortable with defining a new type of Object Entity? Do you see semantic units that apply to Representations that do not apply to Intellectual Entities? Are there additional semantic units that would pertain to Intellectual Entities that would be useful to include in the Data Dictionary?
If you have comments, please send them to the PIG list ([log in to unmask]) so we can get some open discussion going on this.