Hi all,
This may not be the best place for this tirade but, in the
interests of livening up the discussion, here I go:
NB: these are my personal opinions and not the official view of
Archives New Zealand. Also, I am quite sure I've missed something important
here and would love to be proven wrong about this.
As I see it, the object of digital preservation is the
intellectual entity. That is what is preserved post-migration/normalisation (after all,
it can't be the files, as we have replaced them) and what is accessed/preserved/performed
through emulation.
The strongest case for having intellectual entities in PREMIS is
so that practitioners can identify the relationships between the intellectual
entities themselves and the files they rely on: the so-called
"content" files and the other software application files. Including
a strong/clear description of the intellectual entity would help digital
preservation practitioners to judge or test, post-migration, whether the
intellectual entity had been preserved, and/or judge or test whether an
emulation solution had adequately performed/presented/accessed the intellectual
entity.
The significant properties/essential characteristics that are
often talked about within the digital preservation community are (surely?!)
properties of the intellectual entity, not any particular file that it
may rely upon. For example, formatting, colour, layout, size, information
conveyed, interactivity, footnotes, etc. They happen to be presented to
the user via a combination of files including one or more "content"
files and many application/software files but in theory could be presented to
the user using a completely different set of files (e.g. post-migration).
By analysing "content" files it is possible to get an
indication of what significant properties an intellectual entity has. For example,
a photo that relies on a file with a .jpg extension might reasonably be presumed
to have a certain intended size, based on an assumption about the way the
information in the "content" file is organised and on an assumption
about which software application files are to be used in performing the intellectual
entity. However, these are only indications/indicators and do not define what
the intellectual entity definitely is. The assumptions could be wrong. The file
could have been created in a way that was slightly different to the standard
that it was presumed to adhere to, and the intellectual entity might rely on
completely different software application files for its performance. The
performance through these different software application files might present a
larger sized photo than was initially presumed, as the application files may "re-interpret"
the "content" files when conducting the performance.
There are two implications of this:
1. The significant properties of an intellectual entity rely on
more than one file for presentation in any case (one or more "content"
files and a number of application files), so they can rarely (if ever) be attributed
to any particular file. They therefore have to be documented across multiple files,
and about the thing that does cross multiple files, i.e. the IE.
2. The significant properties may be indicated by properties in
any one file (e.g. a "content" file) but they aren't defined by
them. The fact that a file contains information that indicates that it is adhering
to a particular formatting standard, or is intended to be rendered as an image with
a particular size on screen, doesn't mean that is what the creator intended, nor
does it mean that the file is actually adhering to the particular formatting standard.
It does indicate that, but both of these things are best ascertained
by asking/interrogating the creator and capturing that information as metadata.
So the significant properties are best captured about the IE, because they are
only indicated by properties of any particular files, not defined by them.
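To make the two implications concrete, here is a rough sketch in Python of what I mean. It is only an illustration: the class names, field names and the "display_width_px" property are all made up for the example and are not PREMIS semantic units. The significant properties are recorded about the IE, the files carry only indications, and the two can be compared.

```python
from dataclasses import dataclass, field


@dataclass
class FileObject:
    """A file the IE relies on (hypothetical structure, not PREMIS)."""
    name: str
    role: str  # illustrative values: "content" or "application"
    # Properties read out of the file itself: indications only.
    indicated: dict = field(default_factory=dict)


@dataclass
class IntellectualEntity:
    """The thing being preserved (hypothetical structure, not PREMIS)."""
    identifier: str
    # Properties ascertained from the creator, recorded about the IE.
    significant_properties: dict = field(default_factory=dict)
    files: list = field(default_factory=list)

    def disagreements(self):
        """List (file, property, indicated, declared) where a file's
        indication differs from the value declared for the IE."""
        found = []
        for f in self.files:
            for key, indicated_value in f.indicated.items():
                declared = self.significant_properties.get(key)
                if declared is not None and declared != indicated_value:
                    found.append((f.name, key, indicated_value, declared))
        return found


# The creator says the photo is meant to display 800 pixels wide;
# the "content" file on its own suggests 1024.
photo = IntellectualEntity(
    identifier="ie-001",
    significant_properties={"display_width_px": 800},
    files=[
        FileObject("photo.jpg", "content", {"display_width_px": 1024}),
        FileObject("viewer.exe", "application"),
    ],
)
print(photo.disagreements())
```

In this sketch the value recorded about the IE is treated as the definition, and the value found in the file is treated as something to check against it, not the other way around.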
Another thought is that the setup configuration of a software
application cannot, in many cases, be derived from a content file. This means
that it has to be captured elsewhere. It should be captured about the IE, as
that is the only logical place (I think).
In other words: we need to document preservation metadata about
IEs, and PREMIS should have a spot for it.
Ok, so good luck making sense of that. I think I have confused
myself now.
Regards,
Euan Cochrane
Senior Advisor, Digital Continuity
Archives
DDI 04 894 6077
From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Priscilla Caplan
Sent: Tuesday, 14 December 2010 6:03 a.m.
To: [log in to unmask]
Subject: [PIG] PREMIS description of Intellectual Entities
The PREMIS Editorial Committee
has been discussing whether and how to allow the description of an Intellectual
Entity in the PREMIS Data Dictionary. Currently a PREMIS Object
can link to an Intellectual Entity but you cannot use PREMIS semantic units to
describe the Intellectual Entity.
PREMIS defines Intellectual Entity as "a set of content that is considered
a single intellectual unit for purposes of management and description: for
example, a particular book, map, photograph, or database. An Intellectual
Entity can include other Intellectual Entities; for example, a Web site can
include a Web page; a Web page can include an image. An Intellectual Entity may
have one or more digital representations."
The EC has had several requests to consider expanding the Data Dictionary to
include description of Intellectual Entities. We identified a number of
use cases for doing this, although not all cases are equally strong.
1) A repository may want to represent an Intellectual Entity in order to
capture descriptive metadata for it, have business requirements associated with
it, show relationships, give high level rights information, or record related
events and/or agents.
2) The repository may want to represent a batch of files with similar
properties (e.g. environments) in order to avoid repetition of this
information. The files would not constitute a representation.
3) The repository is sending a copy of an archived AIP containing multiple
representations to another repository (for example, using the TIPR Repository
Exchange Format) and wants to describe the package as a whole, as distinct from
each representation.
4) The repository may want to describe a complex event such as a web crawl.
5) The repository may want to distinguish intellectual file properties from
actual file properties.
6) The repository may want to capture versioning information at the
Intellectual Entity level for IEs such as articles or issues.
The EC's modeling showed that the most satisfying way of including Intellectual
Entity in the Data Dictionary was to treat it as a fourth type of Object entity,
along with Representations, Files and Bitstreams. The advantages to this
approach are:
Analysis has shown that nearly
all of the semantic units applicable to Representations also seem applicable to
Intellectual Entities. Of course, this changes the Data Model and
requires a major revision of the Data Dictionary. Version 2.1 of the Data
Dictionary is coming out very soon, and will not include any change to
Intellectual Entity. If we did add Intellectual Entity as a fourth Object
type, it would probably be issued some time in the future as Version 3.0.
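As a rough illustration of the "fourth type of Object" idea, the sketch below models the four categories in Python. It is an assumption-laden sketch, not the Data Dictionary: the class names, the `parent` field, the `ancestors` helper and the string value used for the new category are all hypothetical. The nesting follows the Web site / Web page / image example in the definition quoted above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ObjectCategory(Enum):
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"
    # The proposed fourth type; this string value is an assumption.
    INTELLECTUAL_ENTITY = "intellectual entity"


@dataclass
class PremisObject:
    """Hypothetical object record; 'parent' is an illustrative link only."""
    identifier: str
    category: ObjectCategory
    parent: Optional["PremisObject"] = None


def ancestors(obj):
    """Walk the parent links and return the identifiers above obj."""
    chain = []
    while obj.parent is not None:
        obj = obj.parent
        chain.append(obj.identifier)
    return chain


# A Web site includes a Web page, which includes an image; the image
# has one digital representation.
site = PremisObject("web-site", ObjectCategory.INTELLECTUAL_ENTITY)
page = PremisObject("web-page", ObjectCategory.INTELLECTUAL_ENTITY, parent=site)
image = PremisObject("image", ObjectCategory.INTELLECTUAL_ENTITY, parent=page)
rep = PremisObject("image-rep-1", ObjectCategory.REPRESENTATION, parent=image)

print(ancestors(rep))
```

Because every level is the same kind of record with a different category, the same fields can be reused for Intellectual Entities and Representations in this sketch, which is the point of the modelling choice described above.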
Before finalizing such a change, we would like to hear any comments the
community of PREMIS implementers may have. Do you see use cases for
describing Intellectual Entity in PREMIS? Are you comfortable with
defining a new type of Object Entity? Do you see semantic units that
apply to Representations that do not apply to Intellectual Entities? Are
there additional semantic units that would pertain to Intellectual Entities that
would be useful to include in the Data Dictionary?
If you have comments, please send them to the PIG list ([log in to unmask]) so we can get some open discussion
going on this.
Thanks,
Priscilla