Euan,
I read your arguments as strong support
for the proposed extension of the PREMIS dictionary.
In Use Case 1 we tried to capture your
argument about significant characteristics belonging to Intellectual Entities
("1) A repository may want to represent an
Intellectual Entity in order to capture descriptive metadata for it, have
business requirements associated with it, show relationships, give high level
rights information, or record related events and/or agents."). We
consider significant characteristics to be a set of business requirements
associated with the Intellectual Entity. The proposed changes were
strongly motivated by the Planets preservation model in which we associate all
business requirements with Intellectual Entities. (The exception is that, for
bit preservation, there are obvious significant characteristics of the files
themselves: e.g. the bit sequence must be preserved or be re-constructible.)
I don’t agree,
however, that the setup configuration of a software application should be
captured about the IE. Software configuration is part of the whole rendering
stack to which the files belong. With different representations come different
files, rendering stacks and software configurations. In Planets we associate an
Intellectual Entity with an Environment object. This environment object can be
represented through a choice of rendering stacks. The environments can have
their own business requirements attached to them and the chosen rendering stack
should satisfy those. This is one reason why I would like for us to introduce a
separate Environment conceptual entity in PREMIS which we can associate with
Intellectual Entities as well as with their chosen representation.
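As a rough illustration of the relationships described above, here is a minimal sketch in Python. It is not the Planets model or any PREMIS schema: every class, field and value name below is hypothetical and chosen only for this example. It assumes that business requirements can be treated as simple labels and that a rendering stack "satisfies" an Environment when it covers all of that Environment's requirements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RenderingStack:
    """Hypothetical: one concrete software stack, including its configuration."""
    name: str
    software_configuration: Dict[str, str]
    satisfies: Set[str] = field(default_factory=set)


@dataclass
class Environment:
    """Hypothetical Environment entity with its own business requirements."""
    business_requirements: Set[str]
    candidate_stacks: List[RenderingStack] = field(default_factory=list)

    def acceptable_stacks(self) -> List[RenderingStack]:
        # A stack qualifies only if it covers every requirement of the environment.
        return [s for s in self.candidate_stacks
                if self.business_requirements <= s.satisfies]


@dataclass
class IntellectualEntity:
    """Hypothetical IE: business requirements are attached here, not to files."""
    identifier: str
    business_requirements: Set[str]
    environment: Environment


# Illustrative data only: software configuration lives with the rendering stack.
viewer_a = RenderingStack("viewer-a", {"colour_profile": "sRGB"}, {"colour", "hyperlinks"})
viewer_b = RenderingStack("viewer-b", {"colour_profile": "none"}, {"hyperlinks"})
env = Environment({"colour", "hyperlinks"}, [viewer_a, viewer_b])
report = IntellectualEntity("ie-001", {"layout preserved"}, env)
```

In this sketch the software configuration is recorded on the rendering stack rather than on the Intellectual Entity, and `report.environment.acceptable_stacks()` returns only the stacks that meet the Environment's own requirements.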
Best wishes,
Angela
From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Euan Cochrane
Sent: 13 December 2010 23:16
To: [log in to unmask]
Subject: Re: [PIG] PREMIS description of Intellectual Entities
Hi all,
This may not be the
best place for this tirade but in the interests of livening up the discussion
here I go:
NB: these are my
personal opinions and not the official view of Archives New Zealand. Also, I am
quite sure I've missed something important here and would love to be proven
wrong about this.
As I see it, the
object of digital preservation is the intellectual entity. That is what is
preserved post-migration/normalisation (after all, it can't be the files as we
have replaced them) and what is accessed/preserved/performed through emulation.
The strongest case
for having intellectual entities in PREMIS is so that practitioners can
identify the relationships between the so-called "content" files and
other software application files that the intellectual entities rely on and the
intellectual entities themselves. Including a strong/clear description of the
intellectual entity would help to enable digital preservation practitioners to
judge or test, post-migration, whether the intellectual entity had been
preserved, and/or judge or test whether an emulation solution had adequately
performed/presented/accessed the intellectual entity.
The significant
properties/essential characteristics that are often talked about within the
digital preservation community are (surely?!) properties of the intellectual
entity, not any particular file that it may rely upon. For example,
formatting, colour, layout, size, information conveyed, interactivity,
footnotes, etc. They happen to be presented to the user via a
combination of files including one or more "content" files and
many application/software files but in theory could be presented to the user
using a completely different set of files (e.g. post-migration).
By analysing
"content" files it is possible to have an indication of what
significant properties an intellectual entity has, for example a photo that
relies on a file with a .jpg extension might reasonably be presumed to have a
certain intended size based on an assumption about the way the information in
the "content" file is organised and on an assumption about what
software application files are to be used in performing the intellectual
entity. However these are only indications/indicators and do not define what
the intellectual entity definitely is. The assumptions could be wrong. The file
could have been created in a way that was slightly different to the standard
that it was presumed to adhere to and the intellectual entity might rely on
completely different software application files for its performance. The
performance through these different software application files might present a
larger photo than was initially presumed, as the application files may
"re-interpret" the "content" files when conducting the
performance.
There are two
implications of this:
1. The significant
properties of an intellectual entity rely on more than one file for
presentation in any case (one or more "content" files and a number of
application files). So they can rarely (if ever) be attributed to any
particular file. -- so they have to be documented across multiple files and
about the thing that does cross multiple files, i.e. the IE.
2. The significant
properties may be indicated by properties in any one file (e.g. a
"content" file) but they aren't defined by them. The fact that a
file contains information that indicates that it is adhering to a particular
formatting standard or is intended to be rendered as an image with a particular
size on screen doesn't mean that is what the creator intended, nor does it mean
that the file is actually adhering to the particular formatting standard.
It does indicate that, but both of these things are best ascertained
by asking/interrogating the creator and capturing that information as
metadata. -- so the significant properties are best captured about the IE
because they are only indicated by properties of any particular files, not
defined by them.
Another thought is
that the setup configuration of a software application cannot in many cases be
derived from a content file. This means that this has to be captured elsewhere.
This should be captured about the IE as it is the only logical place (I think).
In other words: we
need to document preservation metadata about IEs and PREMIS should have a spot
for it.
Ok, so good luck
making sense of that. I think I have confused myself now.
Regards,
Euan Cochrane
Senior Advisor, Digital Continuity
Archives
DDI 04 894 6077
From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Priscilla Caplan
Sent: Tuesday, 14 December 2010 6:03 a.m.
To: [log in to unmask]
Subject: [PIG] PREMIS description of Intellectual Entities
The PREMIS
Editorial Committee has been discussing whether and how to allow the
description of an Intellectual Entity in the PREMIS Data Dictionary.
Currently a PREMIS Object can link to an Intellectual Entity but
you cannot use PREMIS semantic units to describe the Intellectual Entity.
PREMIS defines Intellectual Entity as "a set of content that is considered
a single intellectual unit for purposes of management and description: for
example, a particular book, map, photograph, or database. An Intellectual
Entity can include other Intellectual Entities; for example, a Web site can
include a Web page; a Web page can include an image. An Intellectual Entity may
have one or more digital representations."
The EC has had several requests to consider expanding the Data Dictionary to
include description of Intellectual Entities. We identified a number of
use cases for doing this, although not all cases are equally strong.
1) A repository may want to represent an Intellectual Entity in order to
capture descriptive metadata for it, have business requirements associated with
it, show relationships, give high level rights information, or record related
events and/or agents.
2) The repository may want to represent a batch of files with similar
properties (e.g. environments) in order to avoid repetition of this
information. The files would not constitute a representation.
3) The repository is sending a copy of an archived AIP containing multiple
representations to another repository (for example, using the TIPR Repository
Exchange Format) and wants to describe the package as a whole, as distinct from
each representation.
4) The repository may want to describe a complex event such as a web crawl.
5) The repository may want to distinguish intellectual file properties from
actual file properties.
6) The repository may want to capture versioning information at the
Intellectual Entity level for IEs such as articles or issues.
The EC's modeling showed that the most satisfying way of including Intellectual
Entity in the Data Dictionary was to treat it as a fourth type of Object
entity, along with Representations, Files and Bitstreams. The advantages
to this approach are:
Analysis has
shown that nearly all of the semantic units applicable to Representations also
seem applicable to Intellectual Entities. Of course, this changes the
Data Model and requires a major revision of the Data Dictionary. Version
2.1 of the Data Dictionary is coming out very soon, and will not include any
change to Intellectual Entity. If we did add Intellectual Entity as a
fourth Object type, it would probably be issued some time in the future as Version
3.0.
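To make the proposal concrete, the idea of a fourth Object type could be sketched as below. This is only an illustration under assumptions, not the Data Dictionary's definition: the class names, the `related_object_ids` field and the identifiers are hypothetical. It mirrors the nesting from the definition quoted above (a Web site includes a Web page, which has a digital representation).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ObjectCategory(Enum):
    """The three existing Object types plus the proposed fourth."""
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"
    INTELLECTUAL_ENTITY = "intellectual entity"  # the proposed addition


@dataclass
class PreservationObject:
    """Hypothetical record: every Object type is described by the same structure."""
    identifier: str
    category: ObjectCategory
    related_object_ids: List[str] = field(default_factory=list)


# Illustrative data only: an IE can include another IE, which has a representation.
web_site = PreservationObject("ie-1", ObjectCategory.INTELLECTUAL_ENTITY, ["ie-2"])
web_page = PreservationObject("ie-2", ObjectCategory.INTELLECTUAL_ENTITY, ["rep-1"])
rendition = PreservationObject("rep-1", ObjectCategory.REPRESENTATION)
```

Modelling the Intellectual Entity as one more category, rather than as a separate record type, reflects the observation that nearly all semantic units applicable to Representations also seem applicable to Intellectual Entities.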
Before finalizing such a change, we would like to hear any comments the
community of PREMIS implementers may have. Do you see use cases for
describing Intellectual Entity in PREMIS? Are you comfortable with
defining a new type of Object Entity? Do you see semantic units that
apply to Representations that do not apply to Intellectual Entities? Are
there additional semantic units that would pertain to Intellectual Entities
that would be useful to include in the Data Dictionary?
If you have comments, please send them to the PIG list ([log in to unmask]) so we can get some open discussion
going on this.
Thanks,
Priscilla