I read your arguments as strong support for the proposed extension of
the PREMIS Data Dictionary.


In Use Case 1 we tried to capture your argument that significant
characteristics belong to Intellectual Entities ("1) A repository may
want to represent an Intellectual Entity in order to capture descriptive
metadata for it, have business requirements associated with it, show
relationships, give high level rights information, or record related
events and/or agents."). We consider significant characteristics to be a
set of business requirements associated with the Intellectual Entity.
The proposed changes were strongly motivated by the Planets preservation
model in which we associate all business requirements with Intellectual
Entities. (The exception is bit preservation, where there are
obvious significant characteristics of the files themselves, e.g. the
bit sequence must be preserved or be reconstructible.)


I don't agree, however, that the setup configuration of a software
application should be captured about the IE. Software configuration is
part of the whole rendering stack to which the files belong. With
different representations come different files, rendering stacks and
software configurations. In Planets we associate an Intellectual Entity
with an Environment object. This environment object can be represented
through a choice of rendering stacks. The environments can have their
own business requirements attached to them and the chosen rendering
stack should satisfy those. This is one reason why I would like us
to introduce a separate Environment conceptual entity in PREMIS, which we
can associate with Intellectual Entities as well as with their chosen


Hi all,


This may not be the best place for this tirade, but in the interests of
livening up the discussion, here I go:



NB: these are my personal opinions and not the official view of Archives
New Zealand. Also, I am quite sure I've missed something important here
and would love to be proven wrong about this. 



As I see it, the object of digital preservation is the intellectual
entity. That is what is preserved post-migration/normalisation
(after all, it can't be the files, as we have replaced them) and what is
accessed/preserved/performed through emulation.


The strongest case for having intellectual entities in PREMIS is so that
practitioners can identify the relationships between the intellectual
entities themselves and the files they rely on: the so-called "content"
files and the other software application files.
Including a strong/clear description of the intellectual entity would
help to enable digital preservation practitioners to judge or test,
post-migration, whether the intellectual entity had been preserved,
and/or judge or test whether an emulation solution had adequately
performed/presented/accessed the intellectual entity.


The significant properties/essential characteristics that are often
talked about within the digital preservation community are (surely?!)
properties of the intellectual entity, not of any particular file that it
may rely upon. For example, formatting, colour, layout, size,
information conveyed, interactivity, footnotes, etc. They happen
to be presented to the user via a combination of files, including one or
more "content" files and many application/software files but in theory
could be presented to the user using a completely different set of files
(e.g. post-migration). 


By analysing "content" files it is possible to have an indication of
what significant properties an intellectual entity has, for example a
photo that relies on a file with a .jpg extension might reasonably be
presumed to have a certain intended size based on an assumption about
the way the information in the "content" file is organised and on an
assumption about what software application files are to be used in
performing the intellectual entity. However these are only
indications/indicators and do not define what the intellectual entity
definitely is. The assumptions could be wrong. The file could have been
created in a way that differs slightly from the standard it was
presumed to adhere to, and the intellectual entity might rely on
completely different software application files for its performance.
The performance through these different software application files might
present a larger photo than was initially presumed, as the
application files may "re-interpret" the "content" files when conducting
the performance.


There are two implications of this:

1. The significant properties of an intellectual entity rely on more
than one file for presentation in any case (one or more "content" files
and a number of application files), so they can rarely (if ever) be
attributed to any particular file. They therefore have to be documented
about the thing that does span multiple files, i.e. the IE.

2. The significant properties may be indicated by properties in any one
file (e.g. a "content" file), but they aren't defined by them. The
fact that a file contains information indicating that it adheres to
a particular formatting standard, or that it is intended to be rendered
as an image with a particular size on screen, doesn't mean that is what
the creator intended, nor does it mean that the file actually adheres to
that standard. It does indicate both things, but both are best
ascertained by asking/interrogating the creator and capturing that
information as metadata. So the significant properties are best captured
about the IE, because they are only indicated by the properties of any
particular file, not defined by them.


Another thought is that the setup configuration of a software
application cannot in many cases be derived from a content file. This
means it has to be captured elsewhere, and it should be captured
about the IE, as that is the only logical place (I think).


In other words: we need to document preservation metadata about IEs, and
PREMIS should have a spot for it.


OK, so good luck making sense of that. I think I have confused myself.






The PREMIS Editorial Committee has been discussing whether and how to
allow the description of an Intellectual Entity in the PREMIS Data
Dictionary. Currently a PREMIS Object can link to an Intellectual
Entity, but you cannot use PREMIS semantic units to describe the
Intellectual Entity.

PREMIS defines Intellectual Entity as "a set of content that is
considered a single intellectual unit for purposes of management and
description: for example, a particular book, map, photograph, or
database. An Intellectual Entity can include other Intellectual
Entities; for example, a Web site can include a Web page; a Web page can
include an image. An Intellectual Entity may have one or more digital
representations."

The EC has had several requests to consider expanding the Data
Dictionary to include description of Intellectual Entities.  We
identified a number of use cases for doing this, although not all cases
are equally strong.

1) A repository may want to represent an Intellectual Entity in order to
capture descriptive metadata for it, have business requirements
associated with it, show relationships, give high level rights
information, or record related events and/or agents.

2) The repository may want to represent a batch of files with similar
properties (e.g. environments) in order to avoid repetition of this
information. The files would not constitute a representation.

3) The repository is sending a copy of an archived AIP containing
multiple representations to another repository (for example, using the
TIPR Repository Exchange Format) and wants to describe the package as a
whole, as distinct from each representation.

4) The repository may want to describe a complex event such as a web

5) The repository may want to distinguish intellectual file properties
from actual file properties.

6) The repository may want to capture versioning information at the
Intellectual Entity level for IEs such as articles or issues.

The EC's modeling showed that the most satisfying way of including
Intellectual Entity in the Data Dictionary was to treat it as a fourth
type of Object entity, along with Representations, Files and Bitstreams.
The advantages to this approach are:

*	It is intuitively similar to Objects
*	The Data Dictionary will be more compact.
*	We can simplify the Data Dictionary because we could drop links
such as linkingIntellectualEntityIdentifier
*	We could attach events and agents directly, and rights
indirectly, to Intellectual Entities

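As a rough sketch of the proposed model (Python dataclasses with hypothetical class and relationship names; this is not the PREMIS XML schema), Intellectual Entity becomes one more value of an object category, so the same relationship and event handling serves all four object types:

```python
from dataclasses import dataclass, field
from enum import Enum


class ObjectCategory(Enum):
    # Representation, File and Bitstream exist today; IE is the proposed fourth type.
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"
    INTELLECTUAL_ENTITY = "intellectual entity"


@dataclass
class Event:
    """Hypothetical, heavily reduced stand-in for a PREMIS Event."""
    event_type: str


@dataclass
class PremisObject:
    identifier: str
    category: ObjectCategory
    # Generic (relationship, target identifier) pairs. With IE as an Object,
    # a dedicated IE link field is no longer needed.
    relationships: list = field(default_factory=list)
    # Events attach directly, whatever the object category.
    events: list = field(default_factory=list)

    def relate(self, relationship: str, target: "PremisObject") -> None:
        self.relationships.append((relationship, target.identifier))


if __name__ == "__main__":
    book = PremisObject("ie-book-1", ObjectCategory.INTELLECTUAL_ENTITY)
    scans = PremisObject("rep-tiff-1", ObjectCategory.REPRESENTATION)
    # Hypothetical relationship label linking a representation to its IE.
    scans.relate("is representation of", book)
    book.events.append(Event("ingestion"))
    print(scans.relationships)  # [('is representation of', 'ie-book-1')]
```

A single generic relationship list is what lets the special-purpose links collapse: a Representation points at its IE in the same way it would point at a File.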
Analysis has shown that nearly all of the semantic units applicable to
Representations also seem applicable to Intellectual Entities.  Of
course, this changes the Data Model and requires a major revision of the
Data Dictionary.  Version 2.1 of the Data Dictionary is coming out very
soon, and will not include any change to Intellectual Entity.  If we did
add Intellectual Entity as a fourth Object type, it would probably be
issued some time in the future as Version 3.0.

Before finalizing such a change, we would like to hear any comments the
community of PREMIS implementers may have.  Do you see use cases for
describing Intellectual Entity in PREMIS?  Are you comfortable with
defining a new type of Object Entity?  Do you see semantic units that
apply to Representations that do not apply to Intellectual Entities?
Are there additional semantic units that would pertain to Intellectual
Entities that would be useful to include in the Data Dictionary?

If you have comments, please send them to the PIG list so
we can get some open discussion going on this.



