
Hi Angela,

 

I was referring to this document in particular: http://www.planets-project.eu/docs/reports/Planets_PP2_D3_ReportOnPolicyAndStrategyModelsM36_Ext.pdf

The Report on the Conceptual Aspects of Preservation, Based on Policy and Strategy Models for Libraries, Archives and Data Centres

 

Thanks for the clarification. I will have to have a good think about this. I am a little worried that there is some confusion between file-system identifiable things, such as files within container files, and intellectual things that are not identifiable in the file system, such as reports, items of correspondence, laws, poems, artworks, newspapers, magazines and journals. My worry is that we are not talking about the latter, and we need to be, as these are the things that we actually want to preserve.

 

For example, if we do a migration of a poem we want to make sure the poem is still there post-migration to a new performance (i.e. to a new combination of content file(s) and software environment). In order to do that we have to have some way of automatically testing for this. In order to do that we need to have some idea of what the poem is (and possibly what it is not) so we can check to see that it is still there post-migration. In order to do that we need to store information about such intellectual things. In order to do that we need to have somewhere to store this information. If we do not identify these things as different from the merely file-system identifiable components of other files, then we will be using the same infrastructure (e.g. the same part of the PREMIS schema) to store information about both kinds of thing, or about only the file-system identifiable things. Either way we will not be able to automate actions based on that information, as we won't be able to tell automatically which type of thing it applies to.
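To make the chain of reasoning above concrete, here is a minimal Python sketch of such an automated check. Everything in it is an assumption made for illustration: the `IntellectualEntity` record, the `lost_properties` function and the property names (`line_count`, `stanza_count`, `language`) are hypothetical and are not part of PREMIS or Planets. The point is only that the expected values are stored against the intellectual thing, separately from any file, so that a new performance can be compared with them.

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # marks a property that could not be measured at all


@dataclass
class IntellectualEntity:
    """Hypothetical record describing the intellectual thing, not its files."""
    identifier: str
    significant_properties: dict[str, Any] = field(default_factory=dict)


def lost_properties(entity: IntellectualEntity,
                    measured: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Compare the recorded properties with those measured from a new performance.

    Returns {property: (expected, actual)} for every mismatch; actual is None
    when the property could not be measured in the new performance.
    """
    problems = {}
    for name, expected in entity.significant_properties.items():
        actual = measured.get(name, _MISSING)
        if actual is _MISSING:
            problems[name] = (expected, None)
        elif actual != expected:
            problems[name] = (expected, actual)
    return problems


# Assumed example: what we recorded about the poem before migration.
poem = IntellectualEntity(
    identifier="poem-001",
    significant_properties={"line_count": 14, "stanza_count": 4, "language": "en"},
)

# Values a characterisation tool might report for the migrated performance.
after_migration = {"line_count": 14, "stanza_count": 3}

print(lost_properties(poem, after_migration))
```

If the same store held file-level and intellectual-level properties without distinguishing them, a check like this would have no way of knowing which expectations to carry across a migration and which to discard along with the old files.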

 

Clear as mud?

 

Thanks again for your reply, I appreciate the chance to discuss these issues.

 

Kind regards,

 

 

Euan Cochrane

Senior Advisor, Digital Continuity 

Archives New Zealand 

 

DDI 04 894 6077

E [log in to unmask]

www.archives.govt.nz

From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Dappert, Angela
Sent: Saturday, 18 December 2010 12:31 a.m.
To: [log in to unmask]
Subject: Re: [PIG] PREMIS description of Intellectual Entities

 

Euan,

 

I wonder which of our reports you are referring to. In general if I said “digital object” I probably meant what is better defined as “Preservation Object”. I have copied an explanation below from a paper. The difference of this to a PREMIS “object” is that we

(1) explicitly introduced “Component”, which are really just another form of “IntellectualEntity”, because people really wanted a separate concept for parts of a whole that need to be characterised.

(2) explicitly introduced representation bitstreams to be able to distinguish the “intellectual file” and the “actual file”. The “intellectual file” would have the ideal file size, checksum etc., and the “actual file” might have different values, e.g. due to corruption – but this distinction is probably a bit academic and not necessarily needed in PREMIS.

 

(I wonder whether in some documents I might have carelessly referred to the preservation object together with its environment as a digital object.)

 

   Best wishes,

           Angela

 

 

Preservation Object

 

The preservation object concept corresponds to those objects in need of preservation. It has subclasses on three tiers, as illustrated in Figure 3. The top two tiers are associated with specific physical representations of digital objects. The top tier comprises physical objects, such as bitstreams and their subclasses, including bytestreams and files. The middle tier comprises representations of logical objects consisting of representation bitstreams that are needed to create a single rendition of a logical object (e.g., the set of html and gif files[1] needed to render the web version of a journal article). The bottom tier comprises logical objects such as intellectual entities and components.

 


Figure 3 Preservation Object Subclasses

 

An intellectual entity is a distinct intellectual or artistic creation. PREMIS [15] defines it as a set of content that is considered a single intellectual unit for purposes of management and description. The intellectual entity can be extended in ways to meet the needs of stakeholders. For example, in the library setting, common subclasses include collection, work, and expression. In an archival setting, subclasses such as fonds and series are also relevant. Most repositories support discovery and delivery of intellectual entities such as books, videos, and articles. They may augment these with work and expression subclasses to capture useful FRBR distinctions [11]. Intellectual entities may also correspond to larger structures, such as collections, which may not be of interest to the end-user, but may be significant in preservation decisions.

During preservation, it is often necessary to consider fine-grained components of an intellectual entity. Examples include table, image, title, substring, or even an individual character. The component entity can be decomposed in several ways, such as by the type of content (e.g., textComponent, imageComponent), or by structure (e.g., headerComponent or tableOfContentsComponent). Values for characteristics of components can be measured from their associated representations (e.g. the font of a character component can be extracted from its representation bitstream).

Properties can be applicable to objects in every tier. For example:

fileSize or encoding are applicable to files.

numberOfFilesInTheRepresentation, totalRepresentationSize, resolution, or preservationLevel are applicable to representations.

pageCount or frameRate are applicable to intellectual entities such as a journal article or video.

alignment is applicable to a textComponent.

semanticInterpretation can be a characteristic of any component.
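The tiers and property examples above can be sketched as a small class hierarchy. This is only an illustration under stated assumptions: the Python class names and the `APPLIES_TO` table are hypothetical, File and Bytestream are assumed to be direct subclasses of Bitstream, and Component is modelled as a sibling of IntellectualEntity in the logical tier. The property names are the ones used in the examples.

```python
class PreservationObject:
    """Root concept: any object in need of preservation."""


# Physical tier
class Bitstream(PreservationObject):
    pass


class Bytestream(Bitstream):
    pass


class File(Bitstream):
    pass


# Representation tier
class Representation(PreservationObject):
    pass


# Logical tier
class IntellectualEntity(PreservationObject):
    pass


class Component(PreservationObject):
    pass


class TextComponent(Component):
    pass


# Which kind of object each example property is applicable to.
APPLIES_TO = {
    "fileSize": File,
    "encoding": File,
    "numberOfFilesInTheRepresentation": Representation,
    "totalRepresentationSize": Representation,
    "resolution": Representation,
    "preservationLevel": Representation,
    "pageCount": IntellectualEntity,
    "frameRate": IntellectualEntity,
    "alignment": TextComponent,
    "semanticInterpretation": Component,
}


def is_applicable(property_name: str, obj: PreservationObject) -> bool:
    """True if the named property is applicable to this kind of object."""
    target = APPLIES_TO.get(property_name)
    return target is not None and isinstance(obj, target)


print(is_applicable("semanticInterpretation", TextComponent()))
print(is_applicable("fileSize", IntellectualEntity()))
```

Because the lookup uses `isinstance`, a property declared for any component is also applicable to a text component, while a file-level property such as fileSize is rejected for an intellectual entity.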

 

 

...........                      Angela


From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Euan Cochrane
Sent: 14 December 2010 20:06
To: [log in to unmask]
Subject: Re: [PIG] PREMIS description of Intellectual Entities

 

Hi Angela,

 

Thanks for the reply. I completely agree with what you said about the software configurations.  

I have read over the Planets work and I really like it. I have one question (not directly PREMIS related) that I haven't been able to find an answer to, though, and which you might be able to help with. In relation to the Concept Model, what is a "digital object"? Many of the other concept definitions refer to it and it's not defined in the Report.

 

Thanks again,

 

Euan Cochrane

Senior Advisor, Digital Continuity 

Archives New Zealand 

 

DDI 04 894 6077

E [log in to unmask]

www.archives.govt.nz

From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Dappert, Angela
Sent: Wednesday, 15 December 2010 12:14 a.m.
To: [log in to unmask]
Subject: Re: [PIG] PREMIS description of Intellectual Entities

 

Euan,

 

I read your arguments as strong support for the proposed extension of the PREMIS dictionary.

 

In Use Case 1 we tried to capture your argument about significant characteristics belonging to Intellectual Entities (“A repository may want to represent an Intellectual Entity in order to capture descriptive metadata for it, have business requirements associated with it, show relationships, give high level rights information, or record related events and/or agents.”). We consider significant characteristics to be a set of business requirements associated with the Intellectual Entity. The proposed changes were strongly motivated by the Planets preservation model, in which we associate all business requirements with Intellectual Entities. (The exception is that, for bit preservation, there are obvious significant characteristics of the files themselves, e.g. the bit sequence must be preserved or be reconstructible.)

 

I don’t agree, however, that the setup configuration of a software application should be captured about the IE. Software configuration is part of the whole rendering stack to which the files belong. With different representations come different files, rendering stacks and software configurations. In Planets we associate an Intellectual Entity with an Environment object. This environment object can be represented through a choice of rendering stacks. The environments can have their own business requirements attached to them and the chosen rendering stack should satisfy those. This is one reason why I would like for us to introduce a separate Environment conceptual entity in PREMIS which we can associate with Intellectual Objects as well as with their chosen representation.
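As a rough illustration of the arrangement described above, the following Python sketch attaches an Environment to an Intellectual Entity and selects the rendering stacks that satisfy the environment's business requirements. The class names, field names and example values are hypothetical, and modelling a requirement as a simple capability label is an assumption made for brevity; it is not how Planets or PREMIS define these concepts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RenderingStack:
    """One concrete choice of files, software and software configuration."""
    name: str
    capabilities: frozenset


@dataclass
class Environment:
    """Carries its own business requirements; realised by a chosen stack."""
    requirements: frozenset
    rendering_stacks: list = field(default_factory=list)

    def satisfying_stacks(self):
        # A stack qualifies only if it offers every required capability.
        return [s for s in self.rendering_stacks
                if self.requirements <= s.capabilities]


@dataclass
class IntellectualEntity:
    identifier: str
    environment: Environment


# Assumed example values, for illustration only.
env = Environment(
    requirements=frozenset({"renders_footnotes", "supports_macros"}),
    rendering_stacks=[
        RenderingStack("emulated original suite",
                       frozenset({"renders_footnotes", "supports_macros"})),
        RenderingStack("PDF viewer", frozenset({"renders_footnotes"})),
    ],
)
report = IntellectualEntity("report-42", env)

print([s.name for s in report.environment.satisfying_stacks()])
```

Here the software configuration lives with the rendering stack rather than with the Intellectual Entity, so a different representation can bring a different stack without changing what is recorded about the entity itself.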

 

   Best wishes,

...........                      Angela


From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Euan Cochrane
Sent: 13 December 2010 23:16
To: [log in to unmask]
Subject: Re: [PIG] PREMIS description of Intellectual Entities

 

Hi all,

 

This may not be the best place for this tirade but in the interests of livening up the discussion here I go:

 

 

NB: these are my personal opinions and not the official view of Archives New Zealand. Also, I am quite sure I've missed something important here and would love to be proven wrong about this.

 

 

As I see it, the object of digital preservation is the intellectual entity. That is what is preserved post-migration/normalisation (after all, it can't be the files, as we have replaced them) and what is accessed/preserved/performed through emulation.

 

The strongest case for having intellectual entities in PREMIS is so that practitioners can identify the relationships between the intellectual entities themselves and the files they rely on: the so-called "content" files and the other software application files. Including a strong/clear description of the intellectual entity would help to enable digital preservation practitioners to judge or test, post-migration, whether the intellectual entity had been preserved, and/or judge or test whether an emulation solution had adequately performed/presented/accessed the intellectual entity.

 

The significant properties/essential characteristics that are often talked about within the digital preservation community are (surely?!) properties of the intellectual entity, not of any particular file that it may rely upon. Examples include formatting, colour, layout, size, information conveyed, interactivity, footnotes, etc. They happen to be presented to the user via a combination of files, including one or more "content" files and many application/software files, but in theory could be presented to the user using a completely different set of files (e.g. post-migration).

 

By analysing "content" files it is possible to get an indication of what significant properties an intellectual entity has. For example, a photo that relies on a file with a .jpg extension might reasonably be presumed to have a certain intended size, based on an assumption about the way the information in the "content" file is organised and on an assumption about which software application files are to be used in performing the intellectual entity. However, these are only indicators and do not define what the intellectual entity definitely is. The assumptions could be wrong. The file could have been created in a way that differs slightly from the standard it was presumed to adhere to, and the intellectual entity might rely on completely different software application files for its performance. The performance through these different software application files might present a larger photo than was initially presumed, as the application files may "re-interpret" the "content" files when conducting the performance.

  

There are two implications of this:

1. The significant properties of an intellectual entity rely on more than one file for presentation in any case (one or more "content" files and a number of application files), so they can rarely (if ever) be attributed to any particular file. So they have to be documented across multiple files, against the thing that does span multiple files, i.e. the IE.

2. The significant properties may be indicated by properties in any one file (e.g. a "content" file) but they aren't defined by them. The fact that a file contains information indicating that it adheres to a particular formatting standard, or is intended to be rendered as an image with a particular size on screen, doesn't mean that this is what the creator intended, nor does it mean that the file actually adheres to that standard. It does indicate that, but both of these things are best ascertained by asking/interrogating the creator and capturing that information as metadata. So the significant properties are best captured about the IE, because they are only indicated by properties of particular files, not defined by them.

 

Another thought is that the setup configuration of a software application cannot, in many cases, be derived from a content file. This means that it has to be captured elsewhere. It should be captured about the IE, as that is the only logical place (I think).

 

In other words: we need to document preservation metadata about IEs, and PREMIS should have a spot for it.

 

Ok, so good luck making sense of that. I think I have confused myself now.

 

 

Regards,

 

 

Euan Cochrane

Senior Advisor, Digital Continuity 

Archives New Zealand 

 

DDI 04 894 6077

E [log in to unmask]

www.archives.govt.nz

From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Priscilla Caplan
Sent: Tuesday, 14 December 2010 6:03 a.m.
To: [log in to unmask]
Subject: [PIG] PREMIS description of Intellectual Entities

 

The PREMIS Editorial Committee has been discussing whether and how to allow the description of an Intellectual Entity in the PREMIS Data Dictionary. Currently a PREMIS Object can link to an Intellectual Entity, but you cannot use PREMIS semantic units to describe the Intellectual Entity.

PREMIS defines Intellectual Entity as "a set of content that is considered a single intellectual unit for purposes of management and description: for example, a particular book, map, photograph, or database. An Intellectual Entity can include other Intellectual Entities; for example, a Web site can include a Web page; a Web page can include an image. An Intellectual Entity may have one or more digital representations."
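The containment described in this definition can be pictured as a simple recursive structure. The Python sketch below is illustrative only: the class and field names are hypothetical, the representation labels are made up, and it is not drawn from the PREMIS schema. It models only the two statements in the definition, that an Intellectual Entity can include other Intellectual Entities and that it may have one or more digital representations.

```python
from dataclasses import dataclass, field


@dataclass
class IntellectualEntity:
    title: str
    # An Intellectual Entity can include other Intellectual Entities.
    parts: list["IntellectualEntity"] = field(default_factory=list)
    # It may have one or more digital representations (labels only here).
    representations: list[str] = field(default_factory=list)

    def walk(self):
        """Yield this entity and every entity nested inside it, depth first."""
        yield self
        for part in self.parts:
            yield from part.walk()


image = IntellectualEntity("Image", representations=["gif master", "png copy"])
page = IntellectualEntity("Web page", parts=[image], representations=["html"])
site = IntellectualEntity("Web site", parts=[page])

print([entity.title for entity in site.walk()])
```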

The EC has had several requests to consider expanding the Data Dictionary to include description of Intellectual Entities.  We identified a number of use cases for doing this, although not all cases are equally strong.

1) A repository may want to represent an Intellectual Entity in order to capture descriptive metadata for it, have business requirements associated with it, show relationships, give high level rights information, or record related events and/or agents.

2) The repository may want to represent a batch of files with similar properties (e.g. environments) in order to avoid repetition of this information. The files would not constitute a representation.

3) The repository is sending a copy of an archived AIP containing multiple representations to another repository (for example, using the TIPR Repository Exchange Format) and wants to describe the package as a whole, as distinct from each representation.

4) The repository may want to describe a complex event such as a web crawl.

5) The repository may want to distinguish intellectual file properties from actual file properties.

6) The repository may want to capture versioning information at the Intellectual Entity level for IEs such as articles or issues.

The EC's modeling showed that the most satisfying way of including Intellectual Entity in the Data Dictionary was to treat it as a fourth type of Object entity, along with Representations, Files and Bitstreams.  The advantages to this approach are:

Analysis has shown that nearly all of the semantic units applicable to Representations also seem applicable to Intellectual Entities.  Of course, this changes the Data Model and requires a major revision of the Data Dictionary.  Version 2.1 of the Data Dictionary is coming out very soon, and will not include any change to Intellectual Entity.  If we did add Intellectual Entity as a fourth Object type, it would probably be issued some time in the future as Version 3.0.

Before finalizing such a change, we would like to hear any comments the community of PREMIS implementers may have.  Do you see use cases for describing Intellectual Entity in PREMIS?  Are you comfortable with defining a new type of Object Entity?  Do you see semantic units that apply to Representations that do not apply to Intellectual Entities?  Are there additional semantic units that would pertain to Intellectual Entities that would be useful to include in the Data Dictionary?

If you have comments, please send them to the PIG list ([log in to unmask]) so we can get some open discussion going on this.

Thanks,

Priscilla

This e-mail message and any attachments are CONFIDENTIAL to the addressee(s) and may also be LEGALLY PRIVILEGED.  If you are not the intended addressee, please do not use, disclose, copy or distribute the message or the information it contains.  Instead, please notify me as soon as possible and delete the e-mail, including any attachments.  Thank you.






[1] The formal definition of such a statement would of course contain a persistent unique identifier of the exact version of the file formats. For improved readability of examples we casually refer to file formats by their file extension.
