Subject: Responses to Robert Sharpe (Re: PREMIS Implementation Fair feedback)
From: Rebecca S Guenther
Reply-To: PREMIS Implementors Group Forum
Date: Thu, 14 Jan 2010 15:11:46 -0500

Apologies for taking so long to answer this very well thought-out message from Robert Sharpe. The PREMIS Editorial Committee has been discussing the issues brought up in this message from late October. Following are responses to all of those that have an answer and don't need further discussion. For those that require further discussion, we will either send out a message to the PIG list to solicit feedback or discuss them in the PREMIS EC and send out responses later.

I have retained the original numbering from the message. Gaps in the numbering correspond to questions that require further discussion.

Contributors include:
Priscilla Caplan, Peter McKinney, Angela Dappert, Rebecca Guenther

Big questions
1. Models Intellectual entities (information objects in OAIS)
These are the things that we want to preserve so it is important to model their significant properties. The Planets conceptual model does not worry about descriptive information since other schemas do a good job at this.
However, it is important to model the existence and properties of the atomic information object needed for transformation (we call these "components"), which is often a smaller unit of information than traditional structural/descriptive models normally deal with. As an example, a descriptive model might deal with a web site but we need to model each individual web page if we are to be able to verify their properties before and after a transformation.

Response: The "component" in the Planets model is not equivalent to the Intellectual Entity in PREMIS.  Significant properties can (in Planets) adhere to components that can be embedded within larger physical entities, as, for example, text and image components might be embedded within a PDF file.  Text and image would have different significant characteristics.
The PREMIS Editorial Committee has started work toward a future revision of PREMIS that will include semantic units describing the Intellectual Entity (in PREMIS terms). It is likely that future revisions of PREMIS will in some way accommodate the emerging Planets model of significant properties. There have already been conversations between PREMIS and Planets principals.

2. Models structural metadata.
There are important concepts here since new representations created via migration can be complex combinations of existing files and newly created files. Similarly, new information objects can also reuse files that already exist in the repository (e.g., when creating new web site snapshots). This can lead to complex structural relationships that need to be modelled by a truly comprehensive preservation information model. I believe this needs to be part of a preservation conceptual model. It is true that, physically, it is possible to hold this information in existing schemas (e.g., METS) although sometimes with a little awkwardness.

Response: PREMIS was never intended to include all information needed for all purposes.  For example, clearly some descriptive metadata is needed, but there are adequate schemes already in use for this (as noted in point 1 above) and the Working Group that drafted the Data Dictionary did not feel a need to include descriptive metadata in it.  Similarly there are many schemes that adequately handle structural metadata. 

3. Models Transformation entities.
This can be used to control preservation planning, migration or emulation. This could be done through the current PREMIS Event entity (but I think having an explicit entity would be clearer especially in a conceptual model). The things that need to be recorded include the representation of the component being transformed and the new representation of that component plus information on the migration pathways and the verification process that took place.

Response: This question is about recording explicitly how a preservation action creates a new representation from an old one. This involves recording the relationship between the representations, the preservation action event, the agent used to perform the preservation action, and details, such as configuration parameters, significant characteristics which guided the choice of preservation action, measured differences between the source and the target (outcome information), etc. 
This all fits the PREMIS model very well. The PREMIS Editorial Committee believes that the PREMIS data model needs to stay as slim as possible, while being able to capture what we need. It does not want to introduce a special type of entity for preservation actions.

However, the PREMIS Editorial Committee will consider a refined event model that captures what people want to say about events in one place. For example, an n:m migration (e.g. creating one PDF from multiple files, or creating multiple spreadsheets from one database file) is currently very cumbersome and verbose to record in PREMIS.
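
To illustrate why an n:m migration gets verbose, here is a minimal Python sketch. It is not PREMIS syntax and not a proposal from the Editorial Committee: the class `MigrationEvent`, its field names, and the helper `derivation_pairs` are all hypothetical. It records one event with several source objects and one outcome object, then expands it into the pairwise source-to-outcome relationships that would otherwise have to be stated one by one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MigrationEvent:
    """Hypothetical record of one preservation action (names are illustrative only)."""
    event_id: str
    agent_id: str                      # the tool or service that performed the action
    source_object_ids: List[str]       # objects that were migrated
    outcome_object_ids: List[str]      # objects created by the migration
    parameters: Dict[str, str] = field(default_factory=dict)  # configuration details
    outcome_detail: str = ""           # e.g. measured differences, verification notes


def derivation_pairs(event: MigrationEvent) -> List[Tuple[str, str]]:
    """Expand one n:m event into every (source, outcome) pair it implies."""
    return [
        (source, outcome)
        for source in event.source_object_ids
        for outcome in event.outcome_object_ids
    ]


# Three files migrated into a single PDF: one event, but three pairwise relationships.
event = MigrationEvent(
    event_id="event-001",
    agent_id="agent-converter",
    source_object_ids=["file-a", "file-b", "file-c"],
    outcome_object_ids=["file-pdf"],
    parameters={"target_format": "PDF"},
)
pairs = derivation_pairs(event)
```

The number of pairs grows as n times m, which is one way to see why stating everything about the event in one place could be less cumbersome than repeating it per relationship.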

Smaller questions:

1. Why is it necessary to state whether an embedded object is a FileStream or a Bitstream? Not sure why this helps since anything embedded has to be extracted by some method (and we may not know what that method is).

Response: It is not necessary to state this, but if you want to use a bitstream object you should know what a bitstream is. I think the real question here is why it matters whether an object is a filestream or a bitstream. The answer is that, since a filestream can stand alone, it can actually be treated and described as a file object, while a bitstream cannot.
Here is a possible scenario. We have a bytestream that contains a bitstream, for example an image inside a Word document. If we are trying to pull that image out of the Word document, there will necessarily be some degree of transformation of the image to make it into a filestream so that it can exist on its own. If, however, we are pulling images out of an ARC file, then each image is a filestream and no transformation is needed, as it can stand by itself.
Therefore, it would help to know whether the object is a filestream or a bitstream. You would know that an object was a bitstream from the objectCategory value "bitstream". You would have to infer that an object was a filestream from the fact that the objectCategory value is "file" but the contentLocationType is "byte offset" or something like that. The PREMIS Editorial Committee thought that it would probably be better to make this explicit.
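
The inference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_object` and the returned labels are hypothetical, and "byte offset" is the tentative contentLocationType value mentioned in the response, not a confirmed controlled-vocabulary term.

```python
from typing import Optional


def classify_object(object_category: str,
                    content_location_type: Optional[str] = None) -> str:
    """Guess whether an object is a bitstream, a filestream, or a plain file.

    Assumption: a "file" whose location is given as a byte offset lives inside
    another file, so it is treated as a filestream.
    """
    if object_category == "bitstream":
        # Stated explicitly in the metadata, no inference needed.
        return "bitstream"
    if object_category == "file":
        if content_location_type == "byte offset":
            # Inferred: a stand-alone file embedded in a container.
            return "filestream (inferred)"
        return "file"
    # Anything else (for example a representation) is out of scope here.
    return "other"


examples = [
    classify_object("bitstream"),
    classify_object("file", "byte offset"),
    classify_object("file", "URI"),
]
```

The asymmetry is visible in the code: the bitstream case is a direct lookup, while the filestream case depends on a second field and a convention about its value, which is the reason given for making the distinction explicit.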

2. The Data Dictionary states "If all identifiers are local to repository system, it is unlikely that identifier type would need to be explicitly recorded for each identifier in the system". I agree but most Identifier Types in the schema do appear to be mandatory? 

Response: Note that "mandatory" means the repository needs to know it, but how it knows it isn't in scope. It really means that it has to be recoverable by the system. The XML schema is a particular implementation of the data dictionary; the type could be generated when exchanging data in XML.

3. Along the same lines, every time you use a format identifier you need to name the registry. This is usually implied and so it is a lot of unnecessary repetition. Can this be made less verbose? 

Response: An implementation could adopt a business rule that it always uses a certain format registry; again, the registry doesn't need to be explicitly named if it can be recovered later by the system.
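
The responses to questions 2 and 3 rest on the same idea: a value that is implied inside the repository only has to be written out when data is exchanged. A minimal Python sketch of that idea follows. The stored record, the `REPOSITORY_DEFAULTS` table, the function `export_object`, and the values "local", "ExampleRegistry" and "example-key-1" are all hypothetical. The element names follow PREMIS semantic unit names, but the output is a flat, simplified fragment without namespaces and is not meant to be schema-valid.

```python
import xml.etree.ElementTree as ET

# Business rules known to the repository; never stored per object (assumed values).
REPOSITORY_DEFAULTS = {
    "identifier_type": "local",
    "format_registry_name": "ExampleRegistry",
}


def export_object(identifier_value: str, format_registry_key: str,
                  defaults: dict = REPOSITORY_DEFAULTS) -> ET.Element:
    """Build a simplified XML fragment, filling in the implied values on export."""
    obj = ET.Element("object")

    identifier = ET.SubElement(obj, "objectIdentifier")
    # The type is generated here from the business rule, not read from storage.
    ET.SubElement(identifier, "objectIdentifierType").text = defaults["identifier_type"]
    ET.SubElement(identifier, "objectIdentifierValue").text = identifier_value

    registry = ET.SubElement(obj, "formatRegistry")
    # The registry name is likewise supplied once, by rule.
    ET.SubElement(registry, "formatRegistryName").text = defaults["format_registry_name"]
    ET.SubElement(registry, "formatRegistryKey").text = format_registry_key
    return obj


# Internally the repository keeps only the values that vary per object.
stored_record = {"identifier_value": "0001", "format_registry_key": "example-key-1"}
exported = export_object(**stored_record)
xml_text = ET.tostring(exported, encoding="unicode")
```

Under this arrangement the repetition appears only in the exchanged XML, while the internal store stays compact.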

5. I'm not at all clear how to use "preservation level" or what is the point of it. Can this be further explained? 

Response: Preservation level is a business rule and most business rules are not in scope for PREMIS. It has to do with the intentions and capabilities of a given repository. As with other semantic units, because it is a business rule there may be nothing that needs to be stored for each item.

The PREMIS Editorial Committee will consider having them stand on their own, not as part of the object entity.

11. How would we record the existence of an empty folder in PREMIS? This is important in some cases (e.g., to allow DVDs to be stored and rebuilt) 

Response: You could interpret a folder as a representation or as an intellectualEntity, depending on how you are planning to use it. In either case you would want to declare an objectIdentifier - not yet possible for the intellectualEntity - but the PREMIS Editorial Committee is working on that. Whether or not it is empty is structural information.

Other questions to be addressed after further discussion:
4) Modification dates
6) Significant properties
7) creatingApplication, environment, software, hardware
8) environment registries; related to 7)
9) storageLocation
10) relationships
12) recording whether file is valid or well-formed against its format

Rebecca

Rebecca S. Guenther
Senior Networking and Standards Specialist
Network Development and MARC Standards Office
Library of Congress
101 Independence Ave. SE
Washington, DC 20540-4402
(202) 707-5092 (voice)
(202) 707-0115 (FAX)