Hi Rob,
In terms of the larger points you make we're definitely interested in
having conversations in these areas. A quick question: how much of the
information in the Safety Deposit Box has a PREMIS equivalent? Is it a
big gap between PREMIS and the information you want to hold?
I've copied in your "minor" questions below and put a response to some
of them from my point of view based on experience at National Library of
New Zealand. Very happy indeed to discuss this further with anyone
interested.
Best,
Pete
>
> I also have a whole bunch of other, more minor questions, which I list
> below:
> 1. Why is it necessary to state whether an embedded object is a FileStream
> or a Bitstream? Not sure why this helps since anything embedded has to be
> extracted by some method (and we may not know what that method is).
Here's a scenario to see if this covers what you're talking about.
We have a bytestream that contains a bitstream, for example an image
inside a Word document. If we are trying to pull out that image from
Word, there will necessarily be some degree of transformation on the
image to make it into a filestream so it can exist in its own right.
If, however, we are pulling out images from an ARC file, then each image
is already a filestream and no transformation is needed, as it can stand
by itself.
Therefore for us, it would help to know if the object is a filestream
or bitstream. I guess there are a number of ways you could know this
though — for example, you may know by the format of the bytestream
that any objects inside it have to be bitstreams (or filestreams).
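As a rough sketch of that last point (not how we actually implement it),
the decision could be driven by a lookup on the container format, with
an explicitly recorded filestream/bitstream flag taking priority. The
table contents and the function name are hypothetical and only there to
illustrate the idea.

```python
# Hypothetical mapping: what kind of object each container format embeds.
EMBEDDED_KIND_BY_CONTAINER = {
    "arc": "filestream",  # files stored whole, can stand by themselves
    "doc": "bitstream",   # embedded data needs transforming into a file
}

def needs_transformation(container_format, declared_kind=None):
    """Return True if an embedded object must be transformed on extraction.

    An explicitly recorded kind wins; otherwise fall back to what the
    container format implies. Unknown cases raise rather than guess.
    """
    kind = declared_kind or EMBEDDED_KIND_BY_CONTAINER.get(container_format.lower())
    if kind is None:
        raise ValueError(f"cannot tell filestream from bitstream for {container_format!r}")
    return kind == "bitstream"
```

If neither the flag nor the format tells us, the function fails, which
is exactly the case where recording the distinction would help.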
> 4. Modification date. This is explicitly excluded in favour of a creation
> date. I can see the reasoning (i.e. modifying a file really creates a new
> file). However, in file systems this "creation" date is called "last
> modified date" so the naming is a little confusing.
We're trying to deal with this very issue right now. I would like to
see the opportunity to put both dates in. One solution for us right now
is to extend the metadata elements to include both creation and
modification date; the other is that most MD extractors obviously pull
out (many) dates, which for us are mapped to "significant properties"
[see below for more on this term within NLNZ].
From our view, the earliest date is very useful to track how old the
file may be — useful in risk, format ID'ing, etc.
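To illustrate the "earliest date" idea, here is a minimal sketch that
picks the oldest of whatever dates an extractor returned. The property
names and sample values are made up for the example and are not our
actual element names.

```python
from datetime import datetime, timezone

def earliest_date(extracted_dates):
    """Return (label, date) for the earliest known date, or None if none."""
    known = {label: d for label, d in extracted_dates.items() if d is not None}
    if not known:
        return None
    label = min(known, key=known.get)
    return label, known[label]

# Hypothetical extractor output for one file.
dates = {
    "filesystem_last_modified": datetime(2008, 3, 14, tzinfo=timezone.utc),
    "embedded_creation_date": datetime(2001, 6, 2, tzinfo=timezone.utc),
    "embedded_modification_date": None,  # extractor found nothing
}
oldest = earliest_date(dates)
```

Keeping the label alongside the date means we still know which source
the "age" of the file came from.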
> 5. I'm not at all clear how to use "preservation level" or what is the
> point of it. Can this be further explained?
We don't use it at all — all the content is considered to be of the
same preservation value. I guess I can see a use though for
organisations that are offering a service to other organisations so that
the preservation analysts know they don't need to keep risk analysis for
those files marked with no preservation value, and do not need to
extract any technical metadata for them, etc.
> 6. Why are properties called "significant properties"? I'd just call
> these "properties": if they are "significant properties" depends on the
> context.
Totally agree that significant properties is a context-sensitive term.
PREMIS describes them in the sense most widely accepted by the
preservation community, which is that they are properties "determined
to be important to maintain through preservation actions".
We do not use them in this way. We see these properties as those
characteristics that we can use in four main ways:
1. To help us with risk analysis
2. To help us identify the formats more specifically
3. To help us pull together files that are the same for preservation
planning and execution
4. To help us evaluate and record what has happened across a
preservation action.
They are not properties to us that must be maintained across an action.
That is to say, if we have a risky colour encoding (CIELab for example)
then it will be a property that has to change across the action in order
to get rid of the risk.
So yes, if we can't change the meaning of 'significant properties',
then as we use them they are just "properties".
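For the fourth use above (recording what has happened across an action),
a minimal sketch is to compare the extracted properties before and after
and keep only what changed. The property names and the sRGB target are
assumptions for illustration only.

```python
def compare_properties(before, after):
    """Return {name: (old, new)} for every property whose value differs."""
    names = before.keys() | after.keys()
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(names)
        if before.get(name) != after.get(name)
    }

# Hypothetical properties extracted either side of a migration.
before = {"colour_encoding": "CIELab", "width": 2400, "height": 3200}
after = {"colour_encoding": "sRGB", "width": 2400, "height": 3200}
changes = compare_properties(before, after)
```

Here the colour encoding is expected to change, so the change is the
evidence that the action removed the risk rather than a sign of failure.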
> 7. Not clear why all of the creating application, environment, software
> and hardware entities are needed. This information is usually implied
> from the format (via a registry) so why store it at all (as the registry
> is likely to be updated later with better information).
Creating application is important to us. We get it from the file or
from our digitisation uploads. Depending on how we get it, it is either
stored as a significant property or on the PREMIS element. To us it's
important because it can tell us why the file isn't conforming to what
we expect of the format (the application wrote something
'idiosyncratically').
We do not use PREMIS elements to note rendering information.
> 10. Not clear what relationships between Objects are helpful. Are there
> examples?
We will be using relationships between files and reps that have been
created by a preservation action, relating them back to the files and
reps they were migrated from. This gives us full transparency across
the action, which is particularly important where we're moving from
one-to-many or from many-to-one in the action.
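A small sketch of how such relationships could be recorded, one record
per new file and source file pair, so that one-to-many and many-to-one
migrations are handled the same way. The record fields, the type and
subtype labels, the file names and the event identifiers are all
illustrative and not checked against the PREMIS data dictionary.

```python
from collections import defaultdict

def link_migration(sources, outputs, event_id):
    """Return one derivation record per (output, source) pair."""
    return [
        {"subject": out, "type": "derivation", "subtype": "has source",
         "object": src, "event": event_id}
        for out in outputs
        for src in sources
    ]

def sources_of(relationships):
    """Index the records: new file -> set of files it was migrated from."""
    index = defaultdict(set)
    for rel in relationships:
        index[rel["subject"]].add(rel["object"])
    return dict(index)

# Many-to-one: two page images become one PDF.
rels = link_migration(["page1.tif", "page2.tif"], ["book.pdf"], "event-001")
# One-to-many: one document becomes two derivatives.
rels += link_migration(["report.doc"], ["report.pdf", "report.odt"], "event-002")
```

Looking up `sources_of(rels)["book.pdf"]` then gives both page images,
which is the kind of traceability we want across the action.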
We're looking right now to see if we want to use it for IEs that have
representations that are not hierarchical in nature. For example, PREMIS
uses the example of the TEI and the images. Do we want to note that the
relationship between the two is that the TEI is a transcript of the
image?
> 12. It would be useful to add the ability to record whether a file is
> valid or well-formed against its format.
Totally agree. We record this information in specific MD elements. It
would be good if these were in PREMIS too.