First, please call me Clay! Only the government and telemarketers call me
Thomas :)

On Fri, 23 Aug 2002, Jerome McDonough wrote:

> My not-so-brief thoughts on Thomas's message:
>
> > b. The Representation Information
> > In the Reference Model, representation information
> > is what is needed to
> > make the data object comprehensible to
> > a member of the target user group
> > (or Designated Community).  It is implicit
> > in the discussion of representation
> > information in section 2.2 (and discussed
> > at length later in the Reference Model)
> > that this includes both rules for
> > structuring  the bit stream and rules for
> > (semantically) interpreting the
> > structured bit stream.  While maintaining
> > software-as-representation is
> > cited as an acceptable solution, it is
> > viewed as inferior to preserving access to
> > all information necessary to manually
> > restructure  and understand a data stream
> > (if necessary).
> >
> > I interpret this to mean that the specs of
> > every system that contributes structure
> > to a given bit stream, including its native
> > architecture, OS, and application, should be
> > contained in or referenced by an IP's representation
> > information.  We can rely on application
> > software for convenience or if the specs
> > are unavailable.
> >
>
> I'd like to restate your interpretation slightly in
> the name of preserving the sanity of those
> having to create the metadata.  You need
> to record the specs of every system that
> contributes structure to a given bit stream
> *only* to the extent necessary to make the
> data object comprehensible.  If there is
> no difference between file format A as it is
> produced on a Mac running OS X and as
> it is produced on an Intel box running Win2K,
> then you don't have to record the information.
> You might *choose* to, in order to satisfy
> the historical curiosity of digital paleographers,
> but you don't have to.  On the other hand,
> if you're trying to produce an information
> package for a video game, you might
> very well have to record detail down to and
> including the specifications of video processing
> units for which the game had been optimized.
> This rather wide range of detail that an OAIS
> might capture with regards to Representation
> Information is actually noted in the OAIS spec:
> "Since a key purpose of an OAIS is to preserve
> information for a Designated Community, the
> OAIS must understand the Knowledge Base of
> its Designated Community to understand the
> minimum Representation  Information that must
> be maintained.  The OAIS should then make a decision
> between maintaining the minimum
> Representation Information needed for its
> Designated Community, or maintaining a
> larger amount of Representation Information
> that may allow understanding by a larger
> Consumer community with a less specialized
> Knowledge Base."

This seems reasonable to me.  I agree that the Reference Model recognizes
full representation information as an often (sigh) unattainable ideal.  I
think that independent repositories of specifications (with persistent
identifiers) could go a long way towards reaching this ideal.  The real
problem is proprietary systems.


>
> > METS does not seem to contain an
> > explicit concept of representation
> > information,
>
> Well, as you noted, METS does provide,
> through the behaviors section, the ability
> to link to software necessary to render the
> object and/or its parts for the user, and so
> it does provide that one specific mechanism for
> including Representation Information.  Beyond
> that, no, METS isn't explicit, it's implicit, and
> that was a deliberate design choice.  Because
> there is and will be variation in the degree of
> detail of Representation Information that any
> OAIS chooses to record, the administrative
> metadata sections are, as you put it, 'sockets'
> that any OAIS is free to plug in the structures
> it requires to record the Representation Information
> it deems adequate.  As you also noted, METS
> does provide facilities for recording some
> very minimal pieces of Rep. Info., such as
> MIME type; these constitute
> 'least-common-denominator' Representation
> Information that all of the early participants in
> METS agreed should be recorded for most
> any object.  Anything beyond that needs to
> be decided upon by the OAIS and slotted
> with the 'socket' portions of METS.

It is important to remember in these discussions that the METS 'sockets'
carry with them implied (admittedly not necessarily
enforced) semantics.  These semantics are embedded in the names of the
tags and in the documentation provided in METS.  That said, I tend to
agree that representation information of the kind needed to give structure
to a bit stream can be accommodated by the <techMD> section.

>
> > A home for information needed
> > to interpret data in a METS document
> > is not apparent.
>
> If by 'data in a METS document' you mean the
> bit streams within an FContent or referenced
> by an FLocat section, some (minimal) information
> is on the <file> element's attributes, more can
> be included by referencing extension-defined
> information in the technical metadata section,
> and the <behavior> section can be used to
> identify software needed to present the information
> to a particular designated community.

I was unclear here.  By 'interpret' I meant 'give meaning to', a la
semantic representation information in OAIS.  I still don't see a home for
this in METS.

>
> >   b. Context Information
> > "how the Content Information relates to other information
> > outside the information package" (2-6)
> >
> > Context Information seems to be a superset of METS' rights
> > metadata.  METS does not seem to have a convenient place to
> > store information about how a document relates to other
> > documents.
> >
>
> No, I would not class Context Information as
> a superset of rights metadata.  The OAIS
> reference model places Context Information
> as a subcomponent of digital preservation
> information, stating that Context Information
> "would describe why the Content Information
> was produced, and it may include a
> description of how it relates
> to another Content Information object that
> is available."  This clearly places Context Information
> within the realm of the Digital Provenance portion
> of a METS document.

I think, on reflection, that Context information relates to
Rights metadata only when very broadly construed (i.e. the legal
environment of the document).  I'm not so sure I would
put it in digiprov either though...


>
> >   c. Provenance Information
> >      "describes the source of the Content Information, who has had
> >       custody of it since its origination, and its history (including
> >       processing history)" (2-6).
> >
> >       The concept of Provenance information accommodates both METS
> >       source metadata and digiprov metadata.  Together, source
> >       metadata and digiprov metadata are equivalent to
> >       Provenance information.
> >
>
> Agreed.
>

I don't mean to split hairs here, but you are agreeing that
(METS) Digital Provenance is part of (OAIS) Provenance, while above
you stated that (OAIS) Context Information is part of (METS) Digital
Provenance.  Both clearly can't be the case, as together they would make
(OAIS) Context a subset of (OAIS) Provenance, and OAIS treats the two as
separate components of the PDI.


> > IV. Why this is important
>
> > If METS and OAIS are not
> > congruent standards, then the extra intellectual
> > step of translation is required for work based
> > on one standard to be used
> > with the other.  Additionally, incongruence will
> > complicate and perhaps seriously hamper
> > interoperability between archives that are
> > based on "true OAIS" and archives
> > based on METS.
>
> While I don't disagree with the idea that METS
> needs to be implemented in such a way that
> it can support archives wishing to implement
> OAIS-compliant systems, I think you're making
> the mistake of assuming not only that there
> is a 'true OAIS,' but that there is *one* true OAIS.
> An OAIS is tasked with preserving information and
> making it available for a *particular* designated
> community.  My designated community is NYU's
> students, staff and faculty; Library of Congress
> obviously has a somewhat different and larger
> designated community, which in turn differs
> quite a bit from an organization like ICPSR at
> Univ. of Michigan.  The types of Representation
> Information we'll each record may vary widely
> as we are serving different communities *and*
> serving them different information.  In my
> opinion, one of the overlooked facts of the
> OAIS reference model is that it actually says
> little or nothing about interoperability at all.

My argument was and remains that METS does not map simply and
unambiguously to the OAIS data model.  I think there is one true
OAIS data model, and that METS does not map to it as simply and
unambiguously as it could.


>
> I would define METS as a first step towards
> interoperability between archives that wish
> to operate in compliance with the OAIS
> reference model.  If you look at section 2.2.3
> of the OAIS reference model, you find this
> interesting discussion: "It is necessary to distinguish
> between an Information Package that is preserved
> by an OAIS and the Information Packages that are
> submitted to, and disseminated from, an OAIS.
> These variant packages are needed to reflect
> the reality that some submissions to an OAIS
> will have insufficient Representation Information
> or PDI to meet final OAIS preservation requirements.
> In addition, these may be organized very different
> from the way the OAIS organizes the information
> it is preserving."  Further along we find this: "The
> Submission Information Package (SIP) is that
> package that is sent to an OAIS by a Producer.
> Its form and detailed content are typically negotiated
> between the Producer and the OAIS."  Implicit
> in this discussion is the recognition that 1. there
> is no standard for what a SIP will look like, and
> perhaps more importantly 2. due to the differing
> needs of organizations (in this case a Producer
> and the OAIS), someone delivering a SIP to
> an OAIS may not be able to supply information
> that the OAIS requires.  The OAIS will, therefore,
> have to engage in an "extra intellectual
> step of translation" to make the SIP useful.

As you point out, the requirements for an SIP and a DIP are less
stringent than for an AIP in OAIS.  However, METS purports to be useful in
the AIP as well.  Furthermore, while Producers will format their data
according to individual agreements with OAIS's, it is useful to
have a standard for doing so that bears some resemblance to what the
AIP will eventually look like.  METS seems to follow this logic,
providing a common framework for use in the SIP, AIP, and DIP
stages.  Wouldn't it be nice, then, if such a standard mapped simply and
unambiguously to the Reference Model?


>
> METS does not completely eliminate this problem;
> it can't.  The problem is the result of the varying
> social conditions and constraints under which
> different OAIS will operate.  I would argue that
> it does help *alleviate* the problem, in that it
> provides a common format for minimal information
> needed in a SIP that we can all agree on, and a
> structure to plug in the results of negotiations
> between Producers and OAIS as to what a SIP
> should look like in a specific instance.  This in
> turn simplifies the whole negotiation process
> for any given OAIS, and by defining a single base
> format for what SIPs/DIPs look like, allows us
> as a community to share the cost of development
> of tools to work with that format.  To the extent
> we can reach further agreement on what information
> needs to be included in a SIP, we'll further
> reduce the amount of one-off negotiation required
> every time someone submits information to an
> OAIS.
>
> To sum up, while I agree that it is important
> that METS support OAIS work, I think it does
> that quite adequately at the moment.  I think
> the real issue you're concerned with is
> interoperability of METS between OAIS archives,
> and the OAIS Reference Model is silent on how
> to achieve that, leaving it to negotiation between
> OAIS and Producers.  This is not actually a
> deficiency in the Reference Model; it has to
> leave that space for negotiation in recognition
> of the differing types of information that will be
> stored in OAIS, the differing communities that will
> be served, and the disparate sources that will be
> contributing information to OAIS.  However, it
> means that furthering the goal of interoperability
> of METS documents will be an on-going process
> of discussion and negotiation between the users
> of METS regarding what we agree is
> essential information that we should all support,
> and what needs to be left for local definition.  The
> current METS format defines what we've all
> agreed on to date (including agreements regarding
> what we can't currently agree on).  As time goes on
> and institutions using METS get more experience
> with the format and in identifying their own local
> needs, I suspect we'll identify further areas for
> agreement and can start making some of the 'fuzzier'
> sections of METS better defined.


The framework provided by OAIS is less specific
than METS, and provides plenty of room for individuation.  In my opinion,
one of the strengths of METS is that it allows for metadata
extensions.  I think, however, that the categories being extended
should correspond to those delineated by OAIS.  This would reduce
ambiguity and confusion at a number of levels: between the OAIS and METS
community, between the theory governing the construction of an OAIS and
the implementation of one of its crucial components, and within the METS
community (it would take care of those "fuzzy" sections).  Also it would
give METS a boost by more clearly linking it to an established ISO
standard.

Generally, as ambiguity and confusion decrease, interoperability
increases.  You are right in stating that my primary concern is
interoperability.



>
> So, the conversation on how to further interoperability
> in METS is indeed crucial, and the main reason why
> the METS initiative needs to continue.  But we should
> all be clear that interoperability and OAIS support
> are two different things.  At the moment, I think METS
> provides quite good support for any institution wishing
> to implement OAIS, but it does so by being rather
> remarkably loose in defining what information needs
> to be recorded.  The biggest challenge will actually
> be promoting interoperability by further defining what
> information should be included in an Information
> Package while retaining the flexibility that will be
> needed if METS is to be used in a variety
> of OAIS contexts.
>

It is my opinion that METS should be reworked in such a way that it
retains the structMap, the file section, the extension ideas,
the behavior section, and the dmd section, while replacing the
administrative metadata section with two new sections: Representation
Information and Preservation Description Information.  The result might
look like this:

metsHdr
dmdSec
pdi
  reference (assuming the id was still in the metsHdr, this section could
             contain information about the mechanism used to generate it)
  context
  provenance
  fixity
representation
  structmap  --> notice the inclusion of structmap under rep. information
                 it could also still have its own top-level element
  structural
  semantic
file section
behaviorSec
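
To make the outline above a little more concrete, here is a rough Python
sketch that builds the proposed skeleton as an element tree.  It is only an
illustration: the names pdi, representation, reference, context, provenance,
fixity, structural and semantic are hypothetical and do not exist in the
METS schema, and I am assuming "file section" corresponds to the existing
fileSec element and "structmap" to structMap.

```python
import xml.etree.ElementTree as ET

# Proposed top-level sections and their children, in outline order.
# Only metsHdr, dmdSec, structMap, fileSec and behaviorSec are real METS
# element names; everything else is a hypothetical name from the proposal.
PROPOSED_LAYOUT = {
    "metsHdr": [],
    "dmdSec": [],
    "pdi": ["reference", "context", "provenance", "fixity"],
    "representation": ["structMap", "structural", "semantic"],
    "fileSec": [],
    "behaviorSec": [],
}


def build_skeleton(layout):
    """Return an empty <mets> element tree following the given layout."""
    root = ET.Element("mets")
    for section, children in layout.items():
        node = ET.SubElement(root, section)
        for child in children:
            ET.SubElement(node, child)
    return root


def outline(root):
    """Render the tree as indented text lines, one per element."""
    lines = []
    for section in root:
        lines.append(section.tag)
        for child in section:
            lines.append("  " + child.tag)
    return lines


skeleton = build_skeleton(PROPOSED_LAYOUT)
print("\n".join(outline(skeleton)))
```

Note that the administrative metadata section (amdSec) simply does not
appear; its contents would be divided between the pdi and representation
sections.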


Clay Templeton