This message examines the congruence (or incongruence) of the OAIS data
model and METS.
First I'll briefly go over OAIS, then METS, and then I'll compare the OAIS
data model element by element with the METS standard. Finally, I'll
explain why I think this is important.
I. OAIS (this description includes parts of the Reference Model that
aren't required for conformance)
In OAIS, an information package consists of four components: packaging
information, content information, preservation description information,
and descriptive information. Once inside the archive, the
descriptive information is generally separated from the information
package, but conceptually it is part of the information package.
Each of these four components is conceived of as an information
object. An information object consists of a data object (in the digital
archive context, a bit
stream) and representation information (that is, rules for structuring the
bit stream and for lending meaning to the structured bit stream). The
rules for structuring information are called structural representation
information and the rules for lending meaning are called semantic
representation information.
The preservation description information (PDI) component of the
information package is further specified by the Reference Model for Open
Archival Information Systems. The Reference
Model breaks PDI into the categories of Reference, Context, Provenance,
and Fixity. See the Reference Model for definitions of these and other
There are three kinds of information package in OAIS, defined primarily by
where they stand in relation to the archive. The Submission Information
Package comes in to the archive from a producer, the Archival
Information Package is stored in the archive, and the Dissemination
Information Package is distributed by the archive to consumers.
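To make the shape of this data model concrete, here is a minimal sketch in Python. The class and field names are my own, not the Reference Model's, and treating each PDI category as its own information object is a simplifying assumption. The text_object helper is hypothetical and exists only to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class RepresentationInformation:
    structural: str  # rules for structuring the bit stream
    semantic: str    # rules for lending meaning to the structured bit stream


@dataclass
class InformationObject:
    data_object: bytes  # in the digital archive context, a bit stream
    representation: RepresentationInformation


@dataclass
class PreservationDescriptionInformation:
    # The four PDI categories named by the Reference Model.
    reference: InformationObject
    context: InformationObject
    provenance: InformationObject
    fixity: InformationObject


class PackageKind(Enum):
    SIP = "Submission Information Package"
    AIP = "Archival Information Package"
    DIP = "Dissemination Information Package"


@dataclass
class InformationPackage:
    kind: PackageKind
    packaging_information: InformationObject
    content_information: InformationObject
    preservation_description: PreservationDescriptionInformation
    descriptive_information: InformationObject


def text_object(text: str) -> InformationObject:
    """Hypothetical helper: wrap plain text as an information object."""
    return InformationObject(
        data_object=text.encode("utf-8"),
        representation=RepresentationInformation(
            structural="UTF-8 encoded text",
            semantic="English prose",
        ),
    )


# An illustrative SIP; every value here is made up for the example.
sip = InformationPackage(
    kind=PackageKind.SIP,
    packaging_information=text_object("how the parts are bound together"),
    content_information=text_object("the bits"),
    preservation_description=PreservationDescriptionInformation(
        reference=text_object("identifier"),
        context=text_object("relation to other information"),
        provenance=text_object("source and custody history"),
        fixity=text_object("checksum"),
    ),
    descriptive_information=text_object("discovery metadata"),
)
```

The point of the sketch is only that every component, including each piece of metadata, carries its own representation information.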
Archival information packages are specialized as Archival Information
Units and Archival Information Collections. AIUs encapsulate a
single piece of content, while AICs encapsulate a number of AIUs and
II. METS
While the METS standard can be used outside the context of an OAIS
(e.g. FEDORA), an essential part of its identity seems to be its potential
for use as an implementation of the OAIS information package; that is,
as an SIP, AIP, or DIP. Presumably, everyone on this list is familiar, to
some extent, with the METS standard. Here's a quick rundown:
A METS document is divided into six sections: a header, a descriptive
metadata section, an administrative metadata section, a file section, a
structural map, and a behavioral section. The administrative metadata
section is further divided into sections for rights metadata, technical
metadata, source metadata, and digital provenance metadata.
Each metadata section can be thought of as a "socket" in which an
independent metadata schema can be wrapped or referenced. The file
section consists of a Unix-style file hierarchy (i.e. a tree with
recursive file grouping elements and file elements as
files can also be wrapped or referenced, and can be associated with
administrative metadata sections. The structural map consists of nested
<div> tags. Each div contains either a pointer to another METS document
or to a file in the file section. Each div can be associated with
descriptive and administrative metadata sections. The <area> tag allows
for referencing sections of files from the structural map.
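For anyone who wants to see the layout rather than read about it, here is a rough sketch that builds an empty METS skeleton with Python's standard xml.etree.ElementTree. The element and attribute names are as I understand the METS schema and may differ between schema versions, so check them against the version you use. All IDs, the OBJID value, and the file URL are invented for illustration, and the result is not schema-valid because the metadata "sockets" are left empty.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_skeleton() -> ET.Element:
    """Build the six sections described above, with empty metadata sockets."""
    mets = ET.Element(q("mets"), {"OBJID": "example-object-1"})
    ET.SubElement(mets, q("metsHdr"))
    ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})

    # Administrative metadata, divided into its four subsections.
    amd = ET.SubElement(mets, q("amdSec"))
    for name, ident in (
        ("techMD", "TECH1"),
        ("rightsMD", "RIGHTS1"),
        ("sourceMD", "SRC1"),
        ("digiprovMD", "PROV1"),
    ):
        ET.SubElement(amd, q(name), {"ID": ident})

    # File section: one group holding one referenced (not wrapped) file.
    file_sec = ET.SubElement(mets, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"))
    file_el = ET.SubElement(
        group, q("file"),
        {"ID": "FILE1", "MIMETYPE": "image/tiff", "ADMID": "TECH1"},
    )
    ET.SubElement(
        file_el, q("FLocat"),
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": "file:///example/page1.tif"},
    )

    # Structural map: one div pointing at the file and at its description.
    struct_map = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), {"LABEL": "Page 1", "DMDID": "DMD1"})
    ET.SubElement(div, q("fptr"), {"FILEID": "FILE1"})

    ET.SubElement(mets, q("behaviorSec"))
    return mets


skeleton = build_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```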
III. The OAIS data model compared with METS
In order for a database to be compliant with OAIS, it must carry out the
responsibilities defined in section 3.1 of the Reference Model and
implement the data model set out in section 2.2 of the Reference
Model.
I want to focus on how (and whether) METS implements section 2.2 of
the OAIS Reference Model. The data for this analysis will come from
section 2.2 of the Reference Model on the one hand, and the METS Schema
version 1.0 on the other.
1) The information object
a. The Data Object
The Reference Model refers to the data object as "the bits" (2-4). This
basically corresponds with what in METS is either pointed to by <FLocat>
or wrapped by <FContent> within the <file> element. The <FContent>
option allows files formatted for storage in file systems other than
that in use by the archive to be stored in the archive. METS places one
requirement on these wrapped bit streams, namely that they be encoded in
b. The Representation Information
In the Reference Model, representation information is what is needed to
make the data object comprehensible to a member of the target user group
(or Designated Community). It is implicit in the discussion of
representation information in section 2.2 (and discussed at length later
in the Reference Model) that this includes both rules for structuring
the bit stream and rules for (semantically) interpreting the
structured bit stream. While maintaining software-as-representation is
cited as an acceptable solution, it is viewed as inferior to
preserving access to all information necessary to manually restructure
and understand a data stream (if necessary).
I interpret this to mean that the specs of
every system that contributes structure to a given bit stream, including
its native architecture, OS, and application, should be
contained in or referenced by an IP's representation
information. We can rely on application software for convenience
or if the specs are unavailable.
METS does not seem to contain an explicit concept of representation
information, although various parts of the standard address it. The
MIMETYPE attribute of the <file> element and the technical metadata
section address structural representation information, and the
structural map (despite its name) gives some basic semantic
information. The MIMETYPE attribute is most
likely to be used to cue application software. The technical metadata
section provides "technical metadata regarding a file or files". This
does not read like
a requirement to provide the information necessary to completely
restructure the bit stream, and I'm not sure that the extension
metadata sets provided by the Library of Congress
give the level of detail necessary to do so.
The structural map contains some rudimentary semantic representation
information, namely how physical sections of the file should be
interpreted in terms of logical sections. A home for information needed
to interpret data in a METS document is not apparent.
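To show how little a METS document offers here, the following sketch collects what it does provide for each file: the MIMETYPE attribute and any technical metadata sections linked through ADMID. The sample document and the representation_hints function are my own invention for illustration, and I am assuming ADMID holds a space-separated list of IDs.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# A made-up fragment: FILE1 has a MIME type and a techMD link, FILE2 has neither.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec>
    <mets:techMD ID="TECH1"/>
    <mets:rightsMD ID="RIGHTS1"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FILE1" MIMETYPE="image/tiff" ADMID="TECH1 RIGHTS1"/>
      <mets:file ID="FILE2"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def representation_hints(root: ET.Element) -> dict:
    """Map each file ID to its MIME type and linked techMD section IDs."""
    tech_ids = {t.get("ID") for t in root.iterfind(".//mets:techMD", NS)}
    hints = {}
    for file_el in root.iterfind(".//mets:file", NS):
        # Keep only the ADMID references that point at technical metadata.
        linked = [i for i in file_el.get("ADMID", "").split() if i in tech_ids]
        hints[file_el.get("ID")] = {
            "mimetype": file_el.get("MIMETYPE"),
            "techMD": linked,
        }
    return hints


hints = representation_hints(ET.fromstring(SAMPLE))
for file_id, info in hints.items():
    print(file_id, info)
```

Nothing in the result says how to restructure the bit stream by hand; at best it names a format and points at whatever the technical metadata section happens to contain.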
2) Preservation Description Information
a. Reference information
Unique identifier: the OBJID attribute of <mets> satisfies this.
b. Context Information
"how the Content Information relates to other information outside
the information package" (2-6)
Context Information seems to be a superset of METS' rights metadata.
METS does not seem to have a convenient place to store information
about how a document relates to other documents.
c. Provenance Information
"describes the source of the Content Information, who has had
custody of it since its origination, and its history (including
The concept of Provenance information accommodates both METS source
metadata and digiprov metadata. Together, source metadata and
digiprov metadata are equivalent to Provenance information.
d. Fixity Information
Fixity information is suitable for the technical metadata section.
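As an illustration of the kind of value that might be recorded as fixity information, here is a small sketch that computes and later re-checks a digest of a bit stream. The choice of SHA-256 is my assumption, since neither standard is quoted here as naming an algorithm, and the function names are hypothetical.

```python
import hashlib


def fixity_digest(bit_stream: bytes) -> str:
    """Return a hex digest of the bit stream (SHA-256 is an assumed choice)."""
    return hashlib.sha256(bit_stream).hexdigest()


def verify_fixity(bit_stream: bytes, recorded_digest: str) -> bool:
    """Check that the bit stream still matches the digest recorded earlier."""
    return fixity_digest(bit_stream) == recorded_digest


# Record a digest at ingest, then check an intact and an altered copy.
recorded = fixity_digest(b"the bits")
print(verify_fixity(b"the bits", recorded))
print(verify_fixity(b"the bytes", recorded))
```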
3) Descriptive information
Descriptive information is used for discovery of information packages
in OAIS. Descriptive metadata in METS is suited to this purpose.
4) Packaging Information
The packaging information in OAIS "binds, identifies, and relates the
content information and PDI". The METS schema itself accomplishes
this, of course.
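The comparison above can be summarized as a simple crosswalk. In this sketch the METS locations are taken from the discussion above, while the "direct" and "partial" labels are my own shorthand for how well each one fits, not terms from either standard.

```python
# OAIS element -> (where it seems to live in METS, how well it fits)
CROSSWALK = {
    "data object": ("<FLocat> or <FContent> within <file>", "direct"),
    "representation information": (
        "MIMETYPE attribute, technical metadata, structural map",
        "partial",
    ),
    "reference information": ("OBJID attribute of <mets>", "direct"),
    "context information": ("rights metadata covers only a subset", "partial"),
    "provenance information": ("source metadata and digiprov metadata", "direct"),
    "fixity information": ("technical metadata", "direct"),
    "descriptive information": ("descriptive metadata section", "direct"),
    "packaging information": ("the METS schema itself", "direct"),
}


def gaps(crosswalk: dict) -> list:
    """List the OAIS elements without a direct home in METS."""
    return sorted(name for name, (_, fit) in crosswalk.items() if fit != "direct")


print(gaps(CROSSWALK))
```

On this reading, the two places where translation work would be needed are representation information and context information.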
IV. Why this is important
It is important to think about the relationship of METS to OAIS
because comparable work is being carried out using one or the
other as its foundation. For example, the AV admin. metadata extensions
are based on METS, while the OCLC/RLG "Metadata Framework" report that
came out in June was based on OAIS. If METS and OAIS are not
congruent standards, then the extra intellectual
step of translation is required for work based on one standard to be used
with the other. Additionally, incongruence will complicate and perhaps
seriously hamper interoperability between archives that are based on "true
OAIS" and archives based on METS.