Appendix 8: Proposed profile for exchanging metadata

A8.1 Introduction

This Appendix proposes a METS profile for exchanging metadata about digital objects. The proposed profile is in the form of a table of rules and recommendations. It will be revised after testing with ANU and UQ and further consultation. It can then be expressed in XML using the formal METS profile schema and submitted to METS for registration. The maintainer of the profile will be the National Library of Australia.

This profile addresses the use of a METS document to transfer custody of a digital object or set of digital objects from one repository to another. This scenario was chosen because it requires the most complete set of preservation metadata. In the OAIS model, the transferring repository produces the METS document as a DIP, which becomes a SIP in the receiving repository. In some repositories such a document may also be stored and constitute an AIP.

SIPs for material being submitted to a repository for the first time, and DIPs produced for purposes other than transferring custody of an object, may be based on this profile and contain a subset of the metadata specified here.

It is a generic profile meant for use among Australian repositories, particularly members of the Australian Partnership for Sustainable Repositories. It is not specific to a particular system or implementation. Repositories will need to map their implementation-specific requirements or profile to this common one.

A8.2 General notes

A conforming METS document represents a discrete item of interest for access and preservation purposes. An item has a discrete set of metadata to describe its content.

An item will be completely described in a conforming METS document. Therefore, in the case of an item with parts, the parts will be completely described within the document as well.

An item (whether it contains parts or not) may be part of another item. A conforming METS document may therefore represent an "item" or a "part", as long as the "part" is a discrete item of interest with its own descriptive metadata.

A conforming METS document would not describe a "collection" or large sets of items. Nor would conforming METS documents be expected to contain information about an item's relationship to a collection or to other items in the collection, other than through the descriptive metadata. However a non-conforming METS document for the collection could contain pointers to conforming METS documents for items in the collection.

A conforming METS document must contain the files or pointers to the files comprising the archival copy of an item, as well as all supporting files and metadata necessary for its long term preservation and access. A conforming METS document may contain files or pointers to files comprising other representations of the item (e.g. thumbnail copy, display copy) as well, along with sufficient metadata to render or execute the files properly.

All available metadata should be included for the archival copy of an object; there should be no loss of granularity. If a namespace or schema doesn't cover all the metadata, the extra metadata should be included under a local namespace. If a receiving repository cannot process some of the metadata (whether the metadata is specified in this profile or not), the receiving repository should store the metadata in its raw XML form or store the whole METS document, rather than discard any metadata. The repository may be able to use the metadata eventually (e.g. if the system is enhanced); if not, a person could still read it when needed to solve a problem or answer a query about an item.

The order of precedence followed in this profile for placing metadata is METS, PREMIS, other schemas specified in this profile, any other schemas. That is, metadata should be placed in a METS element or attribute if possible; if there is no appropriate place in METS, it should be placed in a PREMIS element if possible; otherwise use an element from a recommended schema; otherwise use another established schema if possible.

A8.3 Schemas

METS extension schemas:

MIX
http://www.loc.gov/standards/mix/mix.xsd
MODS
http://www.loc.gov/standards/mods/v3/mods-3-2.xsd
PREMIS
http://www.loc.gov/standards/premis/v1/PREMIS-v1-1.xsd
textMD
http://dlib.nyu.edu/METS/textmd.xsd

Other schemas:

AMD: LC-AV Audio Metadata Extension Schema
http://lcweb2.loc.gov/mets/Schemas/AMD.xsd
VIDEOMD: LC-AV Video Metadata Extension Schema
http://lcweb2.loc.gov/mets/Schemas/VMD.xsd

A8.4 Tables of METS elements and attributes

This table is based on that used by MacKenzie Smith in the draft DSpace METS profile.

R = Repeatable, NR = Not Repeatable, M = Mandatory, O = Optional

A8.4.1 <mets> element group

Element / Attribute Profile occurrence / obligation Profile rules and recommendations
<mets> NR M Must contain PROFILE attribute and a <metsHdr> element.
PROFILE NR M Optional in METS. The value for this attribute will be:
National Library of Australia METS SIP Profile 1.0
OBJID NR M Optional in METS. Must have a primary identifier assigned to the METS document. It should be unique within the repository but does not have to be globally unique.
ID, LABEL, TYPE NR O No recommendations.
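
As an illustration only, the rules in this table could be checked mechanically along the following lines. This is a minimal sketch using Python's standard library; the function name check_mets_root and the sample OBJID and dates are hypothetical and are not part of the profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PROFILE_VALUE = "National Library of Australia METS SIP Profile 1.0"


def check_mets_root(xml_text):
    """Return a list of problems found on the <mets> root (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != f"{{{METS_NS}}}mets":
        problems.append("root element is not <mets> in the METS namespace")
    if root.get("PROFILE") != PROFILE_VALUE:
        problems.append("PROFILE attribute is missing or has the wrong value")
    if not root.get("OBJID"):
        problems.append("OBJID attribute is missing or empty")
    # <mets> must contain a <metsHdr>, which is not repeatable.
    if len(root.findall(f"{{{METS_NS}}}metsHdr")) != 1:
        problems.append("exactly one <metsHdr> is required")
    return problems


# Hypothetical sample document; identifier and dates are invented.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      PROFILE="National Library of Australia METS SIP Profile 1.0"
      OBJID="example-0001">
  <metsHdr CREATEDATE="2006-01-01T00:00:00" LASTMODDATE="2006-01-02T00:00:00"/>
</mets>"""

print(check_mets_root(SAMPLE))
```

The check does not test that OBJID is unique within the repository, since that can only be verified against the repository itself.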

A8.4.2 <metsHdr> element group

Element / Attribute Profile occurrence / obligation Profile rules and recommendations
<metsHdr> NR M Must contain CREATEDATE and LASTMODDATE attributes.
CREATEDATE NR M Optional in METS but mandatory in this profile.
LASTMODDATE NR M Optional in METS but mandatory in this profile.
ID, RECORDSTATUS NR O No recommendations.
-<agent> R M There must be one instance for the organisation that produced the METS document and one instance for the software and version that produced the METS document. Other agents are optional.
ROLE NR M Required in METS. Use "CUSTODIAN" for the organisation and "EDITOR" for the software and version. "CREATOR" may be used for the person responsible, if any.
--<name> NR M Must contain the name of the organisation or software and version as appropriate.
--<note> R O No recommendations.
-<altRecordID> R O No recommendations.
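
The header rules above might be tested as in the following sketch. It assumes that the organisation and the software are identified only by their ROLE values ("CUSTODIAN" and "EDITOR"); the function name and the agent names in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}


def check_mets_hdr(xml_text):
    """Return a list of problems found in <metsHdr> (empty if none)."""
    root = ET.fromstring(xml_text)
    hdr = root.find("m:metsHdr", NS)
    if hdr is None:
        return ["<metsHdr> is missing"]
    problems = []
    for attr in ("CREATEDATE", "LASTMODDATE"):
        if not hdr.get(attr):
            problems.append(f"{attr} is missing")
    # Collect the roles of agents that carry a non-empty <name>.
    named_roles = set()
    for agent in hdr.findall("m:agent", NS):
        name = agent.find("m:name", NS)
        if name is not None and (name.text or "").strip():
            named_roles.add(agent.get("ROLE"))
    for role in ("CUSTODIAN", "EDITOR"):
        if role not in named_roles:
            problems.append(f"no named agent with ROLE={role}")
    return problems


# Hypothetical sample; organisation and software names are invented.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <metsHdr CREATEDATE="2006-01-01T00:00:00" LASTMODDATE="2006-01-02T00:00:00">
    <agent ROLE="CUSTODIAN"><name>Example Repository</name></agent>
    <agent ROLE="EDITOR"><name>ExampleExporter 1.0</name></agent>
  </metsHdr>
</mets>"""

print(check_mets_hdr(SAMPLE))
```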

A8.4.3 <dmdSec> element group

<dmdSec> R M The dmdSec is reserved for bibliographic description and subject analysis of the item and its constituent files, at a ratio of one dmdSec for each unique metadata record.
    Multiple metadata records describing the same item or part using different schemas should be captured in separate dmdSecs and linked via the GROUPID attribute.
    At least one dmdSec with the metadata record for the entire item must be present. The metadata in this dmdSec must conform to the MODS XML schema (one of the METS endorsed extension schemas): http://www.loc.gov/standards/mods/v3/mods-3-2.xsd
    It is strongly recommended to include additional more granular metadata records (using other schemas or namespaces) if available.
    Each dmdSec must contain an <mdWrap>.
ID NR M Required by METS
GROUPID NR O Use to identify multiple metadata records (using different schemas) describing the same item or part.
ADMID, CREATED, STATUS NR O No recommendations.
-<mdWrap> NR M See mdWrap element group section.
-<mdRef> - Not supported in this profile.
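
A possible check of the dmdSec rules is sketched below. It only confirms that some dmdSec carries a MODS record; whether that record describes the entire item would have to be established through the DMDID of the first level <div>, which this sketch does not do. The function name and sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}


def check_dmd_secs(xml_text):
    """Return a list of problems found in the <dmdSec> elements (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    has_mods = False
    for sec in root.findall("m:dmdSec", NS):
        sec_id = sec.get("ID")
        if not sec_id:
            problems.append("a dmdSec has no ID")
        if sec.find("m:mdRef", NS) is not None:
            problems.append(f"dmdSec {sec_id}: <mdRef> is not supported")
        wraps = sec.findall("m:mdWrap", NS)
        if len(wraps) != 1:
            problems.append(f"dmdSec {sec_id}: exactly one <mdWrap> is expected")
        elif wraps[0].get("MDTYPE") == "MODS":
            has_mods = True
    if not has_mods:
        problems.append("no dmdSec carries a MODS record")
    return problems


# Hypothetical sample; the title is invented.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:mods="http://www.loc.gov/mods/v3">
  <dmdSec ID="dmd1">
    <mdWrap MDTYPE="MODS">
      <xmlData>
        <mods:mods><mods:titleInfo><mods:title>Example item</mods:title></mods:titleInfo></mods:mods>
      </xmlData>
    </mdWrap>
  </dmdSec>
</mets>"""

print(check_dmd_secs(SAMPLE))
```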

A8.4.4 <mdWrap> element group within dmdSec or amdSec elements

<mdWrap> NR M  
MDTYPE NR M METS requires the presence of this attribute and restricts the values to: MARC, MODS, EAD, DC, NISOIMG, LC-AV, VRA, TEIHDR, DDI, FGDC, LOM, PREMIS, OTHER.
Support for MODS in dmdSec and PREMIS in amdSec is required in this profile, though it is recommended to support others in the above list if applicable to the types of material in the repository.
OTHERMDTYPE NR O Use if and only if MDTYPE value is "OTHER".
<xmlData> NR M A schema or a namespace is required by this profile. An established schema or namespace is preferred; if not available, a local namespace can be used.
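
The mdWrap rules lend themselves to a simple test, sketched here under the assumption that "if and only if" for OTHERMDTYPE is read strictly in both directions. The function name and the local namespace urn:example:local are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
MDTYPES = {"MARC", "MODS", "EAD", "DC", "NISOIMG", "LC-AV", "VRA",
           "TEIHDR", "DDI", "FGDC", "LOM", "PREMIS", "OTHER"}


def check_md_wrap(wrap):
    """Return a list of problems found on one <mdWrap> element (empty if none)."""
    problems = []
    mdtype = wrap.get("MDTYPE")
    if mdtype not in MDTYPES:
        problems.append(f"MDTYPE {mdtype!r} is not an allowed value")
    # OTHERMDTYPE is used if and only if MDTYPE is "OTHER".
    if (mdtype == "OTHER") != (wrap.get("OTHERMDTYPE") is not None):
        problems.append("OTHERMDTYPE must be present exactly when MDTYPE is OTHER")
    xml_data = wrap.find(METS + "xmlData")
    if xml_data is None or len(xml_data) == 0:
        problems.append("<xmlData> is missing or empty")
    return problems


# Hypothetical sample using an invented local namespace.
GOOD = ET.fromstring(
    '<mdWrap xmlns="http://www.loc.gov/METS/" MDTYPE="OTHER" OTHERMDTYPE="local">'
    '<xmlData><x:note xmlns:x="urn:example:local">text</x:note></xmlData>'
    '</mdWrap>'
)

print(check_md_wrap(GOOD))
```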

A8.4.5 <amdSec> element group

<amdSec> R M

There must be at least one amdSec.

Ideally there should be one amdSec for each content file contained or referenced in the <fileSec> element of the METS document but this may not be practical for some situations.

There must be only one <amdSec> per file. (An amdSec may contain repeated <techMD>, <sourceMD>, <digiprovMD> and <rightsMD>).

    Each amdSec must contain an ID attribute and at least one <techMD> element.
ID NR M The ID attribute is optional in METS but is mandatory in this profile.
-<techMD> R M Each technical metadata record using a schema or namespace (e.g. PREMIS, MIX) should be organised in its own techMD.
   

There must be at least one techMD containing a metadata record in <mdWrap><xmlData> conforming to the PREMIS Object schema.

The following elements are mandatory in the PREMIS Data Dictionary for objectCategory "file" and are therefore mandatory in this profile for amdSec pertaining to files. (They are not necessarily mandatory in the PREMIS xml schema since they may not apply to all types of objectCategory.)

  • objectIdentifierType
  • objectIdentifierValue
  • preservationLevel
  • objectCategory
  • compositionLevel
  • storageMedium

The following elements from PREMIS Object entity are also mandatory in this profile:

  • formatName
  • formatVersion
  • originalName

Values of 'not applicable' and 'unknown' are permitted in mandatory elements where data cannot be supplied.

It is strongly recommended to include any optional elements in the PREMIS Object schema for which data is available.

Note: The following may be included in PREMIS metadata but are not mandatory in this profile because SIZE, CHECKSUM and CHECKSUMTYPE are mandatory attributes for the METS <file> element in this profile.

  • messageDigestAlgorithm
  • messageDigest
  • size
    preservationLevel: Use one of the following values:
"supported" - fully supported
"known" - not supported yet but high priority to try and fully support.
"unsupported" - known or unknown format, preserve bitstream as is but low priority for support
"not_applicable" - not a preservation copy of the item
    For still image files, additional metadata not covered by PREMIS should be encoded using the MIX schema (one of the METS endorsed extension schemas): http://www.loc.gov/standards/mix/mix.xsd
    Additional metadata for text files not covered by PREMIS should be encoded using the schema at http://dlib.nyu.edu/METS/textmd.xsd (one of the METS endorsed extension schemas) with extensions recommended by the National Library of Australia at http://www.nla.gov.au/?
    Schemas for other types of files have not been endorsed by METS yet. Until then, additional metadata for audio and video files should be encoded using schemas proposed for use in the Library of Congress Audio-Visual Prototyping Project. Audio schema is at http://lcweb2.loc.gov/mets/Schemas/AMD.xsd and the video schema is at http://lcweb2.loc.gov/mets/Schemas/VMD.xsd. Additional elements recommended are appended to this document and use the namespace at: http://www.nla.gov.au/? (This namespace is not accompanied by a DTD or schema.)
-ID NR M Required by METS
--<mdWrap> NR M See mdWrap element group section.
-<rightsMD> R O <rightsMD> is optional but where present, it must contain one <mdWrap><xmlData> element, which may contain one of :
-ID NR M Required by METS
--<mdWrap> R M See mdWrap element group section.
-<sourceMD> R O

May be used but is not required if the dmdSec describes the original source material used to create the METS object e.g. if the METS object is a digital surrogate for a physical item. May be used to describe source materials between the original and current object where the source materials are not digital objects. This profile makes no recommendations about the form this metadata should take.

Must be used if <digiprovMD> includes PREMIS event metadata which has a linkingObjectIdentifier to an object which is not being transferred as part of this METS document. In this case <sourceMD> must contain a <premisObject:object>. For example, if a PDF was created from a Word document and the PDF is being transferred but the Word document is not (the Word document may have already been discarded by the transferring repository), the Word document would be described in <sourceMD> as a PREMIS Object.

-ID NR M Required by METS
--<mdWrap> NR M See mdWrap element group section.
-<digiprovMD> R M

There must be at least one <digiprovMD> for the current archival or master copy, describing the ingest event into the transferring repository. <digiprovMD> is optional for objects which are not the master copy.

There should be only one <digiprovMD> for each object for which events are recorded.

Each <digiprovMD> should only have one <mdWrap MDTYPE="PREMIS"> which has only one <xmlData> element containing all PREMIS Event and Agent metadata for the object.

Additional <mdWrap><xmlData> elements describing the same events in non-PREMIS schemas may be included but receiving repositories may not be able to process them.

Each event must be contained in a separate <premisEvent:event> element with xml data conforming to the PREMIS Event schema. (http://www.loc.gov/standards/premis/Event-v1-0.xsd)

Each agent (where agent is recorded) must be contained in a separate <premisAgent:agent> element in the same <xmlData> element as the <premisEvent:event> with which it is associated.

    As complete a provenance history as possible should be provided for the 'master' or archival object, describing events (in separate <premisEvent:event> elements) which led to the creation of the current object and its ingest in the transferring repository. This includes changes to the object originally deposited (note that in PREMIS, an object cannot be modified: an event which modifies an object creates a new object) and changes of custody.
    Other types of events occurring after ingest of the current object into the transferring repository may be recorded in additional <premisEvent:event> elements (e.g. format validation, checksum checking)
   

The following elements are mandatory within a PREMIS Event in this profile:

  • eventIdentifierType
  • eventIdentifierValue
  • eventType
  • eventDateTime

If the event is one which changes an Object, it is strongly recommended to include information about the hardware / software used. Use the following Event elements under linkingAgentIdentifier:

  • linkingAgentIdentifierType
  • linkingAgentIdentifierValue
  • linkingAgentRole

The value in linkingAgentIdentifierType can be the name of an external registry or the repository's own name.

The value in linkingAgentIdentifierValue should be a unique identifier within the registry or transferring repository if the agent has one; otherwise it should simply be an identifier that is unique to this agent within the METS document.

The value in linkingAgentRole should describe the agent's role e.g. "scanner". A controlled vocabulary has not been developed for this element yet.

If an organisation other than the transferring repository was responsible for an Event, that organisation should also be noted in linkingAgentIdentifier.

   

If there is a linkingAgentIdentifier, a <premisAgent:agent> element must be present within the same <xmlData> which contains the <premisEvent:event> element with which it is associated through the event's linkingAgentIdentifier. The following elements are mandatory in this profile:

  • agentIdentifierType
  • agentIdentifierValue
  • agentName (optional in PREMIS)
  • agentType (optional in PREMIS. A controlled vocabulary has not been developed for this element yet.)
   

If the object is related to another digital object through an Event and the related object is being transferred as well, the Event should contain a premisEvent:linkingObjectIdentifier which matches the related object's premisObject:objectIdentifier in the related object's <amdSec><techMD> element.

If the object is related to another digital object through an Event and the related object is not being transferred, the Event should contain a premisEvent:linkingObjectIdentifier which matches a premisObject:objectIdentifier in a premisObject metadata record in <sourceMD>. For example, a Word document may have been transformed into an RTF then to PDF. If only the PDF is being transferred, each event should be described in a separate <premisEvent:event> with linkingObjectIdentifier matching the objectIdentifier in a <premisObject:object> under <sourceMD>. The Word and RTF files would each be described (even if they no longer exist) in a <premisObject:object> element under separate <sourceMD> elements.

-ID NR M Required by METS
--<mdWrap> NR M See mdWrap element group section
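
The event and agent rules for <digiprovMD> could be checked as sketched below. The sketch matches PREMIS elements by local name only, so it makes no claim about the PREMIS namespace URIs; the namespaces urn:example:premis-event and urn:example:premis-agent in the sample are placeholders, and the function names and identifiers are hypothetical. The sample omits the mandatory <techMD> because only provenance is being tested.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
EVENT_FIELDS = ("eventIdentifierType", "eventIdentifierValue",
                "eventType", "eventDateTime")
AGENT_FIELDS = ("agentIdentifierType", "agentIdentifierValue",
                "agentName", "agentType")


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def texts(elem, name):
    """Text of every descendant (or self) whose local name is `name`."""
    return [(e.text or "").strip() for e in elem.iter() if local(e.tag) == name]


def check_provenance(xml_text):
    """Return a list of problems in PREMIS events and agents (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    for xml_data in root.iter(METS + "xmlData"):
        agent_ids = set()
        for child in xml_data:
            if local(child.tag) == "agent":
                agent_ids.update(texts(child, "agentIdentifierValue"))
                for field in AGENT_FIELDS:
                    if not any(texts(child, field)):
                        problems.append(f"agent lacks {field}")
        for child in xml_data:
            if local(child.tag) != "event":
                continue
            for field in EVENT_FIELDS:
                if not any(texts(child, field)):
                    problems.append(f"event lacks {field}")
            # Each linked agent must be described in the same <xmlData>.
            for linked in texts(child, "linkingAgentIdentifierValue"):
                if linked not in agent_ids:
                    problems.append(f"no agent matches {linked!r}")
    return problems


# Hypothetical sample; namespaces are placeholders and identifiers are invented.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:pe="urn:example:premis-event" xmlns:pa="urn:example:premis-agent">
  <amdSec ID="amd1"><digiprovMD ID="dp1"><mdWrap MDTYPE="PREMIS"><xmlData>
    <pe:event>
      <pe:eventIdentifier>
        <pe:eventIdentifierType>Example Repository</pe:eventIdentifierType>
        <pe:eventIdentifierValue>event-0001</pe:eventIdentifierValue>
      </pe:eventIdentifier>
      <pe:eventType>ingest</pe:eventType>
      <pe:eventDateTime>2006-01-01T00:00:00</pe:eventDateTime>
      <pe:linkingAgentIdentifier>
        <pe:linkingAgentIdentifierType>Example Repository</pe:linkingAgentIdentifierType>
        <pe:linkingAgentIdentifierValue>agent-01</pe:linkingAgentIdentifierValue>
        <pe:linkingAgentRole>ingest software</pe:linkingAgentRole>
      </pe:linkingAgentIdentifier>
    </pe:event>
    <pa:agent>
      <pa:agentIdentifier>
        <pa:agentIdentifierType>Example Repository</pa:agentIdentifierType>
        <pa:agentIdentifierValue>agent-01</pa:agentIdentifierValue>
      </pa:agentIdentifier>
      <pa:agentName>ExampleIngest 1.0</pa:agentName>
      <pa:agentType>software</pa:agentType>
    </pa:agent>
  </xmlData></mdWrap></digiprovMD></amdSec>
</mets>"""

print(check_provenance(SAMPLE))
```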

A8.4.6 <fileSec> element group

<fileSec> NR M All files must be referenced via the fileSec. <fileSec> must contain one or more <fileGrp>.
<fileGrp> R M Use this element to bundle files according to the following categories described in the USE attribute:

original: The object originally submitted to a repository by the depositor, if it is being transferred and is different from the master.

master: The current archival copy (i.e. the one that has the highest priority for long term preservation). It may be the original or a modified version of the original - this should be able to be determined from digiprovMD. There should be one and only one master.

access_representation: The copy preferred for public access, if different from the master.

Other Representation: Any other group of content files which can be used to render an object, which are not the original or the master.

structural_map: Strongly recommended that this be a single XML file e.g. a SMIL document for multimedia objects, EAD document for manuscripts, or a description of the file directory structure of a complex object. Filenames referred to should correspond to filenames in <file> OWNERID attribute. This filegroup is not necessary if it provides no more information than the <structMap> element.

metadata: Extra metadata files can be included. Not necessary if the metadata is completely covered by other sections of the METS document, e.g. dmdSec, structMap. Files in this group should be XML files.

licence: Files that contain licences or rights agreements pertaining to the object. This filegroup is not necessary if it provides no more information than <rightsMD> element.

support: Any other supporting files (needs an example).

other: For files which don't fit into the other categories.

    In this profile:
<fileGrp> must contain one or more <file>
<fileGrp> may not contain any nested <fileGrp> elements.
ID, VERSDATE, ADMID NR O No recommendations
USE NR M Each <fileGrp> must have a USE attribute with a value from the above vocabulary.
    There must be one and only one <fileGrp> with USE="master"
    There can be 0 or 1 <fileGrp> with USE="original"
    There may be any number, or none in the other categories.
--<file>   See <file> element group section.
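
The <fileGrp> rules can be expressed as a short check, sketched here. The USE vocabulary is copied as written in the list above, including the spelling "Other Representation"; the function name and sample file IDs are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"m": "http://www.loc.gov/METS/"}
# Vocabulary copied from the profile text as written.
USE_VALUES = {"original", "master", "access_representation",
              "Other Representation", "structural_map", "metadata",
              "licence", "support", "other"}


def check_file_groups(xml_text):
    """Return a list of problems found in <fileSec> (empty if none)."""
    root = ET.fromstring(xml_text)
    file_sec = root.find("m:fileSec", NS)
    if file_sec is None:
        return ["<fileSec> is missing"]
    problems = []
    uses = Counter()
    for grp in file_sec.findall("m:fileGrp", NS):
        use = grp.get("USE")
        uses[use] += 1
        if use not in USE_VALUES:
            problems.append(f"USE {use!r} is not in the vocabulary")
        if grp.find("m:file", NS) is None:
            problems.append(f"fileGrp USE={use!r} contains no <file>")
        if grp.find("m:fileGrp", NS) is not None:
            problems.append(f"fileGrp USE={use!r} contains a nested <fileGrp>")
    if uses["master"] != 1:
        problems.append("exactly one fileGrp with USE='master' is required")
    if uses["original"] > 1:
        problems.append("at most one fileGrp with USE='original' is allowed")
    return problems


# Hypothetical sample; file IDs are invented and other attributes are omitted.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp USE="master"><file ID="f1"/></fileGrp>
    <fileGrp USE="access_representation"><file ID="f2"/></fileGrp>
  </fileSec>
</mets>"""

print(check_file_groups(SAMPLE))
```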

A8.4.7 <file> element group

<file> R M <file> must contain either a single <FLocat> or an <FContent>.
This profile doesn't provide for <stream>, <transformFile> or nested <file> at present.
ID NR M Required in METS
MIMETYPE NR M Optional in METS but required in this profile.
SEQ NR O No recommendations.
SIZE NR M Optional in METS but required in this profile.
CREATED NR O Strongly recommended. Should be the date the creating application created the file, not the date it was ingested in the transferring repository.
CHECKSUM NR M Optional in METS but required in this profile.
CHECKSUMTYPE NR M Optional in METS but required in this profile. METS specifies the following values:
HAVAL
MD5
SHA-1
SHA-256
SHA-512
TIGER
WHIRLPOOL
OWNERID NR O May be used to provide a unique identifier (including a URI) assigned to the file which may differ from the URI used to retrieve the file.
Strongly recommended for filenames referred to in files in <fileGrp> with USE="structural_map", or which may be used by a 'root' file to reconstruct or render an object.
ADMID NR M Must be used to provide a link to the file's administrative metadata.
DMDID, GROUPID, USE NR O No recommendations.
-<FLocat> NR O Repeatable in METS but not in this profile. An <FLocat> must be provided for each <file> if the content of the file is not embedded in <FContent>.
<FLocat> may be used if and only if the URL can be guaranteed under normal conditions (i.e. excluding network/connectivity issues) to be accessible by an ingesting party.
Any URL must point either directly to a METS package or to a service which exposes the METS package to the requesting party (the receiving repository).
ID, USE NR O No recommendations
LOCTYPE NR M Required in METS and values restricted to:
URN
URL
PURL
HANDLE
DOI
OTHER

"OTHER" must not be used in this profile.

OTHERLOCTYPE - Not supported by this profile.
xlink NR M No recommendations
-<FContent> NR O Must be present if there is no <FLocat>. As specified in METS, the content file must be either Base 64 encoded and contained within the subsidiary binData wrapper element, or consist of XML information and be contained within the subsidiary xmlData wrapper element.
ID, USE NR O No recommendations
binData NR O No recommendations
xmlData NR O No recommendations
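
To show how the mandatory <file> attributes fit together, the following sketch builds a <file> element with a single <FLocat> from a file's bytes. It assumes SHA-256 as the checksum type, a hexadecimal checksum string, and SIZE expressed in bytes; the function name, IDs and URL are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"


def make_file_element(file_id, adm_id, mimetype, data, url):
    """Build a <file> element with the attributes this profile makes mandatory."""
    file_el = ET.Element(f"{{{METS}}}file", {
        "ID": file_id,
        "ADMID": adm_id,              # link to the file's amdSec
        "MIMETYPE": mimetype,
        "SIZE": str(len(data)),       # size in bytes
        "CHECKSUM": hashlib.sha256(data).hexdigest(),
        "CHECKSUMTYPE": "SHA-256",    # one of the values METS specifies
    })
    # One <FLocat> per <file>; LOCTYPE "OTHER" is not used in this profile.
    ET.SubElement(file_el, f"{{{METS}}}FLocat", {
        "LOCTYPE": "URL",
        f"{{{XLINK}}}href": url,
    })
    return file_el


# Hypothetical usage; the content, IDs and URL are invented.
example = make_file_element("file1", "amd1", "text/plain", b"hello",
                            "http://repository.example.org/files/file1")
print(ET.tostring(example, encoding="unicode"))
```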

A8.4.8 <structMap> element group (see example METS document)

<structMap> NR M Repeatable in METS but not in this profile. The structMap element must describe the structure of the whole object represented by the METS document.
ID NR O No recommendations
TYPE NR O No recommendations
LABEL NR O No recommendations
-<div> R M For an object whose structure is hierarchical the structure should be encoded as a tree of nested <div> elements.
For an object consisting of a single file there will be a single <div> element.
For a complex (non-hierarchical) object consisting of multiple files, there will be a single <div> with the first child <fptr> indicating the 'root' file or the file which 'knows' how to 'get' the other files in order to render the object. If there is no such file, the first <fptr> may point to a file in the <fileGrp> with USE="structural_map".

The first level <div> elements must represent the whole object. Lower level <div> elements represent parts of the object (see the example METS document).

A8.4.9 <div> element group

ID NR O No recommendations
ORDER NR M The first level <div> elements represent the whole item and must have an order of "1".
Lower level <div> elements must have an order attribute, at least one of which must have an order of "1". There may be more than one <div> at the same level with the same order number.
ORDERLABEL NR O No recommendations
LABEL NR O Strongly recommended
DMDID NR O Must be present for the first level <div> representing the whole item.
Should be present for the lower level <div>s if there is a corresponding dmdSec for that part of the item.
ADMID NR O Should be present if there is a corresponding amdSec for the whole or parts of the item that is not file-specific, e.g. rights metadata.
TYPE, CONTENTIDS NR O No recommendations
-<mptr> - Not supported in this profile
-<fptr> R M There must be at least one <fptr> in each <div>. Each <fptr> must contain a FILEID to point to a file in the METS document.
The child elements <par>, <seq> and <area> are not supported in this profile.
ID NR O No recommendations
FILEID NR M Optional in METS but required in this profile.
CONTENTIDS NR O No recommendations
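
The <structMap> and <div> rules could be verified as in the sketch below, which checks ORDER, DMDID on the first level <div>, and that every FILEID resolves to a <file> in the same document. The function name and the sample IDs and labels are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
NS = {"m": "http://www.loc.gov/METS/"}


def check_struct_map(xml_text):
    """Return a list of problems found in <structMap> (empty if none)."""
    root = ET.fromstring(xml_text)
    maps = root.findall("m:structMap", NS)
    if len(maps) != 1:
        return ["exactly one <structMap> is required"]
    problems = []
    file_ids = {f.get("ID") for f in root.iter(METS + "file")}
    top_divs = maps[0].findall("m:div", NS)
    if not top_divs:
        problems.append("<structMap> contains no <div>")
    for div in top_divs:
        if div.get("ORDER") != "1":
            problems.append("first level <div> must have ORDER='1'")
        if not div.get("DMDID"):
            problems.append("first level <div> must have a DMDID")
    for div in maps[0].iter(METS + "div"):
        label = div.get("LABEL") or "(unlabelled)"
        fptrs = div.findall("m:fptr", NS)
        if not fptrs:
            problems.append(f"div {label}: no <fptr>")
        for fptr in fptrs:
            if fptr.get("FILEID") not in file_ids:
                problems.append(f"div {label}: FILEID does not match any <file>")
        children = div.findall("m:div", NS)
        if any(not c.get("ORDER") for c in children):
            problems.append(f"div {label}: a child <div> has no ORDER")
        # Among lower level <div>s, at least one must have ORDER "1".
        if children and not any(c.get("ORDER") == "1" for c in children):
            problems.append(f"div {label}: no child <div> has ORDER='1'")
    return problems


# Hypothetical sample; IDs and labels are invented.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp USE="master"><file ID="f1"/><file ID="f2"/></fileGrp>
  </fileSec>
  <structMap>
    <div ORDER="1" LABEL="Report" DMDID="dmd1">
      <fptr FILEID="f1"/>
      <div ORDER="1" LABEL="Chapter 1"><fptr FILEID="f2"/></div>
    </div>
  </structMap>
</mets>"""

print(check_struct_map(SAMPLE))
```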

A8.4.10 <structLink> element group

Not supported in this profile.

A8.4.11 <behaviorSec> element group

Not supported in this profile.

A8.5 Sample METS document

See attachment.

A8.6 Recommendations

  1. Continue to develop the proposed METS profile for metadata exchange with testing and input from ANU and UQ and consultation with the wider digital preservation community with a view to registering the profile formally.