We would like to gather feedback on the best approach to recording format profiles in PREMIS for our repository. In our implementation, we would like to record every format profile a file conforms to (for example GeoTIFF, TIFF/EP, etc.) in addition to the primary format (TIFF 6.0) identified by the format tools. The PREMIS format information would then be loaded into our preservation database, where we can perform further analysis on our archive. The PREMIS 2.0 data dictionary suggests recording the most specific format designation when recording format profiles (page 196). It recommends doing so either by using a multipart format name or by recording only the most specific format according to repository-specific rules.
One issue with the multipart format name is that it cannot carry format registry information for each format. For example, if a TIFF is identified as TIFF 6.0 with two matching format profiles, GeoTIFF and TIFF/EP, the multipart format name would become something like "TIFF_GEOTIFF_TIFF/EP" with format version "6.0". It then becomes unclear how the format registry information should be recorded: should the associated format registry record the registry id for TIFF 6.0, or the registry id for GeoTIFF? If we would like to record a registry id for every format and profile, should we add separate format elements? Another issue is how to construct the multipart format name: should we order its parts alphabetically? And won't this produce a large number of possible multipart format names, up to 2^n for n independent profiles even with a fixed ordering?
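To make the naming concern concrete, here is a minimal Python sketch, assuming an alphabetical (case-insensitive) ordering of profiles and an underscore separator. Both conventions, and the function names, are our own illustration and are not prescribed by PREMIS.

```python
from itertools import combinations

def multipart_name(base, profiles):
    """Build a multipart format name: base format first, then the
    profiles in case-insensitive alphabetical order (assumed convention)."""
    return "_".join([base] + sorted(profiles, key=str.lower))

def all_multipart_names(base, profiles):
    """Enumerate every name that can arise when a file may conform to
    any subset of the given independent profiles."""
    names = []
    for k in range(len(profiles) + 1):
        for subset in combinations(profiles, k):
            names.append(multipart_name(base, subset))
    return names

names = all_multipart_names("TIFF", ["GeoTIFF", "TIFF/EP"])
for name in names:
    print(name)
```

With only the two profiles mentioned here the list has four entries (2^2), and each additional independent profile doubles it. A fixed ordering removes the permutations but not the growth in distinct names.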
The instruction to use the most specific format implies a hierarchy of formats, e.g. GeoTIFF is a child of TIFF and hence more specific. Even the usage note to the effect that specificity is in the eye of the beholder assumes that there are hierarchical format classes and that the only issue is how deep to go (e.g. text, XML or METS). However, many profiles are independent of one another. A TIFF could conform to both the GeoTIFF and TIFF/EP profiles, and neither is more specific than the other. We may well want to identify all TIFFs that conform to either profile, and so would have to record both. If we use the convention TIFF_GeoTIFF_TIFF-EP in the format name, we have the problem that we cannot record the three corresponding format registry ids. Also, we can't tie a profile to a version: what if a file in format A version x conforms to profile B in version y?
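As an illustration of what we would like to be able to express, here is a rough Python sketch of one record per format or profile, each with its own version and registry key. The class, field names, file names, registry keys and profile version below are all hypothetical placeholders. This is not a mapping to PREMIS elements; whether PREMIS can carry such a structure is exactly our question.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class FormatRecord:
    name: str                    # e.g. "TIFF" or "GeoTIFF"
    version: Optional[str]       # version of this format or profile, if known
    registry_key: Optional[str]  # placeholder key, not a real registry id
    is_profile: bool             # False for the primary format

def files_conforming_to(records_by_file: Dict[str, List[FormatRecord]],
                        name: str) -> List[str]:
    """Return the files that have a record for the given format or profile."""
    return sorted(f for f, recs in records_by_file.items()
                  if any(r.name == name for r in recs))

# Placeholder records; keys and the GeoTIFF version are illustrative only.
tiff = FormatRecord("TIFF", "6.0", "example-registry/tiff-6.0", False)
geotiff = FormatRecord("GeoTIFF", "1.0", "example-registry/geotiff", True)
tiff_ep = FormatRecord("TIFF/EP", None, "example-registry/tiff-ep", True)

records_by_file = {
    "map.tif": [tiff, geotiff],
    "photo.tif": [tiff, tiff_ep],
    "both.tif": [tiff, geotiff, tiff_ep],
}

print(files_conforming_to(records_by_file, "GeoTIFF"))
```

Keeping each profile as its own record lets the version and registry key travel with the profile, and makes a query such as "all TIFFs conforming to GeoTIFF" a simple lookup rather than a parse of a multipart name.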
We are wondering whether any other institutions on the mailing list have encountered similar issues or are also planning to record format profiles in PREMIS. Would anyone be willing to share their implementation or their thoughts on this? Any feedback or suggestions will be greatly appreciated.
Florida Center for Library Automation