Steve,
 
Many thanks for your reply to my question. As you implicitly realised, it arose at an early stage of the discussion about PREMIS in our project. It was only later (prompted, among other things, by answers posted on the PIG list) that we arrived at a fuller understanding of how PREMIS interacts with other metadata, in particular with METS. We are following this approach in our current projects.
 
However, I still consider the lack of format registry support for this level of detail to be a problem. I think your suggestion of a locally generated controlled vocabulary, referenced either in PREMIS or in textMD, could be an effective way of resolving it.
 
Best wishes,
 
Georg


From: Steve Bordwell [mailto:[log in to unmask]]
Sent: Thursday, 19 April 2007 16:33
To: Büchler Georg KOST; PREMIS Implementors Group Forum
Subject: [PIG] charset encoding of text files

Georg,

On 1 June 2006 you wrote to the PIG list:

“A quick question: we want to record PREMIS metadata for plain text files (tab-separated values, UTF-8 encoding, CRLF as line separator). Since there is as yet no format registry that offers full support for this kind of file format, we are wondering how to record the encoding in PREMIS. As I understand it, there is no place for this information in the format element. How do others deal with this problem?”

As the PREMIS review and revision process picks up speed, the PREMIS Editorial Committee (EC) has begun working its way through points raised by PREMIS implementers. Please accept our apologies for taking so long to reply to your query. Today we discussed it and agreed to respond on the PIG Forum, where you originally posted it, so that others can participate in the discussion. I have been asked to respond on behalf of the PREMIS EC.

Given the level of detail you need to describe the format, we agreed that this information should be recorded elsewhere: format information of this sort is implementation-specific and therefore outside the scope of PREMIS. Within METS, textMD seems a good place to store implementation-specific format information of this kind.

Alternatively, you might consider using a term from a locally generated controlled vocabulary within the semantic unit formatName, and storing the detailed technical meaning of each controlled term (e.g. tab-separated values, UTF-8 encoding, CRLF as line separator) outside PREMIS.

Format registries do not at this time appear to provide this level of descriptive detail.

The PREMIS EC would be interested to hear any comments relating to this issue.

Steve Bordwell
Digital Data Archive Project Manager
Digital Access Team
National Archives of Scotland
0131-242-5813