From: Steve Bordwell
Sent: Thursday, 19 April 2007 16:33
To: Büchler Georg KOST; PREMIS Implementors Group Forum
Subject: [PIG] charset encoding of text files
On 1 June 2006 you wrote to the PIG list:
“A quick question: we want to record PREMIS metadata for plain text files (tab separated values, encoding UTF-8, CRLF as line separator). Since there is as yet no format registry that offers full support for this kind of file format we are wondering how to record the encoding in PREMIS. As I understand there is no place for this information in the format element. How do others deal with this problem?”
As the PREMIS review and revision process picks up speed, the PREMIS Editorial Committee has begun working its way through points raised by PREMIS implementers. Please accept our apologies for taking so long to reply to your query. Today we discussed yours and agreed that we should respond to the original query on the PIG Forum so that others can participate in the discussion. I have been asked to respond on behalf of the PREMIS EC.
Considering the level of detail you require to describe the format, we agreed that format information of this sort is implementation-specific and therefore outside the scope of PREMIS, so it should be kept in some other way. Within METS, textMD seems a good place to store implementation-specific format information of this sort.
Alternatively, you might consider using a term from a locally generated controlled vocabulary within the semantic unit formatName, and storing the detailed technical meaning of each of the controlled terms (e.g. tab separated values, encoding UTF-8, CRLF as line separator) outwith PREMIS.
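To illustrate the controlled-vocabulary suggestion, here is a minimal Python sketch of how a local term recorded in formatName could be resolved to its detailed technical meaning held outside PREMIS. The term "local:tsv-utf8-crlf", the TextFormatProfile class and the helper functions are hypothetical names made up for this example; they are not part of PREMIS, METS or textMD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFormatProfile:
    """Detailed technical meaning of one local formatName term (hypothetical)."""
    description: str
    encoding: str
    line_separator: str
    field_delimiter: str


# Locally maintained vocabulary, kept outside the PREMIS record itself.
# The key is the value that would be recorded in formatName.
LOCAL_FORMAT_VOCABULARY = {
    "local:tsv-utf8-crlf": TextFormatProfile(
        description="tab separated values",
        encoding="UTF-8",
        line_separator="\r\n",
        field_delimiter="\t",
    ),
}


def lookup_format(format_name: str) -> TextFormatProfile:
    """Resolve a formatName term to its locally documented profile."""
    return LOCAL_FORMAT_VOCABULARY[format_name]


def conforms(data: bytes, profile: TextFormatProfile) -> bool:
    """Check that data decodes in the declared encoding and uses only
    the declared line separator."""
    try:
        text = data.decode(profile.encoding)
    except UnicodeDecodeError:
        return False
    # After removing every declared separator, no stray CR or LF may remain.
    remainder = text.replace(profile.line_separator, "")
    return "\r" not in remainder and "\n" not in remainder
```

With this arrangement the PREMIS record carries only the short term, while the encoding and line-separator details stay in the local vocabulary, where they can also be used to check incoming files.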
Format registries do not at this time appear to provide this level of descriptive detail.
The PREMIS EC would be interested to hear any comments relating to this issue.
Digital Data Archive Project Manager
Digital Access Team
National Archives of Scotland