Hi,

I've run into an issue representing certain file paths in certain METS
fields, for instance FLocat links.

Some files I'm describing in a METS document were created on legacy
operating systems and filesystems, and their filenames are in non-Unicode
encodings. For example, one file I've been looking at contains Shift-JIS
characters in its name. The raw bytes of this filename are not valid
UTF-8 or ASCII - e.g., they include bytes above ordinal 127 that don't
form valid UTF-8 sequences. Other filenames could potentially contain
byte sequences which are not valid in UTF-16 either.
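
To make the problem concrete, here is a minimal Python sketch. The
filename is a made-up stand-in (not the actual file), and is_valid_utf8
is just a helper name chosen for this illustration:

```python
# Hypothetical example name, NOT the real file: the bytes a Shift-JIS
# filesystem would store for a name like "テスト.txt".
raw_name = "テスト.txt".encode("shift_jis")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The stored name contains bytes above 127 ...
print(any(b > 127 for b in raw_name))
# ... and those bytes do not form valid UTF-8 sequences.
print(is_valid_utf8(raw_name))
```

So the name as stored on disk can't be dropped into a UTF-8 XML
document unchanged.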

Unfortunately, since XML requires that all strings be Unicode or ASCII, I'm
not sure how to represent these paths in the document. My understanding is
that these fields are meant to represent actual paths, so base64-encoding
the original data is out. As well, since the source encoding may be unknown
or may contain characters that are unrepresentable in Unicode, transcoding
the strings into Unicode before writing the METS is out. (That would also
make it difficult to match the paths in the METS with the files on disk,
since the Unicode-transcoded version wouldn't match what the filesystem is
storing.)
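
A small Python sketch of that mismatch, again with a made-up filename
and assuming the source encoding happens to be known (which it often
isn't):

```python
# Hypothetical on-disk name, stored as Shift-JIS bytes.
raw_name = "テスト.txt".encode("shift_jis")

# What would end up in the METS if the name were transcoded to Unicode
# and the document serialized as UTF-8.
transcoded = raw_name.decode("shift_jis").encode("utf-8")

# The two byte strings differ, so on a filesystem that stores names as
# raw bytes, a lookup using the transcoded form would not find the file.
print(raw_name == transcoded)
```

Going from the METS back to the file would require knowing the original
encoding and reversing the transcoding for every path.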

Have any other METS users run into this issue? Any suggestions?

Best,
Misty De Meo

-- 
Misty De Meo
Software Developer / Systems Analyst
Artefactual Systems
www.artefactual.com