Have any other METS users run into this issue? Any suggestions?
I've run into an issue representing certain file paths in some METS fields, such as FLocat links.
Some of the files I'm describing in a METS document were created on legacy operating systems and filesystems, and their filenames are in non-Unicode encodings. For example, one file I've been looking at has a Shift-JIS-encoded name. That name contains bytes that are not valid ASCII or UTF-8, e.g. bytes above 127 (0x7F). Other filenames could potentially contain byte sequences that are not valid UTF-16.
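To make the problem concrete, here is a small sketch. It assumes a hypothetical filename "テスト.txt" stored as raw Shift-JIS bytes, which is how a legacy filesystem would hold it. It checks which codecs can decode those bytes. Shift-JIS succeeds, but ASCII and UTF-8 both fail because of the lead bytes above 0x7F.

```python
# Hypothetical example: the filename "テスト.txt" as raw Shift-JIS bytes,
# i.e. the bytes a legacy filesystem would actually store.
raw_name = "テスト.txt".encode("shift_jis")


def decodable_as(name: bytes, codecs=("ascii", "utf-8", "shift_jis")) -> dict:
    """Return {codec: True/False} depending on whether `name` decodes cleanly."""
    results = {}
    for codec in codecs:
        try:
            name.decode(codec)
            results[codec] = True
        except UnicodeDecodeError:
            results[codec] = False
    return results


# Bytes above 0x7F are what break ASCII (and, here, UTF-8).
high_bytes = [b for b in raw_name if b > 0x7F]

print(decodable_as(raw_name))
print([hex(b) for b in high_bytes])
```

Decoding only works if you already know the source encoding. For the files in question, that assumption may not hold.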
Unfortunately, since XML documents may only contain Unicode characters (ASCII is a subset of Unicode), I'm not sure how to represent these paths in the document. My understanding is that these fields are meant to hold actual paths, so base64-encoding the original bytes is out. Likewise, since the source encoding may be unknown, or a name may contain byte sequences that cannot be mapped to Unicode, transcoding the strings into Unicode before writing the METS is out. (Transcoding would also make it difficult to associate the METS entries with the files on disk, since the Unicode-transcoded version wouldn't match the bytes the filesystem actually stores.)
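The mismatch described in the parenthetical can be shown in a few lines. This sketch reuses the hypothetical Shift-JIS filename "テスト.txt". It transcodes the name to UTF-8, which is roughly what writing it into an XML document would amount to, and compares the result with the original on-disk bytes. It also shows what happens when the source encoding is guessed wrong. Latin-1 is used here only because it decodes any byte sequence without error.

```python
# Hypothetical on-disk name: "テスト.txt" stored as Shift-JIS bytes.
raw_name = "テスト.txt".encode("shift_jis")

# Case 1: source encoding known. Decode as Shift-JIS, re-encode as UTF-8.
transcoded = raw_name.decode("shift_jis").encode("utf-8")

# Case 2: source encoding unknown and guessed wrong (Latin-1 never raises,
# so the error goes unnoticed and produces mojibake instead).
wrong_guess = raw_name.decode("latin-1")

print(raw_name == transcoded)      # the UTF-8 bytes differ from the on-disk bytes
print(wrong_guess == "テスト.txt")  # the mis-decoded text is not the real name
```

Even in the best case, the path written into the METS is no longer byte-for-byte what the filesystem stores. In the worst case, it is silently wrong.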