This is a very interesting problem. In general terms: how do you record paths that use characters which do not have a Unicode representation? Unfortunately, I do not have an answer.
> This filename has some characters which are not valid in UTF-8 or ASCII
Which characters end up being unrepresented? I thought the conversion you couldn't guarantee was the round trip from Shift_JIS to Unicode and back to Shift_JIS. I was under the impression that the one-way conversion Shift_JIS -> Unicode would work. I understand that a lossy round trip breaks your ability to unambiguously look up the files on the legacy system.
Does this have full coverage of Shift_JIS? If not, you can disregard the rest of my response.
I do not think it is possible to faithfully record the original paths of a Shift_JIS filesystem in the FLocat element. Migrating to a newer file system seems like a roundabout way of addressing your issue, but it would permit representation of the new paths.
You could use FContent instead of FLocat, and record Shift_JIS in the MIMETYPE attribute of the file element (for example as a charset parameter). This would allow you to encode the contents of the file, but not the original path, so the original problem remains.
I would run a translation to UTF-8 and see whether any collisions appear in the new representation of the legacy paths. Without collisions, you can at least maintain a lookup table between the current and legacy representations. However, I'm not sure how you would reference an external table in a METS document.
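A minimal sketch of that collision check, assuming the legacy names are available as raw bytes and that Python's `shift_jis` codec is close enough to the source encoding (both are assumptions; `find_collisions` is a made-up helper name). Undecodable bytes are replaced with U+FFFD here, which is exactly the kind of lossy step that can make two legacy names collapse into one:

```python
from collections import defaultdict


def find_collisions(raw_names, encoding="shift_jis"):
    """Return {converted_name: [legacy byte names]} for names that clash."""
    by_converted = defaultdict(list)
    for raw in raw_names:
        # errors="replace" maps every undecodable byte to U+FFFD, so
        # distinct legacy names can end up with the same converted form.
        converted = raw.decode(encoding, errors="replace")
        by_converted[converted].append(raw)
    # Keep only the converted names claimed by more than one legacy name.
    return {name: raws for name, raws in by_converted.items() if len(raws) > 1}
```

If this returns an empty dict for the whole collection, the converted names are unique and a table from each converted name to its original bytes (for instance as a hex string) is enough to get back to the legacy path.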
> As well, since the source encoding my be unknown
Without explicitly knowing the encoding, you're in a bit of a bind. There are some interesting methods for attempting to infer the encoding. I would suggest using some sort of heuristic scan of the document to make an educated guess based on a priori knowledge or assumptions, perhaps looking for runs of bytes that would be unlikely in the ASCII or Latin-1 representation of the document's language.
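One simple heuristic, a cruder one than the byte-run scan suggested above, is trial decoding: try each encoding you consider plausible and keep the ones that decode without error. The sketch below assumes a short candidate list chosen from prior knowledge of the collection; the list and the name `plausible_encodings` are illustrative only. A clean decode does not prove the guess is right, it only rules out the candidates that fail.

```python
CANDIDATES = ("ascii", "utf-8", "shift_jis", "euc_jp")  # assumed shortlist


def plausible_encodings(raw, candidates=CANDIDATES):
    """Return the candidate encodings under which `raw` decodes cleanly."""
    matches = []
    for name in candidates:
        try:
            raw.decode(name)  # strict mode: any invalid byte raises
        except UnicodeDecodeError:
            continue
        matches.append(name)
    return matches
```

Pure ASCII names will match every candidate, so the result is most informative for names that contain bytes above 0x7F. When more than one candidate survives, you still need outside knowledge (the expected language, the originating system) to pick between them.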