> agree on what terminates a string, right?
We should also exclude some Unicode characters from strings, to leave
room for escaping in storage and filtering.
I think we should at least exclude the 20 characters that Unicode 6.0
defines as separators (Z), the soft hyphen (U+00AD), and the control (Cc)
and format (Cf) characters, etc. This could be translated into a
validating regex such as: