On 7-Dec-06, at 5:55 PM, Mike Pall wrote:
Well, then there are also distinct characters that have the same glyph shape, like 'a' and '\u0430' (Cyrillic a). Normalization won't help you here ... There is no perfect solution.
Absolutely, but one can minimize confusion. I'm unlikely to accidentally type a Cyrillic a when I meant 'a', but it is very easy to accidentally have the wrong character encoding, or to be using an input method which normalizes to the decomposed form rather than the composed form.
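To make the distinction concrete, here is a small sketch in Python (my choice for illustration only; nothing in this thread uses it). The helper name same_identifier is hypothetical. It compares two names after composing normalization (NFC), which unifies the composed and decomposed spellings of an accented letter but, as noted above, leaves Latin 'a' and Cyrillic '\u0430' distinct.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)                 # raw code point comparison
print(same_identifier(composed, decomposed))  # comparison after NFC
print(same_identifier("a", "\u0430"))         # Latin a vs Cyrillic a
```

The first and third comparisons come out unequal and the second equal, so normalization handles the input-method case but not the look-alike case.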
Protecting against the wrong character encoding is easy, though: just insist that the source file be valid UTF-8, which is a very fast test.
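As a rough sketch of such a test, assuming Python for illustration (the function name is_valid_utf8 is made up here), one can simply attempt a strict decode of the raw bytes. A file saved in a legacy 8-bit encoding such as Latin-1 will usually fail as soon as it contains a non-ASCII character.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_source = "caf\u00e9".encode("utf-8")      # e-acute as two bytes
latin1_source = "caf\u00e9".encode("latin-1")  # e-acute as the single byte 0xE9

print(is_valid_utf8(utf8_source))
print(is_valid_utf8(latin1_source))
```

Note that pure ASCII input passes this check whatever encoding the author had in mind, so the test only catches files that actually contain non-ASCII bytes.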