Hi,

Rici Lake wrote:
> If I actually use the identifier código (say) in some file, and try to 
> refer to it from another file, it might fail because the encodings are 
> different. For example, one file might be in iso-8859-1, or both of 
> them might be in utf-8 but one of them uses a composed ó and the other 
> one uses an o and a combining accent. These differences may be 
> completely invisible.

Well, then there are also distinct characters that have the same
glyph shape, like 'a' and '\u0430' (Cyrillic a). Normalization
won't help you here ... There is no perfect solution.
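
A rough sketch of both cases, in Python rather than Lua because the
standard Lua libraries have no normalization routine. The variable
names are made up for illustration; unicodedata.normalize is the
stock Python call. It only shows that normalization reconciles the
composed/decomposed spellings but leaves the look-alike letters
distinct:

```python
import unicodedata

# Two spellings of the same identifier that render identically.
composed = "c\u00f3digo"      # precomposed U+00F3 (o with acute)
decomposed = "co\u0301digo"   # 'o' followed by U+0301 COMBINING ACUTE ACCENT

# A byte-wise or code-point-wise comparison sees two different names.
raw_equal = composed == decomposed

# After NFC normalization the two spellings compare equal.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# The same text in two encodings also differs at the byte level.
bytes_equal = composed.encode("iso-8859-1") == composed.encode("utf-8")

# Look-alikes: Latin 'a' (U+0061) and Cyrillic 'a' (U+0430).
latin_a = "a"
cyrillic_a = "\u0430"

# Neither NFC nor NFKC maps one onto the other; they stay distinct.
homoglyphs_equal = any(
    unicodedata.normalize(form, latin_a) == unicodedata.normalize(form, cyrillic_a)
    for form in ("NFC", "NFKC")
)

print(raw_equal, nfc_equal, bytes_equal, homoglyphs_equal)
```

So a normalizing lexer would fix the first mismatch and still be
fooled by the second.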

> I strongly agree that "locale-dependent lexing is bad"; however, robust
> lexing needs to be aware of unicode normalization forms. Unfortunately,
> that is by no means cheap.

IMHO simple is better. Even Java doesn't normalize in the lexer:

http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#40625

Bye,
     Mike