On 7-Dec-06, at 7:52 AM, David Jones wrote:

> Actually dealing with shift-state dependent multi-byte encodings in a portable way in C makes the infinite horrors of Unicode and UTF-8 seem very attractive.

I think Unicode is blamed for a lot of things that are not really its fault. Correctly rendering all the languages of the world, potentially mixed in the same document, is an inherently difficult problem, not because of the encoding scheme but because of the complexity of the rendering rules.

Unicode, although not perfect, is a valiant attempt to deal with these complexities in a way which allows semantic analysis and still makes reasonable rendering possible. Most of its flaws (and I would count precomposed characters as a flaw) are, sadly, the result of political compromises without which Unicode would never have been adopted at all.
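
For anyone unsure what the complaint about precomposed characters amounts to in practice, here is a minimal sketch. It uses Python and its standard `unicodedata` module purely for illustration; the helper name `same_text` is hypothetical and nothing here comes from the thread. The same visible "é" can be stored either as one precomposed code point or as a base letter plus a combining accent, and the two spellings do not compare equal until both are normalized to the same form.

```python
import unicodedata

# Two spellings of the same visible character "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points


def same_text(a, b):
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A naive comparison sees different code point sequences.
print(precomposed == decomposed)

# After normalization the two spellings are recognized as equivalent.
print(same_text(precomposed, decomposed))
```

The first comparison is false and the second is true, which is roughly the cost being described: any code that compares, searches, or sorts text has to normalize first because the standard admits both spellings.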