Roberto Ierusalimschy wrote:
>> Actually dealing with shift-state dependent multi-byte encodings in a  
>> portable way in C makes the infinite horrors of Unicode and UTF-8  
>> seem very attractive.
> 
> This seems a quite accurate summary of the situation.

The horrors of UTF-8 are ℵ₀, but the horrors of full Unicode are at *least* ℵ₁...

Slightly more seriously, it occurs to me that since combining characters mean
you can't rely on any individual glyph being encoded as a single Unicode
code point, 32-bit Unicode does, in fact, gain you nothing except a false
sense of security. You always need to write code to cope with glyphs that
span multiple code points.
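
To make that concrete, here is a small Python sketch (not from the original
discussion) showing the same glyph encoded as one code point and as two. The
helper name count_glyphs is hypothetical, and skipping combining marks is only
a rough approximation of glyph counting; it is not full grapheme cluster
segmentation.

```python
import unicodedata

def count_glyphs(text):
    """Rough glyph count: count code points that are not combining marks.

    Hypothetical helper for illustration only. It ignores many cases that
    real grapheme segmentation handles (emoji sequences, Hangul jamo, etc.).
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

# The same visible glyph, e with an acute accent, encoded two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# Python strings are indexed by code point, so len() disagrees with the eye.
code_points_precomposed = len(precomposed)
code_points_decomposed = len(decomposed)

# Both spellings are a single glyph by the rough count above.
glyphs_precomposed = count_glyphs(precomposed)
glyphs_decomposed = count_glyphs(decomposed)

# Normalizing to NFC folds this particular pair into the precomposed form,
# but not every base-plus-mark sequence has a precomposed equivalent.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```

Even with one fixed-width unit per code point, the decomposed form has
length 2, so indexing by code point can still split a glyph in half.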

Unicode is like general relativity. No matter how well you think you
understand it, it's always more complicated than you think...

-- 
╭─┈David Given┈──McQ─╮ "There are two major products that come out of
│┈┈dg@cowlark.com┈┈┈┈│ Berkeley: LSD and Unix. We don't believe this to be
│┈(dg@tao-group.com)┈│ a coincidence." --- Jeremy S. Anderson
╰─┈www.cowlark.com┈──╯