- Subject: Re: question about Unicode
- From: David Given <dg@...>
- Date: Thu, 07 Dec 2006 14:14:02 +0000
Roberto Ierusalimschy wrote:
>> Actually dealing with shift-state dependent multi-byte encodings in a
>> portable way in C makes the infinite horrors of Unicode and UTF-8
>> seem very attractive.
> This seems a quite accurate summary of the situation.
The horrors of UTF-8 are ℵ₀, but the horrors of full Unicode are at *least* ℵ₁...
Slightly more seriously, it occurs to me that since composite characters mean
you can't rely on any individual glyph being encoded in a single Unicode
code point, 32-bit Unicode does, in fact, gain you nothing except a false
sense of security. You always need to write code to cope with multicharacter
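
To make the point above concrete, here is a rough sketch in Python (my choice of language for illustration, not anything from the thread). The helper names code_points and user_characters are made up for this example, and counting "base code point plus trailing combining marks" is only an approximation of real grapheme segmentation. It shows that a fixed-width 32-bit encoding still needs more than one unit for some single visible characters, and that normalization does not always rescue you.

```python
import unicodedata


def code_points(s: str) -> int:
    """Number of 32-bit units needed to store s in UTF-32."""
    return len(s.encode("utf-32-le")) // 4


def user_characters(s: str) -> int:
    """Approximate count of visible characters.

    Assumption: every code point with canonical combining class 0 starts
    a new character and combining marks attach to the one before. This
    is a simplification, not full Unicode text segmentation.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


precomposed = "\u00e9"       # e-acute as one code point
decomposed = "e\u0301"       # 'e' followed by COMBINING ACUTE ACCENT
no_precomposed = "q\u0301"   # 'q' + acute: no single code point exists

for label, text in [
    ("precomposed", precomposed),
    ("decomposed", decomposed),
    ("no precomposed form", no_precomposed),
]:
    nfc = unicodedata.normalize("NFC", text)
    print(label, code_points(text), user_characters(text), code_points(nfc))
```

Even after NFC normalization the last string still occupies two 32-bit units for what a reader sees as one character, so indexing by code point is not the same as indexing by character.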
Unicode is like general relativity. No matter how well you think you
understand it, it's always more complicated than you think...
╭─┈David Given┈──McQ─╮ "There are two major products that come out of
│┈┈email@example.com┈┈┈┈│ Berkeley: LSD and Unix. We don't believe this to be
│┈(firstname.lastname@example.org)┈│ a coincidence." --- Jeremy S. Anderson