- Subject: Re: question about Unicode
- From: Glenn Maynard <glenn@...>
- Date: Tue, 5 Dec 2006 06:34:46 -0500
On Mon, Dec 04, 2006 at 04:49:11PM -0200, Roberto Ierusalimschy wrote:
> > The way slnunicode does it is optimized for size,
> > using a highly compressed unicode character class table (from Tcl)
> > and never requiring space for a UTF-16 version (unlike Tcl).
> What is the license? Where can I find documentation?
As an aside, the notion that UTF-8 is smaller than UTF-16 (or UCS-2) is
a very Western-centric idea. UTF-8 is variable-width, so it's only
smaller for languages where most letters take one byte. CJK characters
typically take three bytes in UTF-8 versus two in UTF-16, so UTF-8 is
50% larger for that text. (No prejudice in UTF-8's design here; CJK just
has too many characters!) Arabic breaks even, I think: its letters take
two bytes in either encoding.
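
To put rough numbers on that, here is a minimal Python sketch that encodes
three short sample strings both ways. The names `samples` and
`encoded_sizes` are made up for illustration, the strings are arbitrary,
and "utf-16-le" is used so that no byte-order mark is counted.

```python
# Sample strings are written as escapes so the source file's own encoding
# cannot affect the result.
samples = {
    "English": "hello world",
    # Japanese greeting: five hiragana followed by two kanji
    "Japanese": "\u3053\u3093\u306b\u3061\u306f\u4e16\u754c",
    # Arabic word made of five letters
    "Arabic": "\u0645\u0631\u062d\u0628\u0627",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

For these particular samples the Japanese string is 21 bytes in UTF-8
against 14 in UTF-16, the Arabic one is 10 bytes either way, and the
English one is 11 against 22. Real text with spaces, digits, or markup
mixed in will shift the ratio toward UTF-8.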
This is just a note, not an argument against UTF-8. There are other good
reasons to use it: a lost byte can't desync the rest of the stream, it's
independent of endianness, and it scales cleanly to the high Unicode
ranges.
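
The point about losing a byte can be shown with a small Python sketch. It
drops one byte from the same text in each encoding and decodes what is
left, with errors replaced rather than raised. The helper `drop_byte` and
the sample text are made up for illustration.

```python
# Two CJK characters followed by ASCII text.
text = "\u4e16\u754c, hello"


def drop_byte(data, index):
    """Return data with the single byte at position index removed."""
    return data[:index] + data[index + 1:]


# Lose the second byte of the stream in each encoding.
utf8_damaged = drop_byte(text.encode("utf-8"), 1)
utf16_damaged = drop_byte(text.encode("utf-16-le"), 1)

# Decode leniently so damage shows up as replacement characters or
# wrong characters instead of an exception.
utf8_result = utf8_damaged.decode("utf-8", errors="replace")
utf16_result = utf16_damaged.decode("utf-16-le", errors="replace")

print(repr(utf8_result))
print(repr(utf16_result))
```

In the UTF-8 case only the character that lost a byte is damaged, and the
decoder picks up again at the next lead byte. In the UTF-16 case every
following code unit is read off by one byte, so the rest of the text
comes out as unrelated characters.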