- Subject: Re: Lua 5.1 and UTF-8 ?
- From: Rici Lake <lua@...>
- Date: Mon, 23 May 2005 00:26:58 -0500
On 22-May-05, at 5:42 PM, Klaus Ripke wrote:
On Sun, May 22, 2005 at 03:05:09PM -0700, email@example.com wrote:
You're right, those that really need to have the full
story like support for each and all normal forms with
all special cases in all locales and multi level sorting
and whatnot should consider linking ICU, as it's fairly
complete and efficient.
But where size does matter, it's two orders of magnitude too fat.
It's pretty big for an embedded system, that's for sure. However, the
reference data is all constant static data, and ICU goes to a fair
amount of trouble to ensure that only one copy is ever loaded into
memory. So if you have an OS which supports mmap, and any other
application uses ICU, then the cost of the ICU reference data is 0, and
the cost of any other form of reference data is > 0. Consequently, not
supporting ICU may actually increase resource demands :)
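To illustrate the kind of read-only mapping being described, here is a minimal sketch in Python. It is not how ICU loads its data; the file and the helper name map_reference_data are hypothetical stand-ins. It only shows that constant data mapped read-only can be viewed many times without being copied by the program, which is what lets an OS back several mappings with the same physical pages.

```python
import mmap
import os
import tempfile

def map_reference_data(path):
    """Map a file read-only (hypothetical helper, not an ICU API)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only,
        # so the OS is free to share the underlying pages between mappings.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Stand-in for a constant data file such as a library's reference tables.
fd, DATA_PATH = tempfile.mkstemp()
os.write(fd, b"constant reference data")
os.close(fd)

# Two independent views of the same constant data.
view_a = map_reference_data(DATA_PATH)
view_b = map_reference_data(DATA_PATH)
```

Whether the pages really are shared is up to the operating system; the sketch only sets up mappings that permit it.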
By the way, re: composition and decomposition normalization. For round
tripping between Unicode and ISO-8859-1, decomposition is probably not
the way to go. However, for any other purpose, I think it is: even
though the text is slightly bulkier, the normalization algorithm is
somewhat easier, although I do have a nifty and fairly inexpensive