


On 22-May-05, at 5:42 PM, Klaus Ripke wrote:

On Sun, May 22, 2005 at 03:05:09PM -0700, whisper@oz.net wrote:
http://www-306.ibm.com/software/globalization/icu/index.jsp

You're right, those that really need to have the full
story like support for each and all normal forms with
all special cases in all locales and multi level sorting
and whatnot should consider linking ICU, as it's fairly
complete and efficient.

But where size does matter, it's two orders of magnitude too fat.

It's pretty big for an embedded system, that's for sure. However, the reference data is all constant static data, and ICU goes to a fair amount of trouble to ensure that only one copy is ever loaded into memory. So if you have an OS which supports mmap, and any other application already uses ICU, then the marginal cost of the ICU reference data is 0, while the cost of any other form of reference data is > 0. Consequently, not supporting ICU may actually increase resource demands :)
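To illustrate the sharing argument, here is a minimal sketch of mapping a constant data file read-only. It is not ICU's loader: the temporary file is a hypothetical stand-in for a large reference table, and `map_read_only` is a name invented for this example. The point is only that a read-only, file-backed mapping is served from the OS page cache, so every process that maps the same file shares the same physical pages.

```python
import mmap
import os
import tempfile

def map_read_only(path):
    """Return a read-only memory mapping of the whole file at path."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Hypothetical stand-in for a large constant reference-data file.
fd, data_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"constant reference data")

view = map_read_only(data_path)
header = view[:8]   # pages are faulted in on demand, not copied up front
view.close()
os.remove(data_path)
```

A second process mapping the same file would pay only for its page-table entries, not for another copy of the data.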

R.

By the way, re: composition vs. decomposition normalization. For round-tripping between Unicode and ISO-8859-1, decomposition is probably not the way to go. For any other purpose, though, I think decomposition is the better choice: the text is slightly bulkier, but the normalization algorithm is somewhat easier. That said, I do have a nifty and fairly inexpensive composition normalizer.
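A small sketch of why the ISO-8859-1 case is special, using Python's standard `unicodedata` module rather than ICU or any normalizer mentioned in this thread. The helper `latin1_round_trip` is a hypothetical name for this example. A composed e-acute is a single code point that Latin-1 can represent; the decomposed form adds a combining accent that Latin-1 cannot.

```python
import unicodedata

def latin1_round_trip(text, form):
    """Normalize text to the given form, then try to encode it as ISO-8859-1.

    Returns the encoded bytes, or None if some code point does not fit.
    """
    normalized = unicodedata.normalize(form, text)
    try:
        return normalized.encode("latin-1")
    except UnicodeEncodeError:
        return None

word = "caf\u00e9"  # ends in precomposed U+00E9 (e with acute)

composed = unicodedata.normalize("NFC", word)    # 4 code points
decomposed = unicodedata.normalize("NFD", word)  # 5: 'e' + combining U+0301

ok = latin1_round_trip(word, "NFC")      # b'caf\xe9'
failed = latin1_round_trip(word, "NFD")  # None: U+0301 is outside Latin-1
```

The extra code point in the decomposed string is the "slightly bulkier" cost mentioned above.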