Re: question about Unicode

lua-l archive


Subject: Re: question about Unicode
From: Mike Pall <mikelu-0612@...>
Date: Mon, 4 Dec 2006 19:37:27 +0100

Hi,

Roberto Ierusalimschy wrote:
> My question is, what is the "best" way to check char classes? Should I
> use mbtowc + iswctype? A hand-written utf8->wchar_t + iswctype?

My personal recommendation: avoid at all cost the use of the wide
character functions in libc. They are bloated, have portability
problems and are defective or incomplete in many systems. You're
basically adding a dependency that will cost you more to work
around its deficiencies than doing it yourself would.
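
For a sense of scale, the decoding half of "doing it yourself" is small. What follows is a minimal sketch in C, assuming strict RFC 3629 decoding is wanted (overlong forms, UTF-16 surrogates and values above U+10FFFF are rejected). `utf8_decode` is a hypothetical helper name. It does not come from libc, Lua or any library mentioned in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (len bytes available).
   On success, stores the code point in *cp and returns the sequence length (1-4).
   Returns 0 for malformed input: invalid lead byte, truncated sequence,
   bad continuation byte, overlong encoding, surrogate, or value > U+10FFFF. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c;
    size_t n, i;

    if (len == 0) return 0;
    c = s[0];
    if (c < 0x80) { *cp = c; return 1; }                   /* plain ASCII */
    else if (c >= 0xC2 && c <= 0xDF) { n = 2; c &= 0x1F; } /* C0/C1 are always overlong */
    else if (c >= 0xE0 && c <= 0xEF) { n = 3; c &= 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { n = 4; c &= 0x07; } /* F5..FF exceed U+10FFFF */
    else return 0;                                         /* stray continuation byte */

    if (len < n) return 0;                                 /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;              /* must be 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if ((n == 3 && c < 0x800) || (n == 4 && c < 0x10000))
        return 0;                                          /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;              /* UTF-16 surrogate */
    if (c > 0x10FFFF) return 0;                            /* outside Unicode */
    *cp = c;
    return n;
}
```

A caller advances through a string by the returned length and treats 0 as an error. The result is a plain code point, so the classification step does not depend on the platform's wchar_t width or locale settings.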

Case in point: Python's Unicode support started out that way and
was gradually patched with workarounds and replacement functions.
Nowadays it builds by default with its own internal Unicode
support because the libc portability problems generated way too
many bug reports. The resulting compatibility layer adds
substantially to the bloat inherent in any wide-character Unicode
support.

In contrast, have a look at Klaus Ripke's slnunicode. It has small
internal decoding tables and a minimum set of functions for UTF-8
processing. It even supports Lua's pattern matching. ;-)
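
The idea of answering character-class questions from small internal tables can be sketched roughly as follows. This is not slnunicode's actual code or data layout. It is a minimal sketch, assuming a class is stored as a sorted list of inclusive code-point ranges and looked up with a binary search. `cp_range`, `letter_ranges` and `uc_isletter` are hypothetical names, and the table is a tiny illustrative subset, not the full Unicode letter class.

```c
#include <stddef.h>
#include <stdint.h>

/* One inclusive range of code points that share a property. */
typedef struct { uint32_t lo, hi; } cp_range;

/* Illustrative subset only (ASCII, Latin-1, Latin Extended-A/B, basic Greek
   and Cyrillic letters). A real table would be generated from the Unicode
   Character Database. Ranges must be sorted and non-overlapping. */
static const cp_range letter_ranges[] = {
    {0x0041, 0x005A}, {0x0061, 0x007A},
    {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x024F},
    {0x0391, 0x03A1}, {0x03A3, 0x03C9},
    {0x0410, 0x044F},
};

/* Binary search over the sorted ranges: O(log n) per lookup. */
static int in_ranges(uint32_t cp, const cp_range *t, size_t n)
{
    size_t lo = 0, hi = n;              /* candidate ranges are t[lo..hi-1] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < t[mid].lo) hi = mid;
        else if (cp > t[mid].hi) lo = mid + 1;
        else return 1;                  /* cp falls inside t[mid] */
    }
    return 0;
}

/* Hypothetical table-driven classifier, in the spirit of iswalpha(),
   but independent of the C library's locale and wchar_t support. */
static int uc_isletter(uint32_t cp)
{
    return in_ranges(cp, letter_ranges,
                     sizeof letter_ranges / sizeof letter_ranges[0]);
}
```

A caller would first decode each UTF-8 sequence to a code point and then pass the result to `uc_isletter`. Range tables stay compact because most character classes consist of long contiguous runs.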

Bye,
     Mike

References:
- question about Unicode, Roberto Ierusalimschy
