- Subject: Re: question about Unicode
- From: Mike Pall <mikelu-0612@...>
- Date: Mon, 4 Dec 2006 19:37:27 +0100
Roberto Ierusalimschy wrote:
> My question is, what is the "best" way to check char classes? Should I
> use mbtowc + iswctype? A hand-written utf8->wchar_t + iswctype?
My personal recommendation: avoid at all costs the use of the wide
character functions in libc. They are bloated, have portability
problems and are defective or incomplete on many systems. You're
basically adding a dependency that will cost you more to work
around its deficiencies than doing it yourself.
Case in point: Python's Unicode support started out that way and
was gradually patched with workarounds and replacement functions.
Nowadays it builds by default with its own internal Unicode
support because the libc portability problems generated way too
many bug reports. The resulting compatibility layer adds
substantially to the bloat inherent in any wide-character Unicode
implementation.
In contrast, have a look at Klaus Ripke's slnunicode. It has small
internal decoding tables and a minimum set of functions for UTF-8
processing. It even supports Lua's pattern matching. ;-)
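To make the "do it yourself" option concrete, here is a minimal
sketch in C of a hand-written UTF-8 decoder plus a range-table
class check that needs neither mbtowc nor iswctype. It is not taken
from slnunicode or Python. The names utf8_decode, is_alpha_sketch
and alpha_ranges are hypothetical, and the table is deliberately
tiny and incomplete (a real one would be generated from the Unicode
Character Database).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (at most n bytes) into *cp.
 * Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of
     * length 2, 3 or 4; anything below it is an overlong encoding. */
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }        /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;   /* stray continuation byte or invalid lead byte */

    if (n < len) return 0;                            /* truncated */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;          /* not 10xxxxxx */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min_cp[len]) return 0;                    /* overlong */
    if (c > 0x10FFFF) return 0;                       /* out of range */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;         /* surrogate */
    *cp = c;
    return len;
}

/* Illustrative, INCOMPLETE table of alphabetic ranges, sorted by lo. */
struct cp_range { uint32_t lo, hi; };
static const struct cp_range alpha_ranges[] = {
    { 0x0041, 0x005A }, { 0x0061, 0x007A },   /* A-Z, a-z */
    { 0x00C0, 0x00D6 }, { 0x00D8, 0x00F6 },   /* Latin-1 letters */
    { 0x00F8, 0x00FF },
    { 0x0391, 0x03A1 }, { 0x03A3, 0x03A9 },   /* Greek capitals */
    { 0x03B1, 0x03C9 },                       /* Greek small letters */
    { 0x0410, 0x044F },                       /* basic Cyrillic */
};

/* Binary search over the sorted range table; 1 if cp is in a range. */
static int is_alpha_sketch(uint32_t cp)
{
    size_t lo = 0, hi = sizeof alpha_ranges / sizeof alpha_ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < alpha_ranges[mid].lo) hi = mid;
        else if (cp > alpha_ranges[mid].hi) lo = mid + 1;
        else return 1;
    }
    return 0;
}
```

A caller would walk a buffer by calling utf8_decode repeatedly,
advancing by the returned length and passing each code point to
is_alpha_sketch. The whole thing depends only on <stddef.h> and
<stdint.h>, so its behaviour does not vary with the platform's
locale or wchar_t width.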