On Sat, Jun 14, 2003 at 01:27:45PM +0100, chris.danx wrote:
> I'd like to be able to pass utf-8 strings to lua and 
> have it compare them and to potentially have lua code in utf-8.  How 
> easy is it to use lua with utf-8 and multibyte encoding sets?

Lua being 8-bit clean (IIRC), UTF-8 in Lua strings should be no problem
as the language itself uses only ASCII. Some of the string library
functions would have to be rewritten to support UTF-8 even partially
(string indexing, pattern character classes), but even without that
support, UTF-8 should work decently. I'm not an expert on Unicode (far
from that!), but full Unicode and string comparison support might need a
lot of work. <ot-rant>They should just have stuck to mapping basic glyphs
to numbers and left the rest to higher-level formats.</ot-rant> However,
if you stick to a nice subset of the standard (implementation level 1?),
string _equality_ comparison should work fine. I think Lua uses C strcoll
for ordering strings. The results of this function depend on the user's
locale, so string sorting may not work as expected.
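
A rough sketch of what the above means in practice, assuming Lua 5.1 or
later and valid UTF-8 input. The helper utf8_len and the variable names
are made up for illustration and are not part of Lua's string library:

```lua
-- "häst" spelled with decimal byte escapes (ä is 195 164 in UTF-8), so the
-- example does not depend on the encoding of the source file.
local word = "h\195\164st"

-- Hypothetical helper: count code points by skipping continuation bytes
-- (bit pattern 10xxxxxx, i.e. values 128..191). Assumes valid UTF-8.
local function utf8_len(str)
  local n = 0
  for i = 1, #str do
    local b = string.byte(str, i)
    if b < 128 or b >= 192 then
      n = n + 1
    end
  end
  return n
end

local byte_len = #word                 -- the string library counts bytes
local char_len = utf8_len(word)        -- code points, not bytes
local same = (word == "h\195\164st")   -- equality only looks at the bytes

-- Ordering goes through the locale. Selecting the "C" collation makes the
-- order a plain byte order, which for UTF-8 matches code point order.
os.setlocale("C", "collate")
local names = { "z", "\195\164", "a" }
table.sort(names)

print(byte_len, char_len, same, table.concat(names, " "))
```

The byte length comes out as 5 while the code point count is 4, which is
the kind of mismatch to expect from string.len, string.sub and friends.
Byte order is predictable but is not what a user of any particular
language would call alphabetical order.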

If you keep these limitations in mind, there should be no problems. Some
time ago I converted Ion to optionally use UTF-8 internally (and later
to use Lua for configuration) and the only problems I've had are thanks
to broken Xutf8 functions and brain-damaged locale support. (Using X and
libc multibyte support functions might have been a better solution
otherwise, but I need to parse the strings backwards fast, and that is
not possible with arbitrary multibyte encodings.)
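
To illustrate why walking backwards is cheap with UTF-8 specifically:
continuation bytes always have the form 10xxxxxx, so the start of the
previous character can be found by skipping them. This is only a sketch
in Lua (5.1 or later), not the code Ion uses; utf8_prev and
utf8_reverse_chars are hypothetical names, and valid UTF-8 input is
assumed:

```lua
-- Given a byte index i that points at the start of a character (or one
-- past the end of the string), return the byte index where the previous
-- character starts, or nil if there is none.
local function utf8_prev(str, i)
  if i <= 1 then
    return nil
  end
  i = i - 1
  while i > 1 do
    local b = string.byte(str, i)
    if b < 128 or b >= 192 then
      break            -- ASCII or lead byte: a character starts here
    end
    i = i - 1          -- continuation byte (128..191): keep going back
  end
  return i
end

-- Collect the characters of str from last to first.
local function utf8_reverse_chars(str)
  local out = {}
  local stop = #str + 1
  local start = utf8_prev(str, stop)
  while start do
    out[#out + 1] = string.sub(str, start, stop - 1)
    stop = start
    start = utf8_prev(str, stop)
  end
  return out
end

-- "a", "ä" (195 164) and the euro sign (226 130 172), as byte escapes.
local chars = utf8_reverse_chars("a\195\164\226\130\172")
print(#chars, table.concat(chars, " "))
```

Each step back looks at no more than a handful of bytes, with no need
to rescan from the beginning of the string as a stateful encoding would
require.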

Parts of the following page might prove to be helpful even if you're
not targeting *nix: <http://www.cl.cam.ac.uk/~mgk25/unicode.html>

-- 
Tuomo