- Subject: Re: Unicode
- From: Björn De Meyer <bjorn.demeyer@...>
- Date: Fri, 24 May 2002 21:25:42 +0200
Roberto Ierusalimschy wrote:
> But I guess the easiest way to use Unicode in Lua is with a multibyte
> representation (e.g. UTF-8). Then, you mainly (only?) need a new string
> library; everything else should work without modifications.
Yes, UTF-8 is definitely the most straightforward way
to support "Unicode" in Lua. UTF-8 has some very useful properties:
no byte of a multibyte sequence is ever zero, so there are no embedded
null bytes, and it is backwards compatible with 7-bit ASCII (but not
with 8-bit character sets). You can use the same string functions you
use on regular 8-bit strings to work with UTF-8 strings. The main
difficulty with UTF-8 is that one character may be 1, 2, 3 or 4 bytes long.
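The length of each character can be read from its first byte. Below is a minimal C sketch. It uses C because Lua's string library is written in C. The function name utf8_seq_len is hypothetical and not part of Lua. The sketch classifies a lead byte by its high bits. It assumes otherwise well-formed input, so it does not reject overlong lead bytes such as 0xC0, or lead bytes that would encode values above U+10FFFF.

```c
/* Hypothetical helper: number of bytes in the UTF-8 sequence that
   starts with lead byte b, or 0 if b cannot start a sequence
   (a continuation byte 10xxxxxx, or a byte 11111xxx). */
static int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;           /* 0xxxxxxx: ASCII, 1 byte   */
    if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx: 2-byte sequence */
    if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx: 3-byte sequence */
    if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx: 4-byte sequence */
    return 0;                         /* 10xxxxxx or 11111xxx      */
}
```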
IMO, the following Lua string functions should already be
UTF-8 compatible (if they have been implemented cleanly):
* strfind (s, pattern [, init [, plain]])
* strlower (s)
* strupper (s)
* strrep (s, n)
* format (formatstring, e1, e2, ...)
* gsub (s, pat, repl [, n])
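Plain substring search shows why byte-oriented functions can keep working. A UTF-8 lead byte never equals a continuation byte. So a complete, valid UTF-8 needle can only match at a character boundary of a valid UTF-8 haystack. The C sketch below demonstrates this with the standard strstr. It covers plain (non-pattern) search only, and find_bytes is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: byte offset of the first occurrence of needle
   in hay, or -1 if there is none. This is plain byte-wise search;
   for valid UTF-8 on both sides, any match starts on a character
   boundary because lead and continuation bytes never coincide. */
static long find_bytes(const char *hay, const char *needle)
{
    const char *p = strstr(hay, needle);
    return p ? (long)(p - hay) : -1;
}
```

For example, searching "café crème" for "crème" yields byte offset 6, since "é" occupies two bytes.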
The following would probably need to be altered:
* strbyte (s [, i]): In a UTF-8 locale, strbyte should not return
the i-th byte but the code of the i-th character in s, since in
UTF-8 one character may span several bytes.
* strchar (i1, i2, ...): In a UTF-8 locale, this should translate
i1, i2, etc., if their value is above 127, to the corresponding
multibyte UTF-8 sequences.
* strlen (s): In UTF-8 this should count the number of characters,
not the number of bytes.
* strsub (s, i [, j]): Again, i and j should be expressed
as character positions, not as byte indexes.
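The four changes above reduce to a few byte-level primitives. Below is a rough C sketch. It assumes NUL-terminated, valid UTF-8 input and does no checks for overlong forms or surrogates. All the names (utf8_count, utf8_offset, utf8_decode, utf8_encode) are hypothetical, not existing Lua functions.

```c
#include <stddef.h>

/* strlen analogue: count characters by counting the bytes that are
   NOT continuation bytes (10xxxxxx). */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* strsub helper: byte offset of the i-th character (1-based),
   or -1 if the string has fewer than i characters. */
static long utf8_offset(const char *s, size_t i)
{
    size_t seen = 0;
    long pos;
    if (i == 0) return -1;
    for (pos = 0; s[pos]; pos++)
        if (((unsigned char)s[pos] & 0xC0) != 0x80 && ++seen == i)
            return pos;
    return -1;
}

/* strbyte analogue: decode the character starting at s into *cp and
   return its length in bytes, or 0 if the sequence is malformed. */
static int utf8_decode(const char *s, unsigned long *cp)
{
    const unsigned char *u = (const unsigned char *)s;
    int len, k;
    if (u[0] < 0x80)                 { *cp = u[0];        len = 1; }
    else if ((u[0] & 0xE0) == 0xC0)  { *cp = u[0] & 0x1F; len = 2; }
    else if ((u[0] & 0xF0) == 0xE0)  { *cp = u[0] & 0x0F; len = 3; }
    else if ((u[0] & 0xF8) == 0xF0)  { *cp = u[0] & 0x07; len = 4; }
    else return 0;
    for (k = 1; k < len; k++) {
        if ((u[k] & 0xC0) != 0x80) return 0; /* truncated sequence */
        *cp = (*cp << 6) | (u[k] & 0x3F);    /* append 6 payload bits */
    }
    return len;
}

/* strchar analogue: write the UTF-8 encoding of code point cp into
   out (room for at least 4 bytes) and return the number of bytes
   written, or 0 if cp is above U+10FFFF. */
static int utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A character-indexed strsub(s, i, j) could then be built on these helpers. It would take the bytes from utf8_offset(s, i) up to just before utf8_offset(s, j + 1), or to the end of the string when that call returns -1.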
However, Lua is flexible enough that I think it would be possible
to implement these modifications in Lua itself. It's been quite a
while since I worked with UTF-8, but if there is more interest,
I might be willing to help get this integrated into Lua.
"No one knows true heroes, for they speak not of their greatness." --
Björn De Meyer