On Wed, Jun 11, 2003 at 04:12:21PM -0300, Roberto Ierusalimschy wrote:
> > But two identical utf-8 characters can have different encoding, right? 
> No. I mean, if they have the same unicode number, they must have the
> same utf-8 encoding.

If I recall correctly, however, the same glyph may have multiple encodings
(for example, a precomposed accented letter versus a base letter followed
by a combining accent) unless you stick to a sensible subset of Unicode.
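
To make that concrete, here is a small sketch using the two usual ways of
writing "é". The byte values are written out by hand as decimal escapes;
the variable names are only for illustration.

```lua
-- "é" as the single precomposed code point U+00E9 (UTF-8 bytes C3 A9).
local precomposed = "\195\169"
-- "é" as "e" followed by U+0301 COMBINING ACUTE ACCENT (bytes 65 CC 81).
local decomposed = "e\204\129"

-- Both display as the same glyph, but a byte-wise comparison sees
-- two different strings of different lengths.
print(precomposed == decomposed)   --> false
print(string.len(precomposed))     --> 2
print(string.len(decomposed))      --> 3
```

So Roberto's statement holds per code point, but plain string equality is
not enough to compare text that has not been normalized first.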

It would be nice to have a UTF-8 replacement for the string library, and
writing versions of string.sub etc. that support UTF-8 should be a trivial
task. However, writing a version of the regular expression matcher may be
a bigger task. In the meantime, there are Unicode-aware and possibly
UTF-8-aware POSIX regular expression matchers, so maybe it would be
possible to convert one of the Lua POSIX regex libraries to use one of
those? See e.g.
<http://linuxselfhelp.com/HOWTO/Unicode-HOWTO-6.html>.
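
As a rough idea of what a UTF-8-aware string.sub might look like, here is
a minimal sketch. It assumes the input is valid UTF-8 and that indices are
positive (no negative "from the end" indices); utf8_offset and utf8_sub
are made-up names, not functions from any existing library.

```lua
-- Hypothetical helper: byte position at which the n-th character starts.
-- Returns string.len(s) + 1 if n is past the last character.
local function utf8_offset(s, n)
  local len = string.len(s)
  local pos, count = 1, 1
  while pos <= len and count < n do
    pos = pos + 1
    -- Skip continuation bytes (10xxxxxx, i.e. values 128..191).
    while pos <= len do
      local b = string.byte(s, pos)
      if b < 128 or b >= 192 then break end
      pos = pos + 1
    end
    count = count + 1
  end
  return pos
end

-- Hypothetical utf8_sub: like string.sub, but i and j count characters
-- (code points) instead of bytes. j defaults to the end of the string.
local function utf8_sub(s, i, j)
  local start = utf8_offset(s, i)
  local stop = string.len(s)
  if j then
    stop = utf8_offset(s, j + 1) - 1
  end
  return string.sub(s, start, stop)
end

-- "naïve": the "ï" (U+00EF) takes two bytes, C3 AF.
local word = "na\195\175ve"
print(utf8_sub(word, 3, 3) == "\195\175")  --> true
print(utf8_sub(word, 4) == "ve")           --> true
print(string.sub(word, 3, 3) == "\195")    --> true (half a character)
```

Note that this counts code points, not glyphs, so a letter followed by a
combining accent still counts as two characters here.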

-- 
Tuomo