- Subject: Re: unicode support in lua
- From: David Kastrup <dak@...>
- Date: Thu, 26 Apr 2007 11:00:46 +0200
Bertrand Mansion <email@example.com> writes:
> I am new to lua and currently reading the book.
> I am wondering if lua 5.1 supports utf-8 in string handling,
> comparisons, conversions and pattern matching and things like \u in
> If not, are there plans to add utf-8 support in the future ?
There is slnunicode. Personally, I'd like to see transparent handling
of utf-8. However, this makes strings different from byte streams.
Also, for reasonable handling of utf-8 strings it would appear prudent
to be able to assume that they contain only valid byte sequences. That
means one needs read and write conversion functions even for files
assumed to be in utf-8 locales (in order to convert illegal byte
sequences into legal ones).
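To make the "read conversion" idea concrete, here is a minimal sketch in Python. It is not Lua and not slnunicode's API, and the function names are hypothetical. It checks whether a byte string is well-formed utf-8 and, on input, converts ill-formed sequences into legal ones by substituting U+FFFD. Note that this particular conversion is lossy: the original bytes cannot be recovered afterwards.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")  # strict mode raises on any ill-formed sequence
        return True
    except UnicodeDecodeError:
        return False


def read_as_valid_utf8(data: bytes) -> str:
    """Hypothetical input conversion: ill-formed bytes become U+FFFD (lossy)."""
    return data.decode("utf-8", errors="replace")


raw = b"caf\xc3\xa9 \xff\xfe"  # valid "café" followed by two stray bytes
text = read_as_valid_utf8(raw)
# After conversion, the string only contains valid sequences.
print(is_valid_utf8(raw), is_valid_utf8(text.encode("utf-8")))
```

Once every string has passed through such a conversion, later operations (length, comparison, pattern matching) can assume valid input instead of re-checking it.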
And so on. slnunicode does not actually do much in the area of
verification. Take a look at the input handling of Emacs: one of its
properties is that a file filled with random bytes, read as utf-8 and
rewritten unmodified, still keeps its original contents. That is
because invalid input bytes get turned into special sequences (which
are in turn not considered valid sequences in a file) that get
reconverted to bytes upon writing.
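Python's "surrogateescape" error handler (PEP 383) is a real-world example of the same trick, although it is Python's mechanism and not the one Emacs uses internally. Each undecodable byte 0x80..0xFF becomes a lone surrogate U+DC80..U+DCFF, which is never valid in a utf-8 file, and encoding with the same handler turns it back into the original byte. The helper names below are hypothetical.

```python
def read_preserving(data: bytes) -> str:
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    return data.decode("utf-8", errors="surrogateescape")


def write_preserving(text: str) -> bytes:
    # Lone surrogates produced above are turned back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


raw = bytes(range(256))  # arbitrary bytes, mostly not valid UTF-8
text = read_preserving(raw)
roundtrip_ok = write_preserving(text) == raw

# The escape characters are not valid in a strict UTF-8 file:
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(roundtrip_ok, strict_ok)
```

The design choice is the same one described above: reading never fails and never loses data, while the escaped bytes stay distinguishable from real characters.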
It is not all too clear in my opinion how one could create a
small-footprint Lua that supported both byte arrays (unibyte strings,
if you want) and multibyte character strings in which the characters
actually form atomic string components.
Emacs actually has a flag on every string that distinguishes unibyte
and multibyte strings.
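A per-string flag of that kind can be sketched as follows. This is a hypothetical Python model, not Emacs's or Lua's actual string representation. It shows how the same bytes yield different lengths and different atomic components depending on the flag, and why the multibyte case costs more: finding character boundaries requires scanning the data (this sketch assumes multibyte data has already been validated as utf-8).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedString:
    """Hypothetical string with an Emacs-style unibyte/multibyte flag."""
    data: bytes
    multibyte: bool = False

    def _starts(self) -> list:
        # Offsets of UTF-8 lead bytes, i.e. bytes not of the form 10xxxxxx.
        return [k for k, b in enumerate(self.data) if (b & 0xC0) != 0x80]

    def __len__(self) -> int:
        if not self.multibyte:
            return len(self.data)      # O(1): each byte is one component
        return len(self._starts())     # O(n): must scan for boundaries

    def char_at(self, i: int) -> bytes:
        """Return the i-th atomic component (0-based) as raw bytes."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        if not self.multibyte:
            return self.data[i:i + 1]
        starts = self._starts()
        end = starts[i + 1] if i + 1 < len(starts) else len(self.data)
        return self.data[starts[i]:end]


raw = "héllo".encode("utf-8")  # b"h\xc3\xa9llo", 6 bytes
uni = FlaggedString(raw)
multi = FlaggedString(raw, multibyte=True)
print(len(uni), len(multi), uni.char_at(1), multi.char_at(1))
```

Every string operation has to consult the flag, which is part of the size and speed cost the rest of this message talks about.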
This is considered a design flaw by some, not least the XEmacs
developers. On the other hand, XEmacs developers have been forced to
provide an XEmacs binary that supports _only_ unibyte strings in
addition to the version supporting _only_ multibyte strings, because
those that don't need multibyte strings are not willing to pay the
In short: proper utf-8 support comes at a price, and even large
closely related projects don't arrive at the same solutions.