- Subject: Re: Future plans for Lua and Unicode
- From: "Simon Orde" <sorde@...>
- Date: Thu, 5 Jul 2012 16:32:55 +0100
Jerome - Thank you very much for your reply. I was particularly taken with
your suggestion that I can let my script-writers write in UTF-16 (or
whatever) and save the code as UTF-8 to pass to Lua. This idea hadn't
occurred to me. Without thinking about it too carefully, I'd been assuming
that Lua expects code (e.g. as supplied in a C++ call to lua_load) to be in
ANSI. But is your suggestion really always guaranteed to work - even in
future versions of Lua? I have to admit, I can't instantly think of any
reason why it shouldn't. Sorry to press you on this, but I have read
"Programming in Lua" and I have the Lua Reference Manual, and I can't find
anything relevant to this in either of them. The Lua Unicode FAQ
(http://lua-users.org/wiki/LuaUnicode) addresses the question of whether Lua
programs can be written in Unicode, but it didn't mention anything like
that. And yet your suggestion seems like the perfect solution to using Lua
with Unicode. If it works, surely it's what everyone should be doing?
You asked about libraries. I use iup, cd, im, lfs, luacom, luagl - plus,
sometimes, luasql, md5, and socket. Actually, the string library built into
Lua itself is also an issue, as much (most?) of it presumably won't work
with UTF-8.
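To make the string-library concern concrete, here is a small sketch of how byte-oriented functions behave on a UTF-8 string. It is only an illustration, not a survey of the whole library; the sample string is an assumption chosen for the example, written with decimal escapes so it does not depend on how the file is saved.

```lua
local s = "h\195\169llo"          -- "héllo" in UTF-8: 6 bytes, 5 characters

-- Byte-oriented: string.sub counts bytes, so it can cut a character in half.
local cut = s:sub(1, 2)           -- "h" plus only the first byte of "é"

-- Still usable: plain substring search works, but positions are byte offsets.
local first, last = s:find("llo", 1, true)

print(#s, #cut, first, last)      --> 6  2  4  6
```

Functions that only copy or compare whole byte sequences tend to be safe; anything that indexes, counts, or classifies single characters needs care.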
----- Original Message -----
From: "Jerome Vuarand" <firstname.lastname@example.org>
To: "Lua mailing list" <email@example.com>
Sent: Thursday, July 05, 2012 12:22 PM
Subject: Re: Future plans for Lua and Unicode
2012/7/5 Simon Orde <firstname.lastname@example.org>:
> 1. Lua scripts are currently always written in ANSI only, and probably
> always will be.
Lua code itself (i.e. general syntax outside of string literals) only
uses a portion of the ANSI charset that is also present in other
charsets, like UTF-8. So Lua can read UTF-8. You just have to remember
that Lua strings are arrays of bytes, not arrays of characters. So the
encoding of characters to bytes is up to you.
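As a minimal sketch of the bytes-versus-characters point: the length operator reports bytes, and counting characters is left to the caller. `utf8_len` below is a hypothetical helper, not part of the Lua standard library, and it assumes the input is valid UTF-8.

```lua
-- "é" (U+00E9) written as its two UTF-8 bytes, 0xC3 0xA9.
local s = "\195\169"

print(#s)               --> 2 (bytes, not characters)
print(s:byte(1, -1))    --> 195  169

-- Hypothetical helper: count characters by skipping UTF-8 continuation
-- bytes (0x80..0xBF). Assumes the string is valid UTF-8.
local function utf8_len(str)
  local n = 0
  for i = 1, #str do
    local b = str:byte(i)
    if b < 0x80 or b >= 0xC0 then n = n + 1 end
  end
  return n
end

print(utf8_len(s))      --> 1
```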
> 2. Strings in Lua can be in any format you like (e.g. ANSI, UTF-8 or
> UTF-16), so apps that want to support Unicode can do so by specifying that
> string parameters and return values are in a Unicode encoding such as UTF-8.
> 3. There is currently plenty of library support for ANSI Lua strings, but
> [...] library support for working with UTF-16 Lua strings. There are one or
> two small libraries for doing some simple string manipulation with UTF-8
> (e.g. http://lua-users.org/wiki/ValidateUnicodeString and
> http://files.luaforge.net/releases/sln/slnunicode). UTF-8 is likely to
> get more support in the future in Lua libraries than UTF-16.
> 4. IUP currently only supports ANSI Lua strings, but support for UTF-8 Lua
> strings will be added soon? Is that right? Any timescales on that?
> Is the above correct? Anything important I haven't mentioned?
> I'm a great fan of Lua. Support for Unicode is really important for me
> though. The above strategy, if correct, is probably OK for my purposes.
> Ideally I'd prefer script-writers to be able to write scripts in UTF-16
> and work entirely in UTF-16 - but I can live without that.
With little work you can have your script-writers write in UTF-16, and
then convert that to UTF-8 on the fly (during loading with a custom
loader for example). Code will be interpreted correctly, and content
of string literals will be in UTF-8, which is a good convention IMHO.
Then all you have to do is make sure your libraries accept UTF-8.
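The conversion step described above could be sketched roughly as follows, in plain Lua with no external libraries. All function names here (`utf8_encode`, `utf16le_to_utf8`, `load_utf16le`) are hypothetical, and the sketch assumes little-endian UTF-16 input with an optional byte-order mark. Unpaired surrogates are passed through without validation, so treat it as an illustration rather than a robust converter.

```lua
local floor, char = math.floor, string.char

-- Encode one Unicode code point as a UTF-8 byte sequence.
local function utf8_encode(cp)
  if cp < 0x80 then
    return char(cp)
  elseif cp < 0x800 then
    return char(0xC0 + floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  else
    return char(0xF0 + floor(cp / 0x40000),
                0x80 + floor(cp / 0x1000) % 0x40,
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
end

-- Convert a UTF-16LE byte string to UTF-8.
local function utf16le_to_utf8(s)
  local out, i, n = {}, 1, #s
  -- Skip a leading byte-order mark (FF FE) if present.
  if s:byte(1) == 0xFF and s:byte(2) == 0xFE then i = 3 end
  while i + 1 <= n do
    local lo, hi = s:byte(i, i + 1)
    local unit = hi * 0x100 + lo
    i = i + 2
    -- Combine a high surrogate with the low surrogate that follows it.
    if unit >= 0xD800 and unit < 0xDC00 and i + 1 <= n then
      local lo2, hi2 = s:byte(i, i + 1)
      local unit2 = hi2 * 0x100 + lo2
      if unit2 >= 0xDC00 and unit2 < 0xE000 then
        unit = 0x10000 + (unit - 0xD800) * 0x400 + (unit2 - 0xDC00)
        i = i + 2
      end
    end
    out[#out + 1] = utf8_encode(unit)
  end
  return table.concat(out)
end

-- loadstring in Lua 5.1; load accepts strings from Lua 5.2 on.
local compile = loadstring or load

-- Hypothetical loader: convert the UTF-16LE source, then compile it.
local function load_utf16le(source, chunkname)
  return compile(utf16le_to_utf8(source), chunkname)
end

-- "é" (U+00E9) is E9 00 in UTF-16LE and C3 A9 in UTF-8.
print(utf16le_to_utf8("\233\0") == "\195\169")   --> true
```

A host application could call `load_utf16le` on the raw contents of a script file, or do the equivalent conversion on the C/C++ side before handing the buffer to Lua.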
> But it will only
> really work when support for UTF-8 becomes available in IUP - and, [...]
> other Lua libraries. So does anyone know what work, if any, is
> currently being done on adding support for UTF-8 (or UTF-16?) Lua strings
> in Lua libraries - such as IUP?
Some libraries are already compatible with UTF-8. I believe on Unix
the io, os and lfs modules are already compatible, and on Windows
these are easy to patch (I have patches for 5.1 if you want).
Maybe you should list the ones that you want, and we can tell you more
precisely if they are or will be compatible with UTF-8.