Jerome - Thank you very much for your reply. I was particularly taken with your suggestion that I can let my script-writers write in UTF-16 (or whatever) and save the code as UTF-8 to pass to Lua. This idea hadn't occurred to me. Without thinking about it too carefully, I'd been assuming that Lua expects code (e.g. as supplied in a C++ call to lua_load) to be in ANSI. But is your suggestion really always guaranteed to work - even in future versions of Lua? I have to admit, I can't instantly think of any reason why it shouldn't. Sorry to press you on this, but I have read "Programming in Lua" and I have the Lua Reference Manual, and I can't find anything relevant to this in either of them. The Lua Unicode FAQ (http://lua-users.org/wiki/LuaUnicode) addresses the question of whether Lua programs can be written in Unicode, but it didn't mention anything like that. And yet your suggestion seems like the perfect solution to using Lua with Unicode. If it works, surely it's what everyone should be doing?

You asked about libraries. I use iup, cd, im, lfs, luacom, luagl - plus, sometimes, luasql, md5, and socket. Actually, the string library built into Lua itself is also an issue, as much (most?) of it presumably won't work with UTF-8.

on Windows these are easy to patch (I have patches for 5.1 if you want).

Yes please.

Simon


----- Original Message -----
From: "Jerome Vuarand" <jerome.vuarand@gmail.com>
To: "Lua mailing list" <lua-l@lists.lua.org>
Sent: Thursday, July 05, 2012 12:22 PM
Subject: Re: Future plans for Lua and Unicode


2012/7/5 Simon Orde <sorde@gotadsl.co.uk>:
1. Lua scripts are currently always written in ANSI only, and probably
always will be.

Lua code itself (i.e. general syntax outside of string literals) only
uses a portion of the ANSI charset that is also present in other
charsets, like UTF-8. So Lua can read UTF-8. You just have to remember
that Lua strings are arrays of bytes, not arrays of characters. So the
encoding of characters to bytes is up to you.
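
To make the byte-versus-character point concrete, here is a minimal sketch. It is written in Python purely for illustration (it is not Lua code), and the sample strings and the helper name utf8_char_count are made up for this example. It shows that ASCII-only text has identical bytes in UTF-8, and that a non-ASCII character occupies several bytes, which is what a byte-oriented length (such as Lua's # operator on a string) would count.

```python
# ASCII-only source text is byte-for-byte identical in ASCII and UTF-8,
# which is why plain Lua syntax saved as UTF-8 still parses unchanged.
code = 'print("hello")'
assert code.encode("ascii") == code.encode("utf-8")

# A non-ASCII character becomes several bytes in UTF-8.
text = "caf\u00e9"                 # "café": 4 characters
utf8_bytes = text.encode("utf-8")  # U+00E9 is encoded as 0xC3 0xA9
char_count = len(text)             # counts characters
byte_count = len(utf8_bytes)       # counts bytes, like a byte-array string would


def utf8_char_count(data: bytes) -> int:
    """Count characters in valid UTF-8 by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


print(char_count, byte_count, utf8_char_count(utf8_bytes))  # prints: 4 5 4
```

The last helper hints at what a UTF-8-aware replacement for a byte-oriented length function has to do: treat only lead bytes as the start of a character.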

2. Strings in Lua can be in any format you like (e.g. ANSI, UTF-8 or UTF-16)
so apps that want to support Unicode can do so by specifying that string
parameters and return values are in a Unicode encoding such as UTF-8.

3. There is currently plenty of library support for ANSI Lua strings, but no
library support for working with UTF-16 Lua strings. There are one or two
small libraries for doing some simple string manipulation with UTF-8 strings
(e.g. http://lua-users.org/wiki/ValidateUnicodeString and
http://files.luaforge.net/releases/sln/slnunicode). UTF-8 is likely to have
more support in the future in Lua libraries than UTF-16.

4. IUP currently only supports ANSI Lua strings, but support for UTF-8 Lua
strings will be added soon? Is that right? Any timescales on that?

Is the above correct? Anything important I haven't mentioned?

I'm a great fan of Lua. Support for Unicode is really important for me
though. The above strategy, if correct, is probably OK for my purposes.
Ideally I'd prefer script-writers to be able to write scripts in UTF-16 and
work entirely in UTF-16 - but I can live without that.

With little work you can have your script-writers write in UTF-16, and
then convert that to UTF-8 on the fly (during loading with a custom
loader for example). Code will be interpreted correctly, and content
of string literals will be in UTF-8, which is a good convention IMHO.
Then all you have to do is make sure your libraries accept UTF-8
strings.
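
The conversion step described above could look roughly like the following sketch. It is written in Python only to show the transformation; a real host application would do the equivalent in its own language before handing the bytes to Lua. The function name utf16_script_to_utf8 and the sample script are hypothetical, and treating BOM-less input as little-endian is an assumption (typical of Windows editors), not something stated in this thread.

```python
import codecs


def utf16_script_to_utf8(raw: bytes) -> bytes:
    """Decode a UTF-16 script and re-encode it as UTF-8 without a BOM."""
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM, picks the byte order and drops the BOM.
        text = raw.decode("utf-16")
    else:
        # Assumption: input without a BOM is little-endian.
        text = raw.decode("utf-16-le")
    return text.encode("utf-8")


# Hypothetical script, as a script-writer might save it from a UTF-16 editor.
script = 'greeting = "h\u00e9llo"\n'
raw = codecs.BOM_UTF16_LE + script.encode("utf-16-le")

utf8_chunk = utf16_script_to_utf8(raw)
# The syntax part is plain ASCII; only the string literal contains multi-byte UTF-8.
assert utf8_chunk == b'greeting = "h\xc3\xa9llo"\n'
```

The resulting bytes are what the host would pass on to Lua (for example through lua_load or luaL_loadbuffer), so the string literal arrives in the script as UTF-8 bytes.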

But it will only
really work when support for UTF-8 becomes available in IUP - and, ideally,
other Lua libraries. So does anyone know what work, if any, is
currently being done on adding support for UTF-8 (or UTF-16?) Lua strings in
Lua libraries - such as IUP?

Some libraries are already compatible with UTF-8. I believe on Unix
the io, os and lfs modules are already compatible, and on Windows
these are easy to patch (I have patches for 5.1 if you want).

Maybe you should list the ones that you want, and we can tell you more
precisely if they are or will be compatible with UTF-8.