- Subject: Re: Future plans for Lua and Unicode
- From: Jerome Vuarand <jerome.vuarand@...>
- Date: Fri, 6 Jul 2012 10:33:02 +0100
2012/7/5 Simon Orde <email@example.com>:
> Jerome - Thank you very much for your reply. I was particularly taken with
> your suggestion that I can let my script-writers write in UTF-16 (or
> whatever) and save the code as UTF-8 to pass to Lua. This idea hadn't
> occurred to me. Without thinking about it too carefully, I'd been assuming
> that Lua expects code (e.g. as supplied in a C++ call to lua_load) to be in
> ANSI. But is your suggestion really always guaranteed to work - even in
> future versions of Lua?
Nobody knows what will happen in the future. But given Lua's track
record, you can expect the feature to stay in one form or another.
> I have to admit, I can't instantly think of any
> reason why it shouldn't. Sorry to press you on this, but I have read
> "Programming in Lua" and I have the Lua Reference Manual, and I can't find
> anything relevant to this in either of them.
I believe the following quote from section 2.1 about literal strings
is what lets you put anything (including UTF-8) in strings: "Strings
in Lua can contain any 8-bit value".
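To illustrate the quote above, here is a minimal sketch (the variable names are only for illustration). It assumes a stock Lua 5.1 interpreter and writes the UTF-8 bytes as decimal escapes so the example does not depend on how the file itself is saved:

```lua
-- "é" (U+00E9) encoded as UTF-8 is the two bytes 0xC3 0xA9.
-- Decimal escapes keep this source file pure ASCII.
local s = "caf\195\169"

-- Lua stores the bytes untouched; # reports the number of bytes.
local byte_count = #s                 -- 5 bytes for 4 characters
local b1, b2 = string.byte(s, 4, 5)   -- the two bytes of "é"

-- Embedded zeros are allowed as well.
local z = "a\0b"
local z_len = #z
```

Typing the accented letter directly into a file saved as UTF-8 should give the same bytes, since the lexer copies the contents of a string literal byte for byte.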
> The Lua Unicode FAQ
> (http://lua-users.org/wiki/LuaUnicode) addresses the question of whether Lua
> programs can be written in Unicode, but it didn't mention anything like
> that. And yet your suggestion seems like the perfect solution to using Lua
> with Unicode. If it works, surely it's what everyone should be doing?
I believe it's already what most people do.
> You asked about libraries. I use iup, cd, im, lfs, luacom, luagl - plus,
> sometimes, luasql, md5, and socket. Actually, the string library built into
> lua itself is also an issue, as much (most?) of it presumably won't work
> with UTF-8.
The string library will count bytes, not characters. But anyway what
is naively considered a character doesn't have a unique representation
within a Unicode string. The codepoint is the basic unit in Unicode,
but you may have glyphs composed of several codepoints (for
example a bare Latin vowel codepoint followed by a combining accent
codepoint). What basic unit you want to use (byte, codepoint or
something larger) depends on your application.
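The difference between the three units can be sketched as follows. The helper name utf8_codepoints is hypothetical (it is not part of Lua 5.1 or of any library mentioned here), and it assumes its input is already valid UTF-8:

```lua
-- Count UTF-8 codepoints by counting every byte that is not a
-- continuation byte (continuation bytes are 10xxxxxx, 0x80..0xBF).
-- Hypothetical helper: assumes valid UTF-8, performs no validation.
local function utf8_codepoints(s)
  local n = 0
  for i = 1, #s do
    local b = string.byte(s, i)
    if b < 0x80 or b >= 0xC0 then
      n = n + 1
    end
  end
  return n
end

-- The same visible "é" written two ways:
local precomposed = "\195\169"   -- U+00E9, a single codepoint
local decomposed  = "e\204\129"  -- U+0065 followed by U+0301 (combining acute)

print(#precomposed, utf8_codepoints(precomposed))
print(#decomposed, utf8_codepoints(decomposed))
```

Both strings display as one glyph, yet they differ in byte count and in codepoint count, so which number is the "length" depends on what the application needs.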
>> on Windows these are easy to patch (I have patches for 5.1 if you want).
> Yes please.
For Lua 5.1.4 (I use several patches, you may need to adapt that one) :
For LuaFileSystem 1.2.1 :