lua-users home
lua-l archive

2012/7/5 Simon Orde <sorde@gotadsl.co.uk>:
> Jerome - Thank you very much for your reply.  I was particularly taken with
> your suggestion that I can let my script-writers write in UTF-16 (or
> whatever) and save the code as UTF-8 to pass to Lua.  This idea hadn't
> occurred to me.  Without thinking about it too carefully, I'd been assuming
> that Lua expects code (e.g. as supplied in a C++ call to lua_load) to be in
> ANSI.  But is your suggestion really always guaranteed to work - even in
> future versions of Lua?

Nobody knows what will happen in the future. But given Lua's track
record, you can expect the feature to stay in one form or another.

> I have to admit, I can't instantly think of any
> reason why it shouldn't.  Sorry to press you on this, but I have read
> "Programming in Lua" and I have the Lua Reference Manual, and I can't find
> anything relevant to this in either of them.

I believe the following quote from section 2.1 of the reference manual,
about literal strings, is what lets you put anything (including UTF-8)
in strings: "Strings in Lua can contain any 8-bit value".
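
Since Lua only sees bytes, the host just has to convert the script
text to UTF-8 before loading it. Here is a minimal sketch of that
conversion in C, assuming the editor hands you UTF-16 code units in
host byte order. The name utf16_to_utf8 is made up for this example,
it is not part of Lua:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Lua): converts UTF-16 code units in host
   byte order to UTF-8. Returns the number of bytes written, or (size_t)-1
   on an unpaired surrogate or when `out` is too small. */
size_t utf16_to_utf8(const uint16_t *in, size_t in_len,
                     unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        uint32_t cp = in[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i >= in_len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* stray low surrogate */
        }

        /* Number of UTF-8 bytes this codepoint needs. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out_cap - o < need)
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

The resulting buffer and its length are what you would then hand to
luaL_loadbuffer. On Windows you could probably skip the hand-written
loop and call WideCharToMultiByte with CP_UTF8 instead.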

> The Lua Unicode FAQ
> (http://lua-users.org/wiki/LuaUnicode) addresses the question of whether Lua
> programs can be written in Unicode, but it didn't mention anything like
> that.  And yet your suggestion seems like the perfect solution to using Lua
> with Unicode.  If it works, surely it's what everyone should be doing?

I believe it's already what most people do.

> You asked about libraries.  I use iup, cd, im, lfs, luacom, luagl - plus,
> sometimes, luasql, md5, and socket.  Actually, the string library built into
> lua itself is also an issue, as much (most?) of it presumably won't work
> with UTF-8.

The string library will count bytes, not characters. But anyway, what
is naively considered a character doesn't map to a single unit
within a Unicode string. The codepoint is the basic unit in Unicode,
but you may have glyphs composed of several codepoints (for
example a bare Latin vowel codepoint followed by a combining accent
codepoint). Which basic unit you want to use (byte, codepoint or
something larger) depends on your application.
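
To make the byte/codepoint difference concrete, here is a small
sketch in C, assuming the input is valid, NUL-terminated UTF-8. The
function name utf8_codepoints is made up for this example. strlen
gives the same byte count that Lua's # operator reports for a string
without embedded zeros:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts codepoints in a NUL-terminated string that is
   assumed to be valid UTF-8. Every byte that is not a continuation byte
   (bit pattern 10xxxxxx) starts a new codepoint. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Byte length, i.e. the unit the Lua string library works in. */
size_t utf8_bytes(const char *s)
{
    return strlen(s);
}
```

For "e" followed by the combining acute accent U+0301 (the bytes
"e\xCC\x81"), utf8_bytes gives 3 and utf8_codepoints gives 2, while a
reader sees a single accented letter. The precomposed form U+00E9
("\xC3\xA9") is 2 bytes and 1 codepoint for the same visible result.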

>> on Windows these are easy to patch (I have patches for 5.1 if you want).
>
> Yes please.

For Lua 5.1.4 (I apply several patches, so you may need to adapt this one):

https://bitbucket.org/doub/electronmeat/src/420797026c2d/srcweb/lua-5.1.4/lua-wstring.patch

For LuaFileSystem 1.2.1:

https://bitbucket.org/doub/electronmeat/src/420797026c2d/srcweb/luafilesystem-1.2.1/win32-utf-8.patch