Thanks for your patch, Jerome.
I've used it and found it to be a minimal and elegant way to support UTF-16 Unicode strings. Although it meets the requirements I listed in my post, it does not work for me because it cannot run Lua code from UTF-16 files or lines. So while Unicode strings are supported, there is no easy way to define Unicode literals.
Thanks!
           Uri Cohen

On Mon, Oct 19, 2009 at 11:25 AM, Jerome Vuarand <jerome.vuarand@gmail.com> wrote:
2009/10/16 uri cohen <uri.cohen@gmail.com>:
> Following the feedback that using UTF8 is better than porting LUA to UTF16,
> is there a LUA port or library which allows the user to:
> - define UTF8 literals;
> - open files with UTF8 names;
> - read and write Windows unicode text using only LUA strings;
> - bind with external C code which works in UTF16?

I wrote such a patch; it is attached to this mail. It adds the
following to the Lua API, declared in the standard Lua headers but
implemented in lwstring.c:

#define LUA_HAS_WSTRING 1

LUA_API const wchar_t *(lua_towstring) (lua_State *L, int index);
LUA_API const wchar_t *(lua_tolwstring) (lua_State *L, int index,
                                         size_t *length);
LUA_API void (lua_pushwstring) (lua_State *L, const wchar_t *value);
LUA_API void (lua_pushlwstring) (lua_State *L, const wchar_t *value,
                                 size_t size);

LUALIB_API const wchar_t *(luaL_optwstring) (lua_State *L, int index,
                                             const wchar_t *def);
LUALIB_API const wchar_t *(luaL_checkwstring) (lua_State *L, int index);
LUALIB_API const wchar_t *(luaL_checklwstring) (lua_State *L, int index,
                                                size_t *length);
LUALIB_API int (luaL_loadwfile) (lua_State *L, const wchar_t *filename);

#define luaL_dowfile(L, fn)     \
       (luaL_loadwfile(L, fn) || lua_pcall(L, 0, LUA_MULTRET, 0))

A "wstring" is a userdata which contains a null-terminated string of
wchar_t. lua_towstring converts a string on the stack to a wide char
string. It replaces the string on the stack, just as lua_tostring
replaces a number with its string form, so the same restriction
applies (don't use it on keys in a lua_next loop).

On Windows these functions use MultiByteToWideChar and
WideCharToMultiByte to convert between UTF-8 and UTF-16. On other
platforms the functions use mbstowcs and wcstombs, which convert
between the current multi-byte locale and wide strings, usually UCS-4.
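
To make the non-Windows path concrete, here is a minimal sketch of a
round trip through mbstowcs and wcstombs. It is not taken from the
patch: the helper names mb_to_wide and wide_to_mb are hypothetical, and
the buffer sizes are simple upper bounds rather than exact lengths.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of the patch): convert a multi-byte string
 * in the current locale to a newly allocated wide string. Returns NULL on
 * an invalid sequence or allocation failure; the caller frees the result. */
static wchar_t *mb_to_wide(const char *src)
{
    assert(src != NULL);
    /* A multi-byte string never yields more wide chars than it has bytes. */
    size_t cap = strlen(src) + 1;
    wchar_t *dst = malloc(cap * sizeof *dst);
    if (dst == NULL)
        return NULL;
    if (mbstowcs(dst, src, cap) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
}

/* Hypothetical helper: the reverse conversion, wide string to multi-byte. */
static char *wide_to_mb(const wchar_t *src)
{
    assert(src != NULL);
    /* Each wide char needs at most MB_CUR_MAX bytes in the current locale. */
    size_t cap = wcslen(src) * MB_CUR_MAX + 1;
    char *dst = malloc(cap);
    if (dst == NULL)
        return NULL;
    if (wcstombs(dst, src, cap) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
}
```

Both functions depend on the locale selected with setlocale, so the
input is only treated as UTF-8 when the process runs in a UTF-8 locale.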

On Windows only, the patch modifies all calls to the C library that
use filenames, and converts these filenames using the wstring
conversion functions.
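
As an illustration of that kind of filename conversion, here is a
minimal sketch of an fopen wrapper. It is an assumption about the
general approach, not the code from the patch: the names fopen_utf8 and
utf8_to_utf16 are hypothetical. On Windows it converts the UTF-8 name
with MultiByteToWideChar and calls _wfopen; elsewhere it falls through
to plain fopen.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: UTF-8 to newly allocated UTF-16, NULL on failure. */
static wchar_t *utf8_to_utf16(const char *src)
{
    /* With a length of -1 the returned count includes the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    wchar_t *dst = malloc((size_t)n * sizeof *dst);
    if (dst == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, 0, src, -1, dst, n) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
}
#endif

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
static FILE *fopen_utf8(const char *filename, const char *mode)
{
    assert(filename != NULL && mode != NULL);
#ifdef _WIN32
    wchar_t *wname = utf8_to_utf16(filename);
    wchar_t *wmode = utf8_to_utf16(mode);
    FILE *f = (wname != NULL && wmode != NULL) ? _wfopen(wname, wmode) : NULL;
    free(wname);
    free(wmode);
    return f;
#else
    /* Assumption: the file system takes the bytes of the name as given. */
    return fopen(filename, mode);
#endif
}
```

A caller would use fopen_utf8(name, "rb") wherever fopen(name, "rb")
was used before.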

It's only been tested on Windows so far (both the
MultiByteToWideChar/WideCharToMultiByte and mbstowcs/wcstombs versions
compile there). It's not perfect (I think it may leak if a
lua_pushstring call raises an error), and I didn't patch the Makefile
since I'm not using it. Any feedback is welcome.