
Thank you all for your feedback on porting Lua to UTF-16.

But to my knowledge, Microsoft's wide-character support DOES handle characters that need more than 2 bytes (the so-called surrogate pairs). Indeed, my approach makes it hard to use Lua strings to represent 'arbitrary binary data'; I did it in order to embed Lua in a native UTF-16 application.
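For readers unfamiliar with surrogates: a code point above U+FFFF does not fit in one 16-bit unit, so UTF-16 splits it into a high surrogate and a low surrogate. The C function below is a minimal illustrative sketch of that encoding step. The name `utf16_encode` is hypothetical (it is not part of Lua or of any Windows API), and this is not the code from the port described here.

```c
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit code units written to out (1 or 2),
 * or 0 if cp is not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogate range is not encodable */
        return 0;
    if (cp <= 0xFFFF) {                    /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                  /* supplementary planes: surrogate pair */
        cp -= 0x10000;                     /* leaves a 20-bit value */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate: top 10 bits */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate: low 10 bits */
        return 2;
    }
    return 0;                              /* beyond U+10FFFF */
}
```

For example, U+1F600 encodes as the pair 0xD83D 0xDE00, which is why code that assumes one 16-bit unit per character (as UCS-2 does) miscounts and can split such strings.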

The LuaPlus approach is interesting, but having two string types in Lua makes it hard to maintain: it will break at runtime whenever a string is passed to code that does not expect that kind of string, while the UTF-16 approach will identify those problems at compile time.
Nevertheless, I will give LuaPlus further thought.

Thanks,
            Uri Cohen

On Thu, Oct 15, 2009 at 5:03 AM, Joshua Jensen wrote:
----- Original Message -----
From: David Given
Date: 10/14/2009 8:31 PM
Joshua Jensen wrote:
LuaPlus achieves this via a C-like string representation:

HelloWorld = L"Hello world!"
What does LuaPlus do for things like string comparison and surrogates?
It does the equivalent of wcscmp(), only it doesn't rely on the C runtime to achieve this.  That's because on some non-Visual C++ compilers, sizeof(wchar_t) != 2.  sizeof(lua_WChar) is always 2.
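A minimal sketch of what a wcscmp() equivalent over a fixed 16-bit character type might look like. `wchar16` and `wstr16_cmp` are hypothetical names standing in for LuaPlus's `lua_WChar` and its comparison routine; this is not LuaPlus source. Comparing unit by unit also bears on the surrogate question: a pair is compared as two ordinary units, so ordering follows UTF-16 code unit values rather than code point values.

```c
#include <stdint.h>

/* Hypothetical stand-in for lua_WChar: always 16 bits, unlike wchar_t,
 * whose size is compiler-dependent (2 bytes with Visual C++, 4 bytes
 * with GCC on Linux, for example). */
typedef uint16_t wchar16;

/* wcscmp()-style comparison of zero-terminated 16-bit strings that does
 * not depend on the C runtime's wchar_t. It compares code unit by code
 * unit, so strings containing surrogate pairs (0xD800..0xDFFF) sort
 * below characters U+E000..U+FFFF, unlike code point order, and no
 * locale collation is applied. Returns <0, 0 or >0. */
int wstr16_cmp(const wchar16 *a, const wchar16 *b)
{
    while (*a && *a == *b) {   /* skip the common prefix */
        ++a;
        ++b;
    }
    return (*a > *b) - (*a < *b);
}
```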

My understanding is that UCS-2 doesn't support surrogates.  I don't think Microsoft's C runtime wide character library supports them either.  I could be wrong.

Do any of them use UTF-8?

I work in mobile games; our company makes a portable native gaming solution that allows you to install C-based games on any device, regardless of architecture. The API is based on OpenKODE, which uses UTF-8 in the few places where it uses strings. As I tend to do the bottom-end porting to weird and freaky embedded operating systems, I've become tiresomely familiar with having to translate UTF-8 to whatever encoding the host OS uses. There are a surprising number that use some form of half-assed UCS-2, and I've never figured out why; it just makes life complex. I suspect that it's simple tradition. Most of them come from Asia, and Asia seems to have a culture of using UCS-2 or UTF-16...
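The UTF-8 to host-encoding translation described above, for a host that uses UTF-16, could be sketched as follows. `utf8_to_utf16` is a hypothetical helper, not an OpenKODE function. It validates input strictly and emits surrogate pairs for code points above U+FFFF, so it would need adjusting for a host that only accepts UCS-2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF-8 bytes to UTF-16 code units.
 * Writes at most out_cap units to out and returns the number of units
 * the full conversion needs (a result > out_cap means truncation), or
 * (size_t)-1 on malformed input. Rejects overlong forms, surrogate
 * code points and values above U+10FFFF. */
size_t utf8_to_utf16(const unsigned char *in, size_t in_len,
                     uint16_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;
    while (i < in_len) {
        unsigned char c = in[i];
        uint32_t cp, min;   /* decoded value, smallest legal value */
        int extra;          /* continuation bytes expected */
        if (c < 0x80)                { cp = c;        extra = 0; min = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;      /* stray continuation or invalid lead byte */
        if (in_len - i - 1 < (size_t)extra)
            return (size_t)-1;       /* sequence cut off at end of input */
        for (int k = 1; k <= extra; ++k) {
            unsigned char cc = in[i + k];
            if ((cc & 0xC0) != 0x80)
                return (size_t)-1;   /* expected a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;       /* overlong, out of range, or surrogate */
        i += (size_t)extra + 1;
        if (cp < 0x10000) {          /* BMP: one code unit */
            if (n < out_cap) out[n] = (uint16_t)cp;
            n += 1;
        } else {                     /* supplementary: surrogate pair */
            cp -= 0x10000;
            if (n < out_cap)     out[n]     = (uint16_t)(0xD800 | (cp >> 10));
            if (n + 1 < out_cap) out[n + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            n += 2;
        }
    }
    return n;
}
```

Returning the required length lets a caller pass `out = NULL, out_cap = 0` once to size a buffer, then call again to fill it; with `out_cap = 0` the function never touches `out`.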
I would consider ditching the LuaPlus wide character support if there was a small library that supported UTF-8 and allowed easy embedding of UTF-8 string types in Lua source files.

Have you looked at slnunicode?  That seems to be the smallest one I can find, but documentation is scarce, so I don't know if it achieves all of the goals.


