lua-l archive

David Kastrup wrote:
> Apropos: could someone clue me in what the proposed way of operation
> is for dealing with utf-8 strings?  When one is using lua as an
> embedded interpreter, having efficient strings with a natural Unicode
> character type (internally represented with utf-8) would save a lot of
> headaches.

When we were working on games localized into Far Eastern languages, LuaPlus offered us a 16-bit wide character string type that met our needs nicely. I do not claim it is UCS- or UTF-compatible, but it was sufficient for our localization needs. It let us deal with strings in the following forms:

str = L"\x30A0\x30A1\x30A2"  -- Katakana letters
str2 = L"\x30A3\x30A4\x30A5"  -- Katakana letters
str = str .. str2  -- Concat works
print(str)  -- Works

The entire Lua string library was also available as a 16-bit wstring library, and file reading understood the 16-bit format, too. We used it to read "Unicode" .csv files from Excel.

In addition, LuaPlus could read its input files as 16-bit entities. It didn't allow 16-bit identifiers in Lua, but it did let you put the Katakana letters (or any other characters) directly in strings instead of writing their hexadecimal escapes.
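For comparison, here is a minimal sketch of the same Katakana example in stock Lua. It assumes plain 8-bit Lua strings that hold UTF-8 bytes rather than LuaPlus wide strings. The helpers `utf8_encode` and `utf8_length` are hypothetical names used only for this illustration. They are not part of LuaPlus or the Lua standard library, and they use only arithmetic, so they should run on Lua 5.1 and later.

```lua
-- Hypothetical helper: encode Unicode code points as a UTF-8 byte string.
-- Arithmetic only (no bitwise operators), so it runs on Lua 5.1 through 5.4.
-- No validation: surrogates and values above 0x10FFFF are not rejected.
local function utf8_encode(...)
  local out = {}
  for _, cp in ipairs({...}) do
    if cp < 0x80 then
      out[#out + 1] = string.char(cp)
    elseif cp < 0x800 then
      out[#out + 1] = string.char(0xC0 + math.floor(cp / 0x40),
                                  0x80 + cp % 0x40)
    elseif cp < 0x10000 then
      out[#out + 1] = string.char(0xE0 + math.floor(cp / 0x1000),
                                  0x80 + math.floor(cp / 0x40) % 0x40,
                                  0x80 + cp % 0x40)
    else
      out[#out + 1] = string.char(0xF0 + math.floor(cp / 0x40000),
                                  0x80 + math.floor(cp / 0x1000) % 0x40,
                                  0x80 + math.floor(cp / 0x40) % 0x40,
                                  0x80 + cp % 0x40)
    end
  end
  return table.concat(out)
end

-- Hypothetical helper: count code points in a valid UTF-8 string by
-- counting every byte that is not a continuation byte (0x80 to 0xBF).
local function utf8_length(s)
  local _, n = s:gsub("[^\128-\191]", "")
  return n
end

local str  = utf8_encode(0x30A0, 0x30A1, 0x30A2)  -- Katakana letters
local str2 = utf8_encode(0x30A3, 0x30A4, 0x30A5)  -- Katakana letters
str = str .. str2                 -- ordinary byte concatenation
print(#str, utf8_length(str))     -- prints 18 and 6 (bytes, code points)
```

Concatenation and printing need nothing special because UTF-8 is just bytes to Lua. What you lose compared with a wide string type is character-indexed `len` and `sub`, which is why the length helper is needed. Lua 5.3 and later ship a `utf8` library (`utf8.char`, `utf8.len`, `utf8.codes`) that covers the encoding and counting shown here.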

Josh