Unicode Identifiers

Platform independent approach to Unicode literals in Lua.

Situation without this change:

* *NIX users can simply use setlocale to set the locale to a UTF-8 locale and it will work. (If no UTF-8 locale is available, it won't work.)
* Windows does not support UTF-8 locales, and then there is the overhead of using UCS-2 (char -> wchar_t). Additionally, big parts of Lua would have to be changed to use TCHAR and the associated functions.
* Lua users in general can use the table syntax _G["some utf8 characters"] (this works, but it is cumbersome).

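The setlocale route from the first point can be sketched roughly as follows. This is only an illustration: demo_select_locale and demo_utf8_candidates are hypothetical names, and the locale names in the list are assumptions (they are common on Linux, but every system installs its own set). Whether the lexer then accepts the extra bytes still depends on how the C library's isalpha treats bytes above 0x7F in that locale.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Assumed locale names; adjust them to whatever `locale -a` lists. */
static const char *const demo_utf8_candidates[] = {
    "C.UTF-8", "en_US.UTF-8", "de_DE.UTF-8"
};

/* Hypothetical helper: tries each candidate name in order and returns the
 * first one that setlocale accepts for LC_CTYPE, or NULL if none is
 * accepted (in which case the locale approach cannot work here). */
static const char *demo_select_locale(const char *const *candidates,
                                      size_t count)
{
    size_t i;
    assert(candidates != NULL);
    for (i = 0; i < count; ++i) {
        /* setlocale returns NULL when the request cannot be honored */
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

A host program would call demo_select_locale(demo_utf8_candidates, 3) before loading any Lua code and fall back to another approach when the result is NULL.
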
Add this to the "Local configuration" section of luaconf.h:

#ifdef LUA_CORE
// every byte of a UTF-8 multi-byte sequence has the high bit (0x80) set and
// is always treated as alphabetic; end of stream (-1) is not valid
#define isalpha(zeich) (((0x80&(zeich))||isalpha(zeich))&&(zeich)!=-1)
// bytes of UTF-8 multi-byte sequences always count as alphanumeric; end of
// stream (-1) is not valid
#define isalnum(zeich) (((0x80&(zeich))||isalnum(zeich))&&(zeich)!=-1)
// bytes of UTF-8 multi-byte sequences are never digits; end of stream (-1)
// is not valid
#define isdigit(zeich) ((!(0x80&(zeich))&&isdigit(zeich))&&(zeich)!=-1)
// bytes of UTF-8 multi-byte sequences are never whitespace; end of stream
// (-1) is not valid
#define isspace(zeich) ((!(0x80&(zeich))&&isspace(zeich))&&(zeich)!=-1)
#endif

Please note that this allows every Unicode character in identifiers. This may be a problem, because some characters look similar to Lua keywords, operators and whitespace.
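
To see what these predicates accept, the same logic can be written as ordinary functions and tried outside of Lua. This is a minimal sketch, not part of the patch: the demo_ names and DEMO_EOZ are hypothetical, and demo_is_identifier only approximates how a lexer would combine the predicates (first character alphabetic or '_', the rest alphanumeric or '_').

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define DEMO_EOZ (-1)  /* end-of-stream marker, as in the comments above */

/* Function versions of the four predicates; any byte with bit 0x80 set is
 * assumed to be part of a UTF-8 multi-byte sequence. */
static int demo_isalpha(int c)
{ return c != DEMO_EOZ && ((c & 0x80) || isalpha(c)); }

static int demo_isalnum(int c)
{ return c != DEMO_EOZ && ((c & 0x80) || isalnum(c)); }

static int demo_isdigit(int c)
{ return c != DEMO_EOZ && !(c & 0x80) && isdigit(c); }

static int demo_isspace(int c)
{ return c != DEMO_EOZ && !(c & 0x80) && isspace(c); }

/* Returns 1 if every byte of name passes the identifier rules sketched
 * above, 0 otherwise (including for the empty string). */
static int demo_is_identifier(const char *name)
{
    const unsigned char *p;
    assert(name != NULL);
    p = (const unsigned char *)name;
    if (!(demo_isalpha(*p) || *p == '_'))
        return 0;
    for (++p; *p != '\0'; ++p) {
        if (!(demo_isalnum(*p) || *p == '_'))
            return 0;
    }
    return 1;
}
```

With these rules, the UTF-8 bytes of "Grüße" pass, "1abc" and "a b" are rejected, and a name containing a no-break space (U+00A0, encoded as the bytes C2 A0) is accepted even though it looks like two words, which illustrates the caveat about look-alike characters.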

Then recompile Lua and try these samples:

local function Grüße(message)
    print(message)
end

GrüßeAusDeutschland = "Hallo Welt äöüß"
   -- As you see we are using a global variable with UTF-8 characters
Grüße(GrüßeAusDeutschland) -- call to local function with UTF-8 characters

-- Please add other Language samples here ;)

-- Just to prove my point some google translate gibberish:

日本からのご挨拶 = "ハローワールド" -- Japanese
Grüße(日本からのご挨拶)

HendrikPolczynski 2009-11-20 - First Revision

For Unicode String support and validation please see:

For general information about Unicode in Lua see:


Last edited February 22, 2018 4:49 am GMT