
2015-01-12 16:52 GMT+08:00 Dirk Laurie <>:
2015-01-12 5:50 GMT+02:00 tjhack <>:

> I converted some Lua scripts (containing Chinese characters) from Simplified
> Chinese to Traditional Chinese, and the Chinese characters are now encoded
> in cp950.
> I then switched my Win7 machine's locale to zh_TW and restarted. Everything
> seemed fine: the scripts with Traditional Chinese characters display correctly.
> But when I compile these scripts, I get an error: invalid escape string.
> For example:
> msg="外功系普攻攻擊"
> print(msg)
> the result is:
> 外巨t普攻攻擊

If I run your code by the interactive interpreter under Ubuntu,
I get this:

$ LANG="zh_TW" lua5.1
Lua 5.1.5  Copyright (C) 1994-2012, PUC-Rio
> g3;ff.f;;f
stdin:1: '=' expected near ';'

I.e. the input is already mangled by the reader.

However, if I save the code to a file, and then run it, it is fine:

$ LANG="zh_TW" lua5.1 < /tmp/chinese.lua

Interestingly, for LuaJIT the interactive interpreter also works.
Can the culprit be the readline library (not a default option for
LuaJIT)? Let's try. Remove the line that defines LUA_USE_READLINE
from luaconf.h and recompile.

lua-5.1-noreadline$ LANG="zh_TW" src/lua
Lua 5.1.5  Copyright (C) 1994-2012, PUC-Rio
> msg="外功系普攻攻擊"
> print(msg)


But under Windows, LUA_USE_READLINE is not defined anyway.
We must look for something else.

What happens under Windows if you run the program
as a script file instead of interactively?

It is cool to meet another Traditional Chinese Lua user here!

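That is a very well-known cp950 (Big5) encoding problem, called the "許功蓋" issue (please refer to Wikipedia). In Big5, the second byte of characters such as 許, 功 and 蓋 is 0x5C, the ASCII backslash, so Lua reads it as the start of an escape sequence inside a string literal.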
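The effect can be reproduced at the byte level without a cp950 editor. The following is only a sketch: the Big5 bytes are written as decimal escapes, and the bytes used for 系 (0xA8 0x74) are an assumption inferred from the mangled output quoted above. Depending on the Lua version, the chunk either compiles to a shortened string or is rejected with an escape error.

```lua
-- Big5 (cp950) bytes, written as decimal escapes so this file stays ASCII.
local gong = "\165\92"   -- 功 = 0xA5 0x5C; the second byte is ASCII '\'
local xi   = "\168\116"  -- 系 = 0xA8 0x74 (assumed, inferred from the output above)

-- Build a chunk whose string literal holds the raw Big5 bytes,
-- the way a cp950-encoded script file would.
local chunk = 'return "' .. gong .. xi .. '"'

-- loadstring exists in Lua 5.1 and LuaJIT; load accepts strings from 5.2 on.
local compile = loadstring or load
local fn, err = compile(chunk)

if fn then
  -- The lexer accepted the unknown escape: the backslash byte is gone.
  local s = fn()
  print("compiled, length " .. #s .. " instead of " .. #(gong .. xi))
else
  -- The lexer rejected the unknown escape at compile time.
  print("compile error: " .. tostring(err))
end
```

Either outcome matches a symptom in the original report: a compile-time escape error, or a string whose characters have shifted.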

I would say UTF-8 is definitely the way to go.

For our game, we store our Traditional Chinese and Japanese Lua scripts as UTF-8 without BOM. Notepad++ is a very good helper for that, even if you are not sure what encoding the current file is in. If you want to feed the strings to underlying C/C++ libraries, for instance FreeType for font rendering, that is usually not a problem at all, as long as you pass the original, unmangled byte stream to them.
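As a small check of why UTF-8 avoids the problem: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so none can be mistaken for a backslash. The sketch below writes the UTF-8 bytes of 許功蓋 as decimal escapes (so it does not depend on this file's encoding) and confirms that the string survives a round trip through the Lua compiler.

```lua
-- UTF-8 bytes for 許 (E8 A8 B1), 功 (E5 8A 9F) and 蓋 (E8 93 8B).
local utf8_text = "\232\168\177" .. "\229\138\159" .. "\232\147\139"

-- Check that no byte falls in the ASCII range (which includes '\' = 92).
local has_ascii = false
for i = 1, #utf8_text do
  if utf8_text:byte(i) < 128 then
    has_ascii = true
  end
end
print(has_ascii)  -- false

-- Compile a chunk containing the raw UTF-8 bytes in a string literal.
local compile = loadstring or load
local fn = assert(compile('return "' .. utf8_text .. '"'))
local roundtrip = fn()
print(roundtrip == utf8_text)  -- true
```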

If you happen to need to manipulate the strings at the Lua level, utf8_simple is good enough to handle most cases.
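For very simple needs, plain Lua patterns can split a UTF-8 string into characters. This is a minimal sketch and not the API of utf8_simple: the names `UTF8_CHAR` and `utf8_chars` are made up for illustration, the input is assumed to be valid UTF-8, and NUL bytes are not handled.

```lua
-- One UTF-8 character: a lead byte followed by any continuation bytes.
-- (Same idea as utf8.charpattern in Lua 5.3, minus the NUL byte.)
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: return the characters of s as a list of strings.
local function utf8_chars(s)
  local chars = {}
  for ch in s:gmatch(UTF8_CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

-- "a" followed by 許功蓋, given as UTF-8 bytes.
local text = "a" .. "\232\168\177" .. "\229\138\159" .. "\232\147\139"
local chars = utf8_chars(text)
print(#text, #chars)  -- 10 bytes, 4 characters
```

The `#` operator counts bytes, so character-level operations such as length, indexing or reversing need this kind of splitting first.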

Hope it helps!