lua-l archive

I'm trying to embed Lua in a Windows program that needs to be "Unicode
clean".
 ...
1. The print() function doesn't handle UTF-8

Why do you want to use UTF-8 encoding on Windows?
IMO, there is absolutely no reason to use UTF-8 in a Windows-only application.
BTW, I don't know whether the Windows console can display arbitrary Unicode symbols,
so a Unicode version of print() may be unattainable anyway.

All you really need is UTF-16LE strings (which are used by the W-functions of the WinAPI).
But there are some problems to solve:


Problem #1: most string library functions work incorrectly with UTF-16.
Solution:
You should write your own UTF-16 implementations of the string library functions you need:
string16.sub(), string16.gmatch(), string16.upper() and so on.
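
As a rough illustration of what such a library could look like, here is a minimal sketch of string16.len() and string16.sub() that count characters (code points) rather than bytes. The string16 table and its helper functions are hypothetical names, not an existing library; the sketch assumes well-formed UTF-16LE input and only positive indices.

```lua
-- Hypothetical string16 helpers; all strings are UTF-16LE byte strings.
local string16 = {}

-- Read the 16-bit code unit that starts at byte position pos (1-based).
local function unit_at(s, pos)
   local lo, hi = s:byte(pos, pos + 1)
   return lo + hi * 256
end

-- Byte width (2 or 4) of the character that starts at byte position pos.
local function char_width(s, pos)
   local u = unit_at(s, pos)
   -- 0xD800..0xDBFF is a high surrogate; a low surrogate should follow it
   if u >= 0xD800 and u <= 0xDBFF and pos + 3 <= #s then
      local u2 = unit_at(s, pos + 2)
      if u2 >= 0xDC00 and u2 <= 0xDFFF then return 4 end
   end
   return 2
end

-- Number of characters (code points) in s.
function string16.len(s)
   local n, pos = 0, 1
   while pos < #s do
      pos = pos + char_width(s, pos)
      n = n + 1
   end
   return n
end

-- Characters i..j of s (1-based, inclusive; i >= 1; j defaults to the end).
function string16.sub(s, i, j)
   local n, pos, first, last = 0, 1, nil, nil
   while pos < #s do
      local w = char_width(s, pos)
      n = n + 1
      if n == i then first = pos end
      if n == j then last = pos + w - 1; break end
      pos = pos + w
   end
   if not first then return "" end
   return s:sub(first, last or #s)
end
```

Functions such as string16.upper() would additionally need case-mapping tables, which this sketch does not attempt.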


Problem #2: io.popen() generates output in the OEM code page (e.g. cp850).
Solution:
You can still use the standard Lua function io.popen() to get UTF-16 output.
Windows supports Unicode output for all internal commands if you prefix them with "cmd /u/d/c"
(/u: Unicode output for internal commands, /d: skip AutoRun commands, /c: run the command and exit).
For example, the following code gets Unicode filenames as one UTF-16 string:

local cmd1 = [[echo List#1]]
local cmd2 = [[dir /b]]
local cmd3 = [[echo List#2]]
local cmd4 = [[dir "C:\Program Files" /b]]
local cmd = [["cmd /u/d/c "]]..cmd1.."&"..cmd2.."&"..cmd3.."&"..cmd4..[[""]]
-- cmd here is Lua string in 1-byte encoding (win1252)
local pipe = io.popen(cmd, "rb")
local output16 = pipe:read"*a"
pipe:close()
-- output16 here is Lua string containing UTF-16 symbols
-- for example, the "Euro" symbol (U+20AC) in output16 will be written as "\xAC\x20".
-- lines in output16 are separated by "\r\0\n\0"
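
Since the lines are separated by "\r\0\n\0", the output can be split into individual UTF-16 lines with a plain byte search. The following is only a sketch; lines16 is a hypothetical helper name. It accepts a separator only when it starts on a code-unit boundary, because the same four bytes can also appear straddling three unrelated UTF-16 characters.

```lua
-- Split a UTF-16LE string into lines on CR LF ("\r\0\n\0").
-- Hypothetical helper, not part of the original code.
local function lines16(s)
   local sep = "\r\0\n\0"
   local result, start, init = {}, 1, 1
   while true do
      local i, j = s:find(sep, init, true)   -- plain (non-pattern) search
      if not i then break end
      if i % 2 == 1 then
         -- the match starts on a code-unit boundary: a real line break
         result[#result + 1] = s:sub(start, i - 1)
         start, init = j + 1, j + 1
      else
         -- the match straddles code units: not a line break, skip it
         init = i + 1
      end
   end
   -- keep a trailing line that has no terminator
   if start <= #s then result[#result + 1] = s:sub(start) end
   return result
end
```

Each element of the returned table is still a UTF-16LE string, so it can be passed on to W-functions or to your own string16 functions.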


Problem #3: os.execute() and io.popen() accept the command line in win1252.
Solution:
Write your own implementation that accepts a UTF-16 command line.


Problem #4: os.getenv() returns its result in win1252.
Solution:

function getenv16 (env_var_name)
   -- this is UTF-16 analogue of os.getenv()
   -- env_var_name must be in 1-byte encoding (win1252)
   -- returns UTF-16 string (or nil if the variable is not defined)
   local cmd = [["cmd /u/d/c "if defined ]]..env_var_name..[[ echo %]]..env_var_name..[[%""]]
   local pipe = io.popen(cmd, "rb")
   local result = pipe:read"*a"
   pipe:close()
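   -- %z matches a NUL byte (deprecated since Lua 5.2, where "\0" can be written in the pattern directly)
   -- the and/or expression keeps only the first result of gsub (the string, not the match count)
   return result ~= "" and result:gsub("\r%z\n%z$", "") or nil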
end
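
To compare the result of getenv16() with a known value, the value has to be in UTF-16LE as well. For pure ASCII literals this is just a matter of appending a zero byte to every character. A minimal sketch follows; ascii_to_utf16le is a hypothetical helper name, and it deliberately rejects non-ASCII bytes because win1252 bytes above 127 do not all map to the same code point numbers.

```lua
-- Hypothetical helper: widen a pure-ASCII Lua string to UTF-16LE.
local function ascii_to_utf16le(s)
   -- bytes 128..255 would need a real win1252 -> Unicode table
   assert(not s:find("[\128-\255]"), "ASCII input expected")
   -- append a zero high byte to every character;
   -- the parentheses drop the second result of gsub (the match count)
   return (s:gsub(".", "%0\0"))
end
```

For example, getenv16("OS") == ascii_to_utf16le("Windows_NT") compares the two values byte for byte in UTF-16LE.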