On 5 August 2016 at 21:08, Christian N. <firstname.lastname@example.org> wrote:
Have a look at http://utf8everywhere.org/, especially section 10,
"How to do text on Windows". That might answer your question, and IMHO
the whole document is very interesting for anyone who works with
encodings.
But off the top of my head, using the wide string APIs and converting
between UTF-8 and UTF-16 is the right thing to do. Unfortunately, the
os and io parts of Lua's standard library will be largely unusable for
you, since Windows does not support setting UTF-8 as the ANSI codepage
and neither does the C runtime (setlocale()). You will basically have
to use a self-patched version replacing calls such as fopen with their
MS-specific UTF-16 equivalents such as _wfopen.
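For reference, the conversion those _w* wrappers need is the one
MultiByteToWideChar(CP_UTF8, ...) performs on Windows. A minimal,
portable sketch of it (utf8_to_utf16 is a made-up name, and it only
does basic validity checks, not full UTF-8 validation):

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-8 string to UTF-16 code units.
 * Returns the number of units written, or 0 on malformed input
 * or insufficient output capacity. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t outcap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;
    while (*s) {
        uint32_t cp;
        if (s[0] < 0x80) {                       /* 1-byte sequence */
            cp = *s++;
        } else if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            cp = (uint32_t)(s[0] & 0x1F) << 6 | (s[1] & 0x3F);
            s += 2;                              /* 2-byte sequence */
        } else if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
                   && (s[2] & 0xC0) == 0x80) {
            cp = (uint32_t)(s[0] & 0x0F) << 12
               | (uint32_t)(s[1] & 0x3F) << 6 | (s[2] & 0x3F);
            s += 3;                              /* 3-byte sequence */
        } else if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
                   && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
            cp = (uint32_t)(s[0] & 0x07) << 18
               | (uint32_t)(s[1] & 0x3F) << 12
               | (uint32_t)(s[2] & 0x3F) << 6 | (s[3] & 0x3F);
            s += 4;                              /* 4-byte sequence */
        } else {
            return 0;                            /* malformed lead byte */
        }
        if (cp >= 0x10000) {                     /* needs a surrogate pair */
            if (n + 2 > outcap) return 0;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > outcap) return 0;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

In the real wrappers you would of course just call MultiByteToWideChar
with CP_UTF8 rather than hand-rolling this.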
Yes. This is precisely my point, and it is the approach I prefer and
use whenever possible. I didn't make this clear, but I have a lot of
experience dealing with encoding issues; it's just that I normally
work in Python, not in Lua.
What I want to determine is the lowest-impact way of using this
approach in Lua. It's easy to use UTF-8 as the encoding for all
strings in Lua, as the code is UTF-8 safe already. I'd rather not
patch the Lua C code if at all possible, so I'm looking for options to
write my own replacements for the problematic functions in os, plus
the built-in print function, and patch them into the standard Lua
libraries.
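A minimal sketch of how such a replacement could be patched in from an
extension module rather than by editing the core, assuming the Lua
5.2+ C API and the Windows CRT; l_remove_utf8 and patch_os are names
made up for illustration, and paths longer than MAX_PATH are not
handled:

```c
#include <lua.h>
#include <lauxlib.h>
#include <windows.h>
#include <stdio.h>      /* _wremove */

/* Hypothetical os.remove replacement taking a UTF-8 file name. */
static int l_remove_utf8(lua_State *L)
{
    const char *name = luaL_checkstring(L, 1);   /* UTF-8 from Lua */
    wchar_t wname[MAX_PATH];
    if (MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, MAX_PATH) == 0)
        return luaL_error(L, "invalid or too long file name");
    /* luaL_fileresult pushes true, or nil plus an error message */
    return luaL_fileresult(L, _wremove(wname) == 0, name);
}

/* Patch the replacement into the already-loaded os table, leaving
 * the Lua core itself untouched. */
static void patch_os(lua_State *L)
{
    lua_getglobal(L, "os");
    lua_pushcfunction(L, l_remove_utf8);
    lua_setfield(L, -2, "remove");
    lua_pop(L, 1);
}
```

The same pattern should work for the rest of os, io.open and print;
the Windows-specific part stays in one loadable module.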
In addition, I have a mild interest (more because I'm curious than
because it'll make a massive impact to my application) in avoiding
unnecessary UTF-8 <> UTF-16 conversions, so I was wondering what would
be involved in writing a user-defined "wide character string" userdata
type, that would interoperate cleanly with Lua strings. If it's
possible to do that, I could return wide strings from API calls,
which would save two conversions when I simply pass the value on to
another API.
But if it makes working with the return values in Lua harder, it's not
going to be worth it.
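For what it's worth, one possible shape for that userdata is a boxed
UTF-16 buffer that converts back to UTF-8 only when Lua actually asks
for a string, via __tostring. A hypothetical sketch (Lua 5.2+ API,
Windows only; widestr and push_widestr are invented names, and the
"widestr" metatable is assumed to have been created with
luaL_newmetatable at load time):

```c
#include <lua.h>
#include <lauxlib.h>
#include <windows.h>
#include <string.h>

typedef struct {
    size_t len;       /* UTF-16 code units, excluding the terminator */
    wchar_t data[1];  /* payload allocated inline with the userdata */
} widestr;

/* Wrap a UTF-16 buffer, e.g. straight from a W-suffixed API call. */
static widestr *push_widestr(lua_State *L, const wchar_t *s, size_t len)
{
    widestr *ws = lua_newuserdata(L, sizeof(widestr) + len * sizeof(wchar_t));
    ws->len = len;
    memcpy(ws->data, s, len * sizeof(wchar_t));
    ws->data[len] = 0;
    luaL_setmetatable(L, "widestr");
    return ws;
}

/* __tostring: convert to UTF-8 only when Lua needs a real string. */
static int ws_tostring(lua_State *L)
{
    widestr *ws = luaL_checkudata(L, 1, "widestr");
    int n = WideCharToMultiByte(CP_UTF8, 0, ws->data, (int)ws->len,
                                NULL, 0, NULL, NULL);
    luaL_Buffer b;
    char *p = luaL_buffinitsize(L, &b, (size_t)n);
    WideCharToMultiByte(CP_UTF8, 0, ws->data, (int)ws->len, p, n, NULL, NULL);
    luaL_pushresultsize(&b, (size_t)n);
    return 1;
}
```

Passing the userdata straight back into another bound wide-character
API would then skip both conversions, while tostring() still yields
UTF-8. Clean interop with Lua strings would need more metamethods
(__len, __concat, __eq at least), and anything that expects a genuine
Lua string forces a conversion anyway, which may answer the "harder to
work with" question.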