On 05/08/16 06:48 PM, Soni L. wrote:


On 05/08/16 05:21 PM, Paul Moore wrote:
On 5 August 2016 at 21:08, Christian N. <cn00@gmx.at> wrote:
Have a look at http://utf8everywhere.org/, especially section 10, "How to do
text on Windows". That might answer your question, and IMHO the whole
document is very interesting for anyone who works with encodings.

But off the top of my head, using the wide string APIs and converting from
UTF-8 to UTF-16 is the right thing to do. Unfortunately, the os and io parts
of Lua's standard library will be largely unusable for you, since neither
Windows nor Microsoft's C runtime (setlocale()) supports setting UTF-8 as the
ANSI code page. You will basically have to use a self-patched version that
replaces calls such as fopen with their MS-specific UTF-16 equivalents such
as _wfopen.
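A minimal sketch of the kind of replacement described above, assuming a C build on Windows (MSVC or MinGW). `utf8_fopen` and `utf8_to_wide` are hypothetical names, not part of Lua or the CRT. The conversion uses the Win32 `MultiByteToWideChar` with `CP_UTF8`, and on other platforms the UTF-8 path is handed straight to `fopen`. Wiring this into `io.open` would still mean patching liolib.c or registering a replacement C function, which is not shown.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   malloc'd UTF-16 string. Returns NULL on invalid input or allocation failure. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call asks for the required size, including the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) != n) {
        free(w);
        return NULL;
    }
    return w;
}
#endif

/* Hypothetical replacement for fopen: path and mode are UTF-8 everywhere. */
FILE *utf8_fopen(const char *path, const char *mode)
{
#ifdef _WIN32
    /* Windows: go through the UTF-16 API instead of the ANSI code page. */
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = NULL;
    if (wpath != NULL && wmode != NULL)
        f = _wfopen(wpath, wmode);
    free(wpath);
    free(wmode);
    return f;
#else
    /* POSIX file names are byte strings, so UTF-8 passes through unchanged. */
    return fopen(path, mode);
#endif
}
```

Keeping the Lua side purely UTF-8 and converting only at this boundary is the approach the thread recommends; the cost is one conversion per call.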
Yes. This is precisely my point; it's the approach I prefer and use
whenever possible. I didn't make this clear, but I have a lot of
experience dealing with encoding issues. It's just that I normally
work in Python, not in Lua.

What I want to determine is the lowest-impact way of using this
approach in Lua. It's easy to use UTF-8 as the encoding for all
strings in Lua; the code is already UTF-8 safe. I'd rather not patch
the Lua C code if at all possible, so I'm looking for options to write
my own replacements for the problematic functions in os, plus the
built-in print function, and patch them into the standard Lua
interpreter.
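For the print replacement, one possible approach (an assumption, not something the thread prescribes) is to check whether the output handle is a real console and, if so, convert the UTF-8 text to UTF-16 and write it with `WriteConsoleW`, passing redirected output through as raw UTF-8 bytes. `utf8_write` is a hypothetical helper that a replacement `print`, registered from C, could call for each argument.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <io.h>
#include <windows.h>
#endif

/* Hypothetical helper: write len bytes of UTF-8 text to f.
   Returns 0 on success, -1 on failure. */
int utf8_write(FILE *f, const char *s, size_t len)
{
#ifdef _WIN32
    HANDLE h = (HANDLE)_get_osfhandle(_fileno(f));
    DWORD mode;
    if (len > 0 && h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode)) {
        /* A real console: convert to UTF-16 and use WriteConsoleW,
           which does not depend on the console or ANSI code page. */
        if (len > INT_MAX)
            return -1;
        int n = MultiByteToWideChar(CP_UTF8, 0, s, (int)len, NULL, 0);
        if (n <= 0)
            return -1;
        wchar_t *w = malloc((size_t)n * sizeof *w);
        if (w == NULL)
            return -1;
        MultiByteToWideChar(CP_UTF8, 0, s, (int)len, w, n);
        fflush(f); /* emit anything already buffered through the CRT first */
        DWORD written = 0;
        BOOL ok = WriteConsoleW(h, w, (DWORD)n, &written, NULL);
        free(w);
        return ok ? 0 : -1;
    }
#endif
    /* Redirected output, or not Windows: pass the UTF-8 bytes through. */
    return fwrite(s, 1, len, f) == len ? 0 : -1;
}
```

Passing raw bytes when output is redirected keeps files and pipes in UTF-8; whether that is what downstream tools expect is a separate decision.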

In addition, I have a mild interest (more out of curiosity than
because it'll make a big difference to my application) in avoiding
unnecessary UTF-8 <-> UTF-16 conversions, so I was wondering what would
be involved in writing a user-defined "wide character string" userdata
type that would interoperate cleanly with Lua strings. If that's
possible, I could return wide strings from API calls, which would save
two conversions whenever I simply pass the value on to another API.
But if it makes working with the return values in Lua harder, it's not
going to be worth it.
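To give a sense of what each avoided conversion costs: it is a full pass over the input plus a buffer for the output. The hand-rolled decoder below is a portable stand-in for `MultiByteToWideChar(CP_UTF8, ...)`, shown only to illustrate that work; `utf8_to_utf16` is a hypothetical name. Rejecting malformed input instead of substituting U+FFFD is a design choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert len bytes of UTF-8 into UTF-16 code units in out (capacity cap).
   Returns the number of code units written, or -1 on malformed input or
   if out is too small. */
long utf8_to_utf16(const unsigned char *in, size_t len, uint16_t *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < len) {
        uint32_t cp;
        size_t extra;
        unsigned char b = in[i];
        if (b < 0x80)                    { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else return -1;                  /* stray continuation or invalid lead byte */
        if (extra > len - i - 1)
            return -1;                   /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80)
                return -1;               /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong forms, surrogate code points and values above U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;
        i += extra + 1;
        if (cp < 0x10000) {
            if (o + 1 > cap)
                return -1;
            out[o++] = (uint16_t)cp;
        } else {
            /* Characters outside the BMP become a surrogate pair. */
            if (o + 2 > cap)
                return -1;
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return (long)o;
}
```

For example, U+1F600 (four UTF-8 bytes) becomes the pair 0xD83D 0xDE00, so UTF-16 length is not simply the character count.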

Paul

You can't make it interoperate with plain Lua strings (e.g. there is no __key metamethod, etc.).

Other than that, just intern the values. You cannot intern the way Lua does (e.g. not interning long strings until they're used as a key), again because there is no __key metamethod, so your strings are bound to be much more expensive.
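A minimal sketch of interning wide strings in C, assuming a hypothetical `wstr` type and `wstr_intern` function: equal contents always map to the same object, so pointer identity (and therefore identity-based table keys for a userdata wrapping it) coincides with string equality. Every call hashes and compares the full contents, which is the extra cost mentioned above. Entries are never freed here; a real binding would need some form of weak reference, for instance a weak-valued Lua table, so unused strings can be collected.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned wide string: length plus UTF-16 code units. */
typedef struct wstr {
    struct wstr *next;   /* bucket chain */
    uint32_t hash;
    size_t len;
    uint16_t units[];    /* flexible array member holding the code units */
} wstr;

#define WSTR_BUCKETS 1024u
static wstr *wstr_table[WSTR_BUCKETS];

/* FNV-1a over the bytes of the code units. */
static uint32_t wstr_hash(const uint16_t *u, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h = (h ^ (u[i] & 0xFFu)) * 16777619u;
        h = (h ^ (u[i] >> 8)) * 16777619u;
    }
    return h;
}

/* Return the unique wstr with these contents, creating it if needed.
   Returns NULL only if allocation fails. */
const wstr *wstr_intern(const uint16_t *u, size_t len)
{
    uint32_t h = wstr_hash(u, len);
    wstr **slot = &wstr_table[h % WSTR_BUCKETS];
    for (wstr *s = *slot; s != NULL; s = s->next)
        if (s->hash == h && s->len == len &&
            memcmp(s->units, u, len * sizeof *u) == 0)
            return s;    /* already interned: reuse the existing object */
    wstr *s = malloc(sizeof *s + len * sizeof *u);
    if (s == NULL)
        return NULL;
    s->hash = h;
    s->len = len;
    memcpy(s->units, u, len * sizeof *u);
    s->next = *slot;     /* push onto the front of the bucket */
    *slot = s;
    return s;
}
```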

Link to the thread about __key: http://lua-users.org/lists/lua-l/2016-07/msg00165.html

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.