Re: Managing Unicode (UTF-8 and UTF-16) data in Lua

lua-l archive


Subject: Re: Managing Unicode (UTF-8 and UTF-16) data in Lua
From: "Soni L." <fakedme@...>
Date: Fri, 5 Aug 2016 20:07:16 -0300



On 05/08/16 06:48 PM, Soni L. wrote:



On 05/08/16 05:21 PM, Paul Moore wrote:

On 5 August 2016 at 21:08, Christian N. <cn00@gmx.at> wrote:

Have a look at http://utf8everywhere.org/, especially section 10, "How to do
text on Windows". That might answer your question, and IMHO the whole
document is very interesting for anyone who works with encodings.
But off the top of my head, using the wide string APIs and converting from
UTF-8 to UTF-16 is the right thing to do. Unfortunately, the os and io parts
of Lua's standard library will be largely unusable for you, since Windows
does not support setting UTF-8 as the ANSI codepage, and neither does
Microsoft's C runtime (setlocale()). You will basically have to use a
self-patched version that replaces calls such as fopen with their
MS-specific UTF-16 equivalents such as _wfopen.
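For illustration only: on Windows the conversion step would normally be done with MultiByteToWideChar(CP_UTF8, ...) before handing the result to _wfopen. The function below, utf8_to_utf16 (a hypothetical name, not part of Lua or the CRT), is a minimal portable C sketch of what that conversion does, assuming uint16_t stands in for Windows' 16-bit wchar_t and that malformed UTF-8 should be rejected rather than replaced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to UTF-16
 * code units (native endianness). Writes at most `cap` units, including
 * the terminating 0. Returns the number of units written excluding the
 * terminator, or -1 on malformed input or insufficient space. */
long utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;

        /* Decode the lead byte: how many continuation bytes follow? */
        if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
        else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
        else return -1;  /* stray continuation byte or invalid lead byte */

        for (int i = 1; i <= extra; i++) {
            /* Also stops at the terminating NUL, so no over-read. */
            if ((s[i] & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (s[i] & 0x3F);
        }

        /* Reject overlong forms, surrogate code points, and > U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp >= 0x10000) {            /* outside the BMP: surrogate pair */
            if (n + 2 >= cap) return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return -1;
            dst[n++] = (uint16_t)cp;
        }
        s += extra + 1;
    }
    if (n >= cap) return -1;            /* no room for the terminator */
    dst[n] = 0;
    return (long)n;
}
```

A replacement io.open written as a Lua C function would, under these assumptions, convert its path argument this way and pass the buffer to _wfopen, leaving Lua-side strings in UTF-8 throughout.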

Yes. This is precisely my point, and is the approach I prefer, and use
whenever possible. I didn't make this clear, but I have a lot of
experience dealing with encoding issues, it's just that I normally
work in Python, not in Lua.

What I want to determine is the lowest-impact way of using this
approach in Lua. It's easy to use UTF-8 as the encoding for all
strings in Lua, the code is UTF-8 safe already. I'd rather not patch
the Lua C code if at all possible, so I'm looking for options to write
my own replacements for the problematic functions in os, plus the
built-in print function, and patch them into the standard Lua
interpreter.

In addition, I have a mild interest (more because I'm curious than
because it'll make a massive impact on my application) in avoiding
unnecessary UTF-8 <-> UTF-16 conversions, so I was wondering what would
be involved in writing a user-defined "wide character string" userdata
type, that would interoperate cleanly with Lua strings. If it's
possible to do that, I could return wide strings from API calls, which
would save 2 conversions if I simply pass the value onto another API.
But if it makes working with the return values in Lua harder, it's not
going to be worth it.

Paul

You can't make it interoperate with plain Lua strings (e.g. there is no __key metamethod, etc.).

Other than that, just intern the values. You cannot intern the way Lua does (e.g. not interning long strings until they're used as a key), again because there is no __key metamethod, so your strings are bound to be way more expensive.
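To make "intern the values" concrete, here is a minimal C sketch, assuming the wide strings are stored as UTF-16 code units. Equal contents always map to the same object, so equality reduces to pointer comparison; that is what would let equal wide strings act as the same table key if each interned entry were mapped to a single userdata. The names WStr and wstr_intern are hypothetical. The fixed 256-bucket table never grows and nothing is ever freed, and the Lua-side wiring (userdata, metatable, some cache for the userdata objects) is left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned wide string: equal contents -> same pointer. */
typedef struct WStr {
    struct WStr *next;   /* next entry in the same hash bucket */
    uint32_t hash;
    size_t len;          /* length in UTF-16 code units */
    uint16_t data[];     /* flexible array member holding the units */
} WStr;

#define NBUCKETS 256
static WStr *buckets[NBUCKETS];

/* FNV-1a style hash over the 16-bit code units. */
static uint32_t wstr_hash(const uint16_t *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= s[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the unique WStr for s[0..len), creating it on first use.
 * Returns NULL on allocation failure. Entries are never freed in
 * this sketch. */
const WStr *wstr_intern(const uint16_t *s, size_t len)
{
    uint32_t h = wstr_hash(s, len);
    WStr **slot = &buckets[h % NBUCKETS];

    for (WStr *w = *slot; w != NULL; w = w->next)
        if (w->hash == h && w->len == len &&
            memcmp(w->data, s, len * sizeof *s) == 0)
            return w;                    /* already interned */

    WStr *w = malloc(sizeof *w + len * sizeof *s);
    if (w == NULL) return NULL;
    w->hash = h;
    w->len = len;
    memcpy(w->data, s, len * sizeof *s);
    w->next = *slot;                     /* push onto the bucket chain */
    *slot = w;
    return w;
}
```

The expense described above shows up directly here: every wide string is hashed and looked up when it is created, whether or not it is ever compared or used as a key.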

Link to the thread about __key: http://lua-users.org/lists/lua-l/2016-07/msg00165.html


--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.

References:
- Managing Unicode (UTF-8 and UTF-16) data in Lua, Paul Moore
- Re: Managing Unicode (UTF-8 and UTF-16) data in Lua, Paul K
- Re: Managing Unicode (UTF-8 and UTF-16) data in Lua, Paul Moore
- Re: Managing Unicode (UTF-8 and UTF-16) data in Lua, Paul K
- Re: Managing Unicode (UTF-8 and UTF-16) data in Lua, Christian N.
- Re: Managing Unicode (UTF-8 and UTF-16) data in Lua, Paul Moore
- Re: Managing Unicode (UTF-8 and UTF-16) data in Lua, Soni L.
