Re: Managing Unicode (UTF-8 and UTF-16) data in Lua

Subject: Re: Managing Unicode (UTF-8 and UTF-16) data in Lua
From: Coda Highland <chighland@...>
Date: Sun, 7 Aug 2016 13:22:15 -0700

On Sun, Aug 7, 2016 at 1:21 PM, Coda Highland <chighland@gmail.com> wrote:
> On Sun, Aug 7, 2016 at 7:59 AM, Egor Skriptunoff
> <egor.skriptunoff@gmail.com> wrote:
>>
>>> > Operations on fixed-width character strings (such as UTF-16) are
>>> > processed faster.
>>>
>>> UTF-16 isn't fixed char width.
>>
>>
>> Yes, you are absolutely correct.
>> UTF-16 uses surrogate pairs to represent codepoints above 0xFFFF.
>> But Windows does not support them.
>> When you write a character encoded as a surrogate pair to the Windows
>> console (I've tested this on Win7 with a simple program using
>> WriteConsoleW), it gets displayed as two question marks;
>> that is, Windows treats it as two separate symbols instead of just one.
>>
>> If Windows does not support surrogate pairs, why should we?
>> That's why we can treat UTF-16 on Windows as a fixed-char-width encoding.
>>
>> Of course, this means that a 100% correct Unicode "print()" function
>> cannot be implemented for Windows console applications.
>>
>
> Windows DOES "support" surrogates -- it upgraded from UCS-2
> (equivalent to UTF-16 constrained to the BMP) to UTF-16 a long time
> ago (Windows 2000, I believe). But it supports them only in the sense
> that it renders them correctly and won't corrupt them if they exist.
> The support is roughly equivalent to Lua's UTF-8 support: if you know
> what you're doing and you explicitly ask for it, then it can deal with
> them, but if you just use the naive wide-string functions, a surrogate
> pair is treated as two separate characters.
>
> /s/ Adam

Though I should clarify: WINDOWS supports surrogate pairs, but the
Windows CONSOLE does not; I don't mean to argue with Egor's comment
regarding WriteConsoleW.

/s/ Adam
