On Sun, Aug 7, 2016 at 1:21 PM, Coda Highland <chighland@gmail.com> wrote:
> On Sun, Aug 7, 2016 at 7:59 AM, Egor Skriptunoff
> <egor.skriptunoff@gmail.com> wrote:
>>
>>> > Operations on fixed width character strings (such as UTF-16) are
>>> > processed faster.
>>>
>>> UTF-16 isn't fixed char width.
>>
>>
>> Yes, you are absolutely correct.
>> UTF-16 uses surrogate pairs to represent codepoints from 0x10000 upward.
>> But Windows does not support them.
>> When you are writing a surrogate-pair-symbol to Windows console
>> (I've tested this on Win7 with a simple program using WriteConsoleW),
>> it gets displayed as two question marks,
>> that is, Windows considers it as two separate symbols instead of just one.
>>
>> If Windows does not support surrogate pairs, why should we?
>> That's why we can treat UTF-16 on Windows as fixed-char-width encoding.
>>
>> Of course, this means that 100% correct Unicode "print()" function is
>> non-implementable for Windows console applications.
>>
>
> Windows DOES "support" surrogates -- it upgraded from UCS-2
> (equivalent to UTF-16 constrained to the BMP) to UTF-16 a long time
> ago (Win7, I think). But it supports them in the sense that it renders
> them correctly and won't screw them up if they exist. The support is
> roughly equivalent to Lua's UTF-8 support: if you know what you're
> doing and you explicitly ask for it, then it can deal with it, but if
> you just use the naive wide-string functions it'll treat them as
> multiple characters.
>
> /s/ Adam

Though I should clarify: WINDOWS supports surrogate pairs, but the Windows
CONSOLE does not; I don't mean to dispute Egor's observation about
WriteConsoleW.
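
To make the "not fixed width" point concrete, here is a rough sketch of how a
code point maps to UTF-16 code units. It is written in Python only for
brevity; the function names to_utf16_units and count_code_points are my own
illustrations, not part of any Windows or Lua API, and the counting helper
assumes well-formed input.

```python
def to_utf16_units(cp):
    """Return the UTF-16 code units (as ints) for one Unicode code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        # BMP code point: a single 16-bit unit, which is all UCS-2 can hold.
        return [cp]
    # Supplementary code point: split the 20-bit offset into two 10-bit halves.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def count_code_points(units):
    """Count code points in well-formed UTF-16 by skipping low surrogates."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


# 'A' (U+0041) followed by U+1F600, which lies outside the BMP.
units = to_utf16_units(0x41) + to_utf16_units(0x1F600)
print(len(units), count_code_points(units))
```

A naive wide-string length corresponds to len(units), while the number of
characters the user actually sees corresponds to count_code_points(units);
the two only agree as long as everything stays inside the BMP.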

/s/ Adam