On Sun, Aug 7, 2016 at 1:38 PM, Egor Skriptunoff wrote:
>> > When you are writing a surrogate-pair-symbol to Windows console
>> > (I've tested this on Win7 with a simple program using WriteConsoleW),
>> > it gets displayed as two question marks,
>> > that is, Windows considers it as two separate symbols instead of just
>> > one.
>> Windows DOES "support" surrogates -- it upgraded from UCS-2
>> (equivalent to UTF-16 constrained to the BMP) to UTF-16 a long time
>> ago (Win7, I think). But it supports them in the sense that it renders
>> them correctly and won't screw them up if they exist.
> Do you really think Windows looks up into current font
> and only if symbol's glyph is present then surrogate pair is glued into one
> symbol,
> otherwise surrogate pair remains as two separate symbols?
> :-)
> Can you prove your hypothesis by an example?
> I believe that splitting UTF-16 string into codepoints should not depend on
> current font installed.

Rendering as a single box instead of two boxes would be "correct"
rendering for a font missing the glyph, so the current font doesn't
come into the picture at all. In the general case, Windows does this
correctly in GUI applications. Therefore, it's accurate to say that
Windows supports UTF-16.

As you pointed out, this isn't true for WriteConsoleW. This means that
the Windows console is still stuck in UCS-2 land.
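
As a rough illustration of the UCS-2 versus UTF-16 distinction, here is a
minimal Python sketch. The helper names are made up for this example and
nothing here calls a Windows API. It shows that a character outside the BMP
becomes two UTF-16 code units, which a UCS-2 consumer would treat as two
separate (and individually meaningless) symbols, while a UTF-16 consumer
combines them into one code point.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def is_surrogate(unit):
    """True if the 16-bit unit lies in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF

def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+1F600 lies outside the BMP, so it needs a surrogate pair.
units = utf16_code_units("\U0001F600")
print([hex(u) for u in units])           # two code units
print(all(is_surrogate(u) for u in units))
print(hex(combine_surrogates(*units)))   # one code point
```

A UCS-2 view stops after the first print: two units, neither of which maps to
a character on its own, which would be consistent with the two question marks
reported above.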

Windows and the Windows console aren't the same thing. Just because
you can't display a UTF-16 string in the console doesn't mean that the
underlying OS doesn't support UTF-16.

That said, Windows's UTF-16 support and Lua's UTF-8 support are
actually pretty similar. Both treat strings as a series of code units
rather than a series of code points, and both rely on the developer to
handle them correctly by using library functions instead of the naive
functions and [] indexing. Lua's library is a little better than
Windows's, but in practice, if you want to do actual text processing
in either case, you probably want to use a third-party library instead
of the stock tools.
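
The code-unit versus code-point gap can be sketched in Python, with a bytes
object standing in for a Lua string (an assumption made for illustration; in
Lua 5.3 the analogous comparison is `#s` against `utf8.len(s)`). The function
name below is hypothetical, and the counting trick assumes the input is
already valid UTF-8.

```python
def count_code_points(data):
    """Count code points in valid UTF-8 bytes.

    Every code point starts with exactly one byte that is not a
    continuation byte (continuation bytes look like 10xxxxxx).
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)

s = "h\u00e9llo".encode("utf-8")  # U+00E9 takes two bytes in UTF-8

print(len(s))                # code units (bytes)
print(count_code_points(s))  # code points
print(s[1:2])                # naive indexing yields only half of U+00E9
```

The naive length and the naive index both operate on code units, so they only
agree with the code-point view for pure ASCII text.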

/s/ Adam