
> When you are writing a surrogate-pair-symbol to Windows console
> (I've tested this on Win7 with a simple program using WriteConsoleW),
> it gets displayed as two question marks,
> that is, Windows considers it as two separate symbols instead of just one.

> Windows DOES "support" surrogates -- it upgraded from UCS-2
> (equivalent to UTF-16 constrained to the BMP) to UTF-16 a long time
> ago (Win7, I think). But it supports them in the sense that it renders
> them correctly and won't screw them up if they exist.

Do you really think Windows looks into the current font and glues
a surrogate pair into one symbol only if the font has a glyph for it,
and otherwise leaves the pair as two separate symbols?
Can you demonstrate your hypothesis with an example?

I believe that splitting a UTF-16 string into code points should not depend on the currently installed font.
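
To illustrate the point, here is a minimal sketch (in Python, not the Windows console code; the function name utf16_to_code_points is made up for this example) of how UTF-16 code units can be grouped into code points using nothing but the numeric ranges of the surrogates. No font information is consulted at any step:

```python
def utf16_to_code_points(units):
    """Group 16-bit UTF-16 code units into Unicode code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        # A high surrogate (D800..DBFF) followed by a low surrogate
        # (DC00..DFFF) encodes a single code point above U+FFFF.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # A BMP character, or an unpaired surrogate passed through as-is.
            code_points.append(unit)
            i += 1
    return code_points


# 'A' followed by U+1F600, which UTF-16 encodes as the pair D83D DE00.
print([hex(cp) for cp in utf16_to_code_points([0x0041, 0xD83D, 0xDE00])])
```

The pair D83D DE00 comes out as the single code point 0x1f600 whether or not any installed font can draw it; whether a glyph is available only matters later, at rendering time.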