lua-users home
lua-l archive


> Operations on fixed width character strings (such as UTF-16) are
> processed faster.

UTF-16 isn't fixed char width.
 
Yes, you are absolutely correct.
UTF-16 uses surrogate pairs to represent code points from 0x10000 upward.
But Windows does not support them.
When you write a character encoded as a surrogate pair to the Windows console
(I've tested this on Win7 with a simple program using WriteConsoleW),
it is displayed as two question marks;
that is, Windows treats it as two separate characters instead of one.
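
For reference, here is a minimal sketch of how a code point from 0x10000 upward
is split into the two 16-bit units that the console then sees as two characters.
This is not the WriteConsoleW test program; the function name
to_surrogate_pair is made up for illustration, and the arithmetic follows the
standard UTF-16 encoding rules.

```python
def to_surrogate_pair(code_point):
    """Split a code point in 0x10000..0x10FFFF into (high, low) surrogates.

    Hypothetical helper, for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))
```

Each half on its own is not a valid character, which would be consistent with
the console showing a question mark for each of them.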

If Windows does not support surrogate pairs, why should we?
That's why we can treat UTF-16 on Windows as a fixed-width encoding.
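
To make the trade-off concrete, a small sketch of what "fixed width" means in
practice: counting 16-bit units versus counting code points. The helper names
utf16_code_units and count_code_points are hypothetical, and the sample string
is only an assumption chosen to contain one surrogate pair.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def count_code_points(units):
    """Count code points, treating a high+low surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


units = utf16_code_units("a\U0001F600b")
print(len(units))                # length under the fixed-width view
print(count_code_points(units))  # length when pairs are decoded
```

Under the fixed-width view the length is just the number of units, so the two
counts differ only for strings that contain surrogate pairs.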

Of course, this means that a 100% correct Unicode "print()" function
cannot be implemented for Windows console applications.