- Subject: Re: UTF-8 and the Windows console
- From: Matthias Kluwe <mkluwe@...>
- Date: Tue, 31 Mar 2009 16:26:43 +0200
Hi!
2009/3/31 David Given <dg@cowlark.com>:
> Matthias Kluwe wrote:
> [...]
>> So, hopefully someone on this list has insight about the behavior of
>> the MS Windows (XP) console regarding UTF-8 encoded data.
>>
>> The windows console does work with UTF-8 if the “codepage” is set to
>> the value 65001, apparently (command chcp 65001). I used lua5.1.exe
>> from luabinaries.luaforge.net for a test (and a self-built lua.exe)
>> with the command
> [...]
>
> - The console doesn't support any kind of font substitution. If you try
> to render a glyph that's not supported by your current font, you'll get
> a dummy glyph instead. The standard bitmap font supports ASCII-and-a-bit
> only. Here's some instructions on how to use a real font:
Well, _displaying_ things is not my problem, fortunately...
> - The libc's stdio is not 8-bit clean and I've seen reports that it can
> mangle binary data. If you're using UTF-8 this can cause you to generate
> invalid sequences, which might be what's causing your app to fail.
This may be possible. But two facts drive me crazy:
1) 'echo é > out.txt' writes the correct bytes UTF-8 encoded to 'out.txt'.
2) Reading a UTF-8 encoded file works, using stdio or C++'s std functionality.
This looks quite inconsistent with the behavior observed when reading
from stdin...
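One way to narrow this down is to look at the raw bytes a program actually receives, instead of guessing from what it prints. A minimal sketch in Python, assuming nothing about the console itself; describe_bytes is a made-up helper name, not something from this thread:

```python
def describe_bytes(data: bytes) -> str:
    """Return the bytes as hex, plus a note on whether they form valid UTF-8."""
    hex_dump = " ".join(f"{b:02x}" for b in data)
    try:
        # A strict decode fails on the first malformed or truncated sequence.
        data.decode("utf-8")
        verdict = "valid UTF-8"
    except UnicodeDecodeError as err:
        verdict = f"invalid UTF-8 at byte offset {err.start}"
    return f"{hex_dump} ({verdict})"


# 'é' encoded as UTF-8 is the two bytes c3 a9.
print(describe_bytes("é".encode("utf-8")))
```

To use it on stdin, pass it sys.stdin.buffer.read(), which returns the bytes without any text decoding. Running that once with redirected input from a UTF-8 file and once with typed console input should show whether the bytes already differ before the program touches them.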
> The only actually reliable way I've found of getting Unicode to the
> console is using the WriteConsoleOutputW() functions, which of course
> don't work with redirectable streams... and if you thought ncurses had
> an ugly API, you haven't seen anything yet.
Hmm, this is not nice, really, if you're right. Anyway, so far I
have had no problems getting Unicode _to_ the console...
> This URL seems to describe what you're seeing:
>
> http://mail.python.org/pipermail/python-list/2003-April/200079.html
Not really. This is about _displaying_ Unicode, too.
Regards,
Matthias