- Subject: Re: code page
- From: Ignacio Burgueño <ignaciob@...>
- Date: Wed, 13 May 2009 13:52:12 -0300
David Given wrote:
> Marco Antonio Abreu wrote:
>> When a field
>> value has one accented char, it truncate the last one ('Flávia' comes
>> like 'Fl??vi' - ?? are especial chars), if the text has two accented
>> chars it has the last two chars cutted and so on...
> This is a classic symptom of UTF-8 misparsing.
Kind of. In fact the problem is that LuaCOM is truncating characters.
The issue is this: there's a function that converts BSTRs (UTF-16
strings, as used by COM) to Lua strings.
When converting "Flávia", it computes the string's length (6 characters)
and converts it to UTF-8, which gives a 7-byte string (FlÃ¡via when each
byte is shown as a single character), BUT it pushes just 6 bytes to
Lua instead of the required 7.
So strings get truncated by an amount that depends on the number of
non-ASCII code points they contain.
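To illustrate the effect, here is a minimal sketch in Python (not the LuaCOM code itself, which is C++). The function names buggy_convert and fixed_convert are made up for this example; they only mimic the behaviour described above, assuming the length is counted in UTF-16 code units and then reused as the UTF-8 byte count.

```python
def buggy_convert(text: str) -> bytes:
    """Mimic the bug: reuse the UTF-16 length as the UTF-8 byte count."""
    # Number of UTF-16 code units (2 bytes each in UTF-16-LE).
    utf16_units = len(text.encode("utf-16-le")) // 2
    utf8 = text.encode("utf-8")
    # Wrong: the UTF-8 form may be longer than the UTF-16 unit count.
    return utf8[:utf16_units]


def fixed_convert(text: str) -> bytes:
    """Push every byte of the UTF-8 result, not the UTF-16 length."""
    return text.encode("utf-8")


name = "Fl\u00e1via"  # "Flávia", with a single precomposed U+00E1
print(len(fixed_convert(name)))          # 7 bytes in UTF-8
print(buggy_convert(name))               # last byte is lost
print(buggy_convert(name).decode("latin-1"))  # how a single-byte code page shows it
```

With one accented character the last byte goes missing, with two the last two bytes, and so on, which matches what Marco reported.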
I'll push a fix for that to LuaCOM.