2009/5/13 Ignacio Burgueño
<ignaciob@inconcertcc.com>
David Given wrote:
Marco Antonio Abreu wrote:
When a field
value has one accented char, it truncates the last one ('Flávia' comes
out like 'Fl??vi', where ?? are special chars); if the text has two accented
chars, the last two chars are cut off, and so on...
This is a classic symptom of UTF-8 misparsing.
Kind of. In fact the problem is that LuaCOM is truncating characters.
The issue is this: there's a function that converts from BSTR (UTF-16 strings, as used by COM) to Lua strings.
When converting "Flávia", it computes the string's length (6 UTF-16 code units) and converts it to UTF-8 (which gives a 7-byte string, because 'á' takes two bytes), BUT it then pushes just 6 bytes to Lua instead of the required 7.
So strings get truncated by roughly one byte for each accented character, i.e. by however many extra bytes the UTF-8 encoding needs.
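To make the arithmetic concrete, here is a minimal sketch in Python (not the actual LuaCOM code, which is C++). The function names `utf16_units`, `buggy_convert` and `fixed_convert` are made up for illustration; the sketch only mimics the behaviour described above, where the UTF-16 length is reused as the UTF-8 byte count.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, i.e. the length a BSTR would report."""
    return len(text.encode("utf-16-le")) // 2


def buggy_convert(text: str) -> bytes:
    """Mimic the bug: convert to UTF-8 but keep only as many bytes
    as the string had UTF-16 code units."""
    utf8 = text.encode("utf-8")
    return utf8[:utf16_units(text)]


def fixed_convert(text: str) -> bytes:
    """Keep the full UTF-8 result, using its own byte length."""
    return text.encode("utf-8")


print(utf16_units("Flávia"))         # 6
print(len(fixed_convert("Flávia")))  # 7
print(buggy_convert("Flávia"))       # b'Fl\xc3\xa1vi'
print(buggy_convert("Múñoz"))        # b'M\xc3\xba\xc3\xb1'
```

The truncated result for "Flávia" is 'Fl', the two bytes of 'á', then 'vi', which matches the reported 'Fl??vi' if the two UTF-8 bytes are displayed as separate characters. With two accented characters, two trailing bytes are lost.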
I'll push a fix for that to LuaCOM.
Regards,
Ignacio Burgueño