David Given wrote:

Marco Antonio Abreu wrote:
When a field
value has one accented char, its last char gets truncated ('Flávia' comes
out as 'Fl??vi', where ?? are special chars); if the text has two accented
chars, the last two chars are cut off, and so on...

This is a classic symptom of UTF-8 misparsing.

Kind of. In fact the problem is that LuaCOM is truncating characters.

The issue is this: there's a function that converts a BSTR (a UTF-16 string, as used by COM) to a Lua string. When converting "Flávia", it computes the string's length (6 characters) and converts it to UTF-8, which yields a 7-byte string because "á" takes two bytes. BUT it then pushes just 6 bytes to Lua (instead of the required 7).

So strings end up truncated by an amount that depends, roughly, on how many non-ASCII characters they contain.
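
To make the arithmetic concrete, here is a small Python sketch of the same mistake. It is not LuaCOM's code; the function names are made up for illustration, and Python's len() counts code points rather than UTF-16 code units (the two agree for "Flávia").

```python
# Hypothetical illustration of the bug described above; not LuaCOM's code.

def to_utf8_buggy(text: str) -> bytes:
    """Mimic the bug: reuse the character count as the UTF-8 byte length."""
    char_count = len(text)          # 6 for "Flávia"
    encoded = text.encode("utf-8")  # 7 bytes, since "á" needs two bytes
    return encoded[:char_count]     # bug: only the first 6 bytes survive


def to_utf8_fixed(text: str) -> bytes:
    """Use the length of the converted string, not of the source string."""
    return text.encode("utf-8")


name = "Fl\u00e1via"  # "Flávia"
buggy = to_utf8_buggy(name)
fixed = to_utf8_fixed(name)

print(len(name), len(fixed), len(buggy))
# errors="replace" guards against a cut that lands inside a multi-byte char
print(buggy.decode("utf-8", errors="replace"))
print(fixed.decode("utf-8"))
```

Viewed as single-byte characters, the 6 surviving bytes are F, l, two bytes for "á", v, i, which matches the reported 'Fl??vi'. Pure ASCII strings are unaffected, which is why the problem only shows up with accented text.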

I'll push a fix for that to LuaCOM.

Regards,
Ignacio Burgueño