On Fri, Jan 15, 2016 at 7:44 PM, Todd Wegner <twwegner@gmail.com> wrote:
> I would like to understand why the following code produces SYN characters
> (0x16) in Lua53 on Linux.
> The SYN characters occur whenever split divides a multi-byte character
> in half.
> Why does string.sub return SYN rather than the respective bytes?
>
>
> Code:
>
> local space = string.byte(' ')
> local text = utf8.char(0x92e,0x947,0x930,0x93e, space, 0x928, 0x93e, 0x92e,
> space, 0x932,0x942,0x905, space, 0x939,0x948, 0x964)
>
> local split = 1
> local lh = string.sub(text, 1, split)
> local rh = string.sub(text, split+1)
>
> print('text', text)
> print('lh', lh)
> print('rh', rh)
>
>
> Output:
>
> text    मेरा नाम लूअ है।
> lh    म SYN SYN
> rh    SYN रा नाम लूअ है।
>
> Thanks

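string.sub is not UTF-8 aware. It operates on byte strings, not
Unicode character strings, so its indices are byte positions. The
first character of your text, U+092E, is encoded as three bytes in
UTF-8, so split = 1 leaves one byte in lh and the other two at the
start of rh. Neither fragment is valid UTF-8 on its own. string.sub
does return the respective bytes; the SYN is presumably how your
terminal renders the invalid sequences.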

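Look at the utf8 module (new in Lua 5.3) for Unicode-aware
functionality. utf8.offset, for example, converts a character index
into a byte position that string.sub can use.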
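
As a rough sketch of what that could look like (split_chars is a
hypothetical helper name, not part of the standard library, and this
assumes Lua 5.3 or later), you can ask utf8.offset where a given
character starts and hand that byte position to string.sub:

```lua
-- Requires Lua 5.3+ for the utf8 library.
local text = utf8.char(0x92e, 0x947, 0x930, 0x93e)  -- first word of the example

-- Hypothetical helper: split after n characters (code points), not n bytes.
local function split_chars(s, n)
  -- utf8.offset(s, n + 1) is the byte position where character n + 1 starts.
  -- When n equals the character count it returns #s + 1.
  local pos = utf8.offset(s, n + 1)
  if not pos then
    -- n is past the end of the string: everything goes to the left half.
    return s, ""
  end
  return string.sub(s, 1, pos - 1), string.sub(s, pos)
end

local lh, rh = split_chars(text, 1)

print(#text, utf8.len(text))          -- byte count vs. character count
print(utf8.len(lh), utf8.len(rh))     -- both halves are valid UTF-8
print(utf8.len(string.sub(text, 1, 1)))  -- byte-based split: invalid UTF-8
```

Note that this splits on code points, not on what a reader perceives
as one character. In your text the vowel sign U+0947 is a separate
code point from the consonant U+092E, so a split after one code point
is valid UTF-8 but still separates the two.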

/s/ Adam