- Subject: Re: Chinese characters in a string
- From: Marc Balmer <marc@...>
- Date: Tue, 4 Dec 2012 14:33:33 +0100
> Hello, I moved to China some 3 months ago, and now that I'm starting to speak the language I'm also starting to write programs that have to operate on strings containing Chinese characters. I noticed that the string functions behave in a strange way:
>
> a="我叫李乐"
> print(string.sub(a,1,4)) -->我叫
> print(string.sub(a,1,5)) -->我叫?
> print(string.len(a)) -->8
>
> It seems that every Chinese character is counted twice. That was not a problem, since all the string functions behave like this and their results are therefore compatible, but I was interested in the reason behind these results. I guess it is somehow related to the code used to represent them, but I'd like to know a precise explanation!
> Thanks in advance for the help, and sorry for the grammar mistakes I probably made, but English is not my mother tongue!
>
> p.s.
> I hope Chinese characters work fine on the mailing list, or this post will look quite messed up...
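Your string is in a multibyte encoding (probably UTF-8, although the length of 8 you see matches a 2-byte encoding such as GBK; in UTF-8 these four characters take 3 bytes each, 12 in total), but all Lua string functions count bytes, not characters: they assume ASCII (or any 8 bit variant thereof).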
To deal with UTF-8 strings, you need special functions.
Maybe it's time to release a utf-8 module that provides such functions...
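
Until such a module exists, here is a minimal sketch of what two of those functions could look like in plain Lua. The names utf8_len and utf8_sub are made up for this example and are not part of the Lua standard library. The sketch assumes the input is valid UTF-8 (it does no validation) and relies on the fact that every UTF-8 continuation byte lies in the range 0x80 to 0xBF, so counting the other bytes counts characters. It will not help if the string is in another encoding such as GBK.

```lua
-- Hypothetical helpers, not part of the Lua standard library.
-- Assumption: s is valid UTF-8; nothing is validated here.

-- True when byte b is a UTF-8 continuation byte (binary 10xxxxxx).
local function is_continuation(b)
  return b >= 0x80 and b <= 0xBF
end

-- Number of characters (code points): count the non-continuation bytes.
local function utf8_len(s)
  local n = 0
  for pos = 1, #s do
    if not is_continuation(s:byte(pos)) then
      n = n + 1
    end
  end
  return n
end

-- Like string.sub, but i and j are character positions.
-- Only positive positions are handled in this sketch.
local function utf8_sub(s, i, j)
  i = i or 1
  j = j or math.huge
  local first, last = nil, #s
  local n = 0
  for pos = 1, #s do
    if not is_continuation(s:byte(pos)) then
      n = n + 1
      if n == i then
        first = pos          -- first byte of character i
      end
      if n == j + 1 then
        last = pos - 1       -- last byte of character j
        break
      end
    end
  end
  if not first then
    return ""                -- i is past the end of the string
  end
  return s:sub(first, last)
end

local a = "我叫李乐"          -- assumes this source file is saved as UTF-8
print(#a)                    -- 12 (bytes) when the file is UTF-8
print(utf8_len(a))           -- 4
print(utf8_sub(a, 1, 2))     -- 我叫
```

Both functions walk the whole string byte by byte, so they are O(n) per call; that is fine for short strings but a real module would probably want something smarter.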