Re: Chinese characters in a string

lua-l archive


Subject: Re: Chinese characters in a string
From: Marc Balmer <marc@...>
Date: Tue, 4 Dec 2012 14:33:33 +0100

> Hello, I moved to China some 3 months ago, and now that I'm starting to speak the language I'm also starting to write programs that operate on strings containing Chinese characters. I noticed that the string functions behave in a strange way:
> 
> a="我叫李乐"
> print(string.sub(a,1,4)) -->我叫
> print(string.sub(a,1,5)) -->我叫?
> print(string.len(a)) -->8
> 
> It seems that every Chinese character is counted twice. That was not a problem, since all the string functions behave like this and their results are therefore consistent with each other, but I was interested in the reason behind these results. I guess it is somehow related to the encoding used to represent the characters, but I'd like to know a precise explanation!
> Thanks in advance for the help, and sorry for any grammar mistakes I made; English is not my mother tongue!
> 
> p.s.
> I hope Chinese characters work fine on the mailing list, or this post will look quite messed up...


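Your string is stored in a multibyte encoding, but all Lua string functions operate on bytes, i.e. they assume ASCII (or any 8-bit variant thereof). That is why string.len reports 8 for four characters and string.sub(a,1,5) cuts the third character in half. Note that in UTF-8 each of these characters takes three bytes, so a length of 8 suggests a two-byte encoding such as GBK; the effect is the same either way.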

To deal with UTF-8 strings, you need special functions.
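
As a rough illustration of what such functions could look like, here is a minimal sketch in plain Lua. The names utf8_len and utf8_sub are hypothetical (they are not part of the standard library), the code assumes the input is valid UTF-8, and the sample string is written with decimal byte escapes so that it does not depend on how the source file is saved.

```lua
-- Hypothetical helpers, not part of the Lua standard library.
-- Both assume that the input string is valid UTF-8.

-- Count characters by counting every byte that is NOT a UTF-8
-- continuation byte (continuation bytes are 0x80..0xBF).
local function utf8_len(s)
  local n = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then
      n = n + 1
    end
  end
  return n
end

-- Return characters i..j (1-based, inclusive), counted in characters
-- rather than bytes. Negative indices are not supported in this sketch.
local function utf8_sub(s, i, j)
  local chars = {}
  -- One non-continuation byte followed by any continuation bytes.
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  i = math.max(i or 1, 1)
  j = math.min(j or #chars, #chars)
  return table.concat(chars, "", i, j)
end

-- The four characters of the original example, as UTF-8 bytes.
local s = "\230\136\145\229\143\171\230\157\142\228\185\144"

print(#s)                 --> 12 (bytes)
print(utf8_len(s))        --> 4  (characters)
print(utf8_sub(s, 1, 2))  -- the first two characters, intact
```

Building a table of characters on every call is simple but not fast; a real module would more likely walk the bytes once and call string.sub with the computed byte offsets.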

Maybe it's time to release a utf-8 module that provides such functions...

Follow-Ups:
- Re: Chinese characters in a string, Erik Hougaard

References:
- Chinese characters in a string, alessandro codenotti
