Re: Chinese characters in a string

lua-l archive


Subject: Re: Chinese characters in a string
From: Rena <hyperhacker@...>
Date: Tue, 4 Dec 2012 08:15:38 -0500

On Tue, Dec 4, 2012 at 8:10 AM, alessandro codenotti <code95@live.it> wrote:
> Hello, I moved to China about 3 months ago, and now that I'm starting to
> speak the language I'm also starting to write programs that operate on
> strings containing Chinese characters. I noticed that the string functions
> behave in a strange way:
>
> a="我叫李乐"
> print(string.sub(a,1,4)) -->我叫
> print(string.sub(a,1,5)) -->我叫?
> print(string.len(a)) -->8
>
> It seems that every Chinese character is counted twice. That has not been a
> problem, since all the string functions behave like this and their results
> are consistent with each other, but I am interested in the reason behind
> these results. I guess it is somehow related to the encoding used to
> represent the characters, but I'd like to know a precise explanation!
> Thanks in advance for the help, and sorry for any grammar mistakes I
> made; English is not my native language!
>
> P.S.
> I hope Chinese characters work fine on the mailing list, or this post will
> look quite messed up...

Welcome to the "wonderful" world of Unicode. Read about UTF-8 and
understand that Lua string functions operate on bytes, not on
characters, and characters can be more than one byte. (Then read about
combining characters, the numerous different ways to encode the same
glyph, and the different encodings in use by different systems, and
try to cling to your sanity...)
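
To make the byte/character distinction concrete, here is a minimal sketch,
assuming the string is encoded as UTF-8. The bytes are written as decimal
escapes so the result does not depend on how the source file is saved. The
helpers utf8_len and utf8_sub are hypothetical names for illustration, not
part of the standard library, and they only handle well-formed UTF-8 and
positive indices. Note that in UTF-8 each of these characters takes 3 bytes;
a length of 8 for four characters would suggest a 2-bytes-per-character
encoding such as GBK, but that is a guess about your setup.

```lua
-- "我叫李乐" spelled out as UTF-8 bytes (3 bytes per character).
local a = "\230\136\145\229\143\171\230\157\142\228\185\144"

-- Count characters by counting every byte that is NOT a UTF-8
-- continuation byte (continuation bytes are in the range 128..191).
local function utf8_len(s)
  local _, n = string.gsub(s, "[^\128-\191]", "")
  return n
end

-- Character-based substring: split the string into whole characters
-- (one lead byte plus any continuation bytes), then join the range i..j.
local function utf8_sub(s, i, j)
  local chars = {}
  for ch in string.gmatch(s, "[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return table.concat(chars, "", i, math.min(j, #chars))
end

print(#a)                 -- byte count
print(utf8_len(a))        -- character count
print(utf8_sub(a, 1, 2))  -- first two whole characters
```

Cutting with string.sub at a byte offset that falls inside a character is
what produces the "?" in the output: the trailing bytes no longer form a
complete character. Even this sketch counts code points only, so combining
characters are still counted separately.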

-- 
Sent from my Game Boy.
