- Subject: Re: Unicode operation
- From: Budi <budikusasi@...>
- Date: Mon, 20 Jun 2022 16:54:56 +0700
On 6/20/22, Francisco Olarte <folarte@peoplecall.com> wrote:
> On Mon, 20 Jun 2022 at 09:56, Budi <budikusasi@gmail.com> wrote:
>> While learning Lua, I suddenly ran into this:
>> > utf8.codepoint("résumé", 1,2)
>> 114 233
>> > utf8.codepoint("résumé", 1,3)
>> 114 233
>> > utf8.codepoint("résumé", 1,4)
>> 114 233 115
>> > utf8.len("résumé")
>> 6
>> Any crystal clear explanation ?
>
> RTFM?
> "utf8.codepoint (s [, i [, j [, lax]]])
> Returns the code points (as integers) from all characters in s that
> start between byte position i and j (both included). The default for i
> is 1 and for j is i. It raises an error if it meets any invalid byte
> sequence."
>
> The utf8 library treats strings as byte arrays. If no one has been playing
> tricks with encodings, then pasting from the mail, your string is:
>
> $ echo -n résumé | od -t u1
> 0000000 114 195 169 115 117 109 195 169
> Bytes     1   2   3   4   5   6   7   8
> Chars     1   2  2'   3   4   5   6  6'
>
> From this it should be clear. (The next command may help, as every
> character in your string also exists in latin1.)
>
> $ echo -n résumé | recode utf8..latin1 | od -t u1
> 0000000 114 233 115 117 109 233
>
> FOS.
Thanks much, but that still doesn't clearly explain how code point 233 comes out of bytes 195 169.
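
For reference, the step from bytes 195 169 to code point 233 is the standard UTF-8 two-byte rule: the lead byte has the bit pattern 110xxxxx, the continuation byte has 10yyyyyy, and the code point is the 11 bits xxxxxyyyyyy. Below is a minimal sketch of that arithmetic, assuming Lua 5.3 or later (for the utf8 library, integer bitwise operators, and "\u{...}" escapes). The helper decode2 is a hypothetical name for illustration, not part of the utf8 library.

```lua
-- "résumé" written with \u{E9} escapes so the bytes do not depend on
-- how this file or the terminal is encoded (Lua 5.3+ assumed).
local s = "r\u{E9}sum\u{E9}"

-- Hypothetical helper: decode one two-byte UTF-8 sequence by hand.
-- lead is 110xxxxx, cont is 10yyyyyy; the result is the bits xxxxxyyyyyy.
local function decode2(lead, cont)
  return ((lead & 0x1F) << 6) | (cont & 0x3F)
end

local b2, b3 = s:byte(2, 3)
print(b2, b3)            -- 195 169
-- 195 = 11000011 -> low 5 bits 00011 = 3, shifted left 6 = 192
-- 169 = 10101001 -> low 6 bits 101001 = 41
-- 192 + 41 = 233
print(decode2(b2, b3))   -- 233

-- utf8.codepoint counts byte positions, not characters:
print(#s, utf8.len(s))          -- 8 6
print(utf8.codepoint(s, 1, 2))  -- 114 233
print(utf8.codepoint(s, 1, 3))  -- 114 233 (byte 3 is a continuation byte)
print(utf8.codepoint(s, 1, 4))  -- 114 233 115
```

This is why j = 2 and j = 3 give the same result: the only characters that start in bytes 1 to 3 are "r" (byte 1) and "é" (byte 2), while byte 3 is the second half of "é".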