On 6/20/22, Francisco Olarte <folarte@peoplecall.com> wrote:
> On Mon, 20 Jun 2022 at 09:56, Budi <budikusasi@gmail.com> wrote:
>> In learning lua, suddenly met this:
>> > utf8.codepoint("résumé", 1,2)
>> 114     233
>> > utf8.codepoint("résumé", 1,3)
>> 114     233
>> > utf8.codepoint("résumé", 1,4)
>> 114     233     115
>> > utf8.len("résumé")
>> 6
>> Any crystal clear explanation ?
>
> RTFM?
> "utf8.codepoint (s [, i [, j [, lax]]])
> Returns the code points (as integers) from all characters in s that
> start between byte position i and j (both included). The default for i
> is 1 and for j is i. It raises an error if it meets any invalid byte
> sequence."
>
> utf8 treats strings as byte arrays. If no one has been playing tricks
> with encodings, pasting from the mail, your string is:
>
> $ echo -n résumé | od -t u1
> 0000000 114 195 169 115 117 109 195 169
> Bytes     1   2   3   4   5   6   7   8
> Chars     1   2   2'  3   4   5   6   6'
>
> From these it should be clear. (The next command may help, since every character in your string also fits in Latin-1, where the byte value equals the code point.)
>
> $ echo -n résumé | recode utf8..latin1 | od -t u1
> 0000000 114 233 115 117 109 233
>
> FOS.
Thanks much, but that still doesn't clearly explain how code point 233 came out of bytes 195 169.
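
For what it's worth, the step from bytes 195 169 to code point 233 can be worked out by hand. In UTF-8, a two-byte character has a lead byte of the form 110xxxxx and a continuation byte of the form 10xxxxxx; the code point is the x bits joined together. Here 195 is 11000011 (payload 00011) and 169 is 10101001 (payload 101001), and 00011101001 in binary is 233. Below is a minimal sketch of that arithmetic, assuming Lua 5.3 or later (for the bitwise operators and the utf8 library). `decode2` is a hypothetical helper written only for this illustration, not part of any library, and it handles only the two-byte case.

```lua
-- decode2 is a hypothetical helper: decodes one two-byte UTF-8 sequence.
local function decode2(b1, b2)
  -- Lead byte must look like 110xxxxx, continuation byte like 10xxxxxx.
  assert((b1 & 0xE0) == 0xC0, "not a two-byte lead byte")
  assert((b2 & 0xC0) == 0x80, "not a continuation byte")
  -- Keep the 5 payload bits of the lead byte and the 6 of the continuation.
  return ((b1 & 0x1F) << 6) | (b2 & 0x3F)
end

-- "résumé" spelled out as explicit UTF-8 bytes, so the source file
-- encoding cannot change the result.
local s = "r\195\169sum\195\169"

print(s:byte(2, 3))            --> 195     169
print(decode2(s:byte(2, 3)))   --> 233
print(utf8.codepoint(s, 2))    --> 233

-- utf8.codes yields the byte position where each character starts.
-- No character starts at byte 3 or byte 8 (those are continuation bytes).
for pos, cp in utf8.codes(s) do
  print(pos, cp)
end
```

The loop prints start positions 1, 2, 4, 5, 6, 7. That is why `utf8.codepoint(s, 1, 2)` and `utf8.codepoint(s, 1, 3)` return the same two values: byte 3 is the second half of "é", so no new character starts there.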