- Subject: Re: Unicode operation
- From: Francisco Olarte <folarte@...>
- Date: Mon, 20 Jun 2022 10:17:46 +0200
On Mon, 20 Jun 2022 at 09:56, Budi <budikusasi@gmail.com> wrote:
> While learning Lua, I suddenly ran into this:
> > utf8.codepoint("résumé", 1,2)
> 114 233
> > utf8.codepoint("résumé", 1,3)
> 114 233
> > utf8.codepoint("résumé", 1,4)
> 114 233 115
> > utf8.len("résumé")
> 6
> Can anyone give a crystal-clear explanation?
RTFM?
"utf8.codepoint (s [, i [, j [, lax]]])
Returns the code points (as integers) from all characters in s that
start between byte position i and j (both included). The default for i
is 1 and for j is i. It raises an error if it meets any invalid byte
sequence."
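To see the "start between byte position i and j" rule in action, here is a small sketch you can paste into a Lua 5.3+ interpreter (older versions have no utf8 library). It spells résumé with \u{...} escapes so the result does not depend on how your editor or terminal encodes the text; the helper name codepoints is just mine, for illustration.

```lua
-- Assumes Lua 5.3 or later (utf8 library and \u{...} escapes).
-- "résumé" written with escapes, so the file encoding does not matter.
local s = "r\u{E9}sum\u{E9}"

-- Hypothetical helper: pack the results of utf8.codepoint(s, i, j).
local function codepoints(i, j)
  return { utf8.codepoint(s, i, j) }
end

-- Grow j one byte at a time: a new code point shows up only when a
-- character *starts* at byte j.
for j = 1, #s do
  print(j, table.concat(codepoints(1, j), " "))
end

-- Starting at a byte in the middle of a character is an error.
print(pcall(utf8.codepoint, s, 3))  -- false, plus an error message
```

Nothing new appears at j = 3 or j = 8: those are the second bytes of the two é characters.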
The utf8 library treats strings as byte arrays: i and j are byte positions, not character positions. Unless someone has been playing tricks with encodings when pasting from the mail, your string is:
$ echo -n résumé | od -t u1
0000000 114 195 169 115 117 109 195 169
Bytes     1   2   3   4   5   6   7   8
Chars     1   2  2'   3   4   5   6  6'
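The same layout can be produced from inside Lua. A sketch, again assuming Lua 5.3+: string.byte gives the raw bytes (like od), utf8.codes walks the string character by character and reports the byte where each one starts, and utf8.offset turns a character index into a byte position. codepoints_by_char is a hypothetical helper, not part of the library; it is what you want if you meant "characters 1 to 4" rather than "bytes 1 to 4".

```lua
-- Assumes Lua 5.3+. "résumé" written with escapes.
local s = "r\u{E9}sum\u{E9}"

-- Raw bytes, like `od -t u1`.
print(table.concat({ string.byte(s, 1, -1) }, " "))

-- utf8.codes yields (byte position where the character starts, code point).
local starts = {}
for pos, cp in utf8.codes(s) do
  starts[#starts + 1] = pos
  print(pos, cp, utf8.char(cp))
end

-- Hypothetical helper: code points of characters first..last, converting
-- character indices to byte positions with utf8.offset first.
local function codepoints_by_char(str, first, last)
  return utf8.codepoint(str, utf8.offset(str, first), utf8.offset(str, last))
end

print(codepoints_by_char(s, 1, 4))  -- 114 233 115 117 (r é s u)
```

Character 4 (u) starts at byte 5, so this ends up calling utf8.codepoint(s, 1, 5).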
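From these it should be clear. With i=1, j=2 you get the characters starting at bytes 1 and 2 (r and é). Raising j to 3 adds nothing, because byte 3 is the second byte of é, not the start of a character. j=4 adds s, which starts at byte 4. utf8.len counts characters, not bytes, so it returns 6 for this 8-byte string. (The next one may help too: every character in your string fits in Latin-1, where each byte value is the code point itself, e.g. 233 for é.)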
$ echo -n résumé | recode utf8..latin1 | od -t u1
0000000 114 233 115 117 109 233
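For completeness, a rough Lua equivalent of that recode step, assuming Lua 5.3+ and that every character is below U+0100 (true for résumé; the hypothetical to_latin1 refuses anything else rather than guess). Because Latin-1 byte values equal the code points, the resulting bytes match what utf8.codepoint returned in your session.

```lua
-- Assumes Lua 5.3+. "résumé" written with escapes.
local s = "r\u{E9}sum\u{E9}"

-- Hypothetical helper: re-encode a UTF-8 string as Latin-1.
-- Latin-1 stores each code point below 256 as one byte with that value.
local function to_latin1(str)
  local out = {}
  for _, cp in utf8.codes(str) do
    -- U+0100 and above have no Latin-1 byte; fail instead of guessing.
    assert(cp < 256, ("U+%04X is not representable in Latin-1"):format(cp))
    out[#out + 1] = string.char(cp)
  end
  return table.concat(out)
end

local latin1 = to_latin1(s)
-- Prints 6 (bytes), then 114 233 115 117 109 233.
print(#latin1, table.concat({ string.byte(latin1, 1, -1) }, " "))
```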
FOS.