Re: Unicode operation

lua-l archive

Subject: Re: Unicode operation
From: Francisco Olarte <folarte@...>
Date: Mon, 20 Jun 2022 10:17:46 +0200

On Mon, 20 Jun 2022 at 09:56, Budi <budikusasi@gmail.com> wrote:
> In learning lua, suddenly met this:
> > utf8.codepoint("résumé", 1,2)
> 114     233
> > utf8.codepoint("résumé", 1,3)
> 114     233
> > utf8.codepoint("résumé", 1,4)
> 114     233     115
> > utf8.len("résumé")
> 6
> Any crystal clear explanation ?

RTFM?
"utf8.codepoint (s [, i [, j [, lax]]])
Returns the code points (as integers) from all characters in s that
start between byte position i and j (both included). The default for i
is 1 and for j is i. It raises an error if it meets any invalid byte
sequence."
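
To make the quoted rule concrete, here is a minimal sketch that reproduces the results from the question. It assumes Lua 5.3 or later (for the utf8 library and the \u{...} escape) and writes "é" as \u{E9} so the example does not depend on how the source file is encoded; the variable name s is only for illustration.

```lua
-- Assumes Lua 5.3+; "\u{E9}" is "é" encoded as UTF-8 (bytes 195 169).
local s = "r\u{E9}sum\u{E9}"

print(#s)           -- length in bytes
print(utf8.len(s))  -- length in characters

-- i and j are BYTE positions. A character is returned when its first
-- byte lies in [i, j], even if its remaining bytes fall after j.
print(utf8.codepoint(s, 1, 2))  -- "r" starts at byte 1, "é" at byte 2
print(utf8.codepoint(s, 1, 3))  -- byte 3 is the second byte of "é"; nothing new starts there
print(utf8.codepoint(s, 1, 4))  -- "s" starts at byte 4, so it is now included
```

Extending j from 2 to 3 adds nothing because byte 3 is a continuation byte, which is why the first two calls in the question print the same values.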

The utf8 functions treat a string as a byte array: i and j are byte
positions, not character indices. Assuming nothing mangled the encoding
when I pasted it from your mail, your string is:

$ echo -n résumé | od -t u1
0000000 114 195 169 115 117 109 195 169
Bytes     1   2   3   4   5   6   7   8
Chars     1   2   2'  3   4   5   6   6'

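From these it should be clear. (The next command may help: every
character in your string also exists in Latin-1, where each one is a
single byte whose value equals its code point.)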

$ echo -n résumé | recode utf8..latin1 | od -t u1
0000000 114 233 115 117 109 233
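
The same two views can be rebuilt from inside Lua. This is only a sketch, assuming Lua 5.3 or later; the table names starts and latin1 are made up for the example, and the string.char step works only because every code point in this particular string is below 256.

```lua
-- Assumes Lua 5.3+; "\u{E9}" is "é" encoded as UTF-8.
local s = "r\u{E9}sum\u{E9}"

local starts, latin1 = {}, {}
-- utf8.codes yields the byte position where each character starts
-- and that character's code point.
for pos, cp in utf8.codes(s) do
  starts[#starts + 1] = pos
  -- One byte per character, Latin-1 style (valid here: all cp < 256).
  latin1[#latin1 + 1] = string.char(cp)
end

-- Byte positions at which the six characters start.
print(table.concat(starts, " "))

-- Byte values of the Latin-1 form, comparable to the recode output.
local l1 = table.concat(latin1)
print(table.concat({ string.byte(l1, 1, -1) }, " "))

-- utf8.offset converts a character index into a byte position.
print(utf8.offset(s, 3))  -- byte where the 3rd character starts
```

If you want "the first n characters", utf8.offset(s, n + 1) - 1 gives the byte position to pass as j.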

FOS.
