lua-l archive

Hello Lua-Community,

I have the following question:

Lua 5.4.4  Copyright (C) 1994-2022 Lua.org, PUC-Rio
> for pos, cp in utf8.codes('in\xbfvalid') do print(pos, cp) end
1	105
2	110
4	118
5	97
6	108
7	105
8	100

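Spurious continuation bytes are silently skipped by utf8.codes: the stray
byte 0xBF at position 3 never shows up, and no error is raised. However, the manual
https://www.lua.org/manual/5.4/manual.html#pdf-utf8.codes
says: "It raises an error if it meets any invalid byte sequence."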

But looking at the source
 https://www.lua.org/source/5.4/lutf8lib.c.html
it seems to me this is deliberate; iter_aux contains:

  if (n < len) {
    while (iscont(s + n)) n++;  /* skip continuation bytes */
  }

Is this done on purpose? Is it supposed to behave like this?
If it is intentional, then I misread the manual. Sorry.
If it is not, then iter_aux has to be changed: for example, delete the three
lines above and use the "next" pointer returned by utf8_decode to update
"n" (instead of n+1) a few lines further down.

Bye
C. Ludwig