Re: utf8.codes ignores spurious continuation bytes

lua-l archive

Subject: Re: utf8.codes ignores spurious continuation bytes
From: Christian Ludwig <cl@...>
Date: Sun, 18 Sep 2022 22:30:58 +0200

Hello,

> Can you give an example of a UTF-8 byte sequence where this is
> critical / happens / possibly creates misunderstandings?
>
> (But please also give the UTF-8 bytes in hex.)

There is no *valid* UTF-8 byte sequence where this happens. It happens
only for invalid UTF-8 byte sequences that contain bytes of the binary
form 10xxxxxx (0x80 to 0xBF) in positions where they do not serve as
UTF-8 continuation bytes.

Examples:
s = '\x61\xbf\x62'
s = '\x61\x80\x62'
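
To make "spurious continuation byte" concrete, here is a minimal sketch that
flags bytes of the form 10xxxxxx that are not announced by a preceding lead
byte. It assumes Lua 5.3 or later (bitwise operators). The helper names
is_cont, expected_conts and stray_positions are made up for this illustration
and are not part of the utf8 library. It is not a full UTF-8 validator: it
ignores overlong forms, surrogates and truncated sequences.

```lua
-- Hypothetical helpers for illustration; not part of the standard utf8 library.
-- Requires Lua 5.3+ (bitwise operators).

-- True if byte b has the bit pattern 10xxxxxx (0x80..0xBF).
local function is_cont(b)
  return (b & 0xC0) == 0x80
end

-- Number of continuation bytes announced by lead byte b
-- (0 for ASCII and for bytes that are not lead bytes).
local function expected_conts(b)
  if b >= 0xC0 and b <= 0xDF then return 1 end
  if b >= 0xE0 and b <= 0xEF then return 2 end
  if b >= 0xF0 and b <= 0xF7 then return 3 end
  return 0
end

-- Return the 1-based positions of continuation bytes that no lead byte asked for.
local function stray_positions(s)
  local strays, pending = {}, 0
  for i = 1, #s do
    local b = s:byte(i)
    if is_cont(b) then
      if pending > 0 then
        pending = pending - 1        -- expected continuation byte
      else
        strays[#strays + 1] = i      -- spurious continuation byte
      end
    else
      pending = expected_conts(b)
    end
  end
  return strays
end

print(table.concat(stray_positions('\x61\xbf\x62'), ','))  --> 2
print(table.concat(stray_positions('\x61\x80\x62'), ','))  --> 2
```

Both example strings are reported with a stray byte at position 2, which is
the byte that utf8.codes passes over silently.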

The manual says 
  "It raises an error if it meets any invalid byte sequence."

utf8.codes does not raise an error for such bytes (although other
invalid byte sequences do produce an error message, e.g.
s = '\x61\xff\x62').

My question:
Is this done on purpose for continuation bytes, in which case the
manual has to be clarified, or is it a bug in the code, which does not
do what the manual says?
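
For anyone who wants to check their own build, a small probe along these
lines should do. It is only a sketch: probe is a made-up helper name, it
assumes Lua 5.3 or later (the utf8 library), and what it prints for the first
two strings may depend on the Lua release in use. utf8.len is printed next to
it for comparison, since the manual documents it as returning fail plus the
position of the first invalid byte.

```lua
-- Hypothetical probe for illustration. Requires Lua 5.3+ (utf8 library).
local samples = { '\x61\xbf\x62', '\x61\x80\x62', '\x61\xff\x62' }

-- Iterate s with utf8.codes under pcall.
-- Returns true plus the decoded code points, or false plus the error message.
local function probe(s)
  local cps = {}
  local ok, err = pcall(function()
    for _, c in utf8.codes(s) do
      cps[#cps + 1] = string.format('U+%04X', c)
    end
  end)
  return ok, ok and table.concat(cps, ' ') or err
end

for _, s in ipairs(samples) do
  local ok, info = probe(s)
  -- utf8.len returns either the length, or fail plus a byte position.
  print(ok, info, utf8.len(s))
end
```

If the first two lines show true together with only U+0061 and U+0062, the
continuation byte was skipped without an error, as described above.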

Bye
C. Ludwig
