Thanks, this is interesting info.

I would have thought that single bytes in the range 0x80...0xBF, as in
your two example sequences, would be allowed and would then correspond
to the standard Unicode value (the range 0xA0...0xBF contains really
"nice chars" like µ ² ±).
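For comparison, here is a minimal sketch of how µ (U+00B5) looks when encoded as UTF-8, assuming Lua 5.3 or later so that the standard utf8 library is available. The variable name mu is only for illustration. As far as I can tell, the single byte 0xB5 is how Latin-1 stores µ, while UTF-8 uses two bytes:

```lua
-- Assumes Lua 5.3+ (standard utf8 library).
local mu = utf8.char(0xB5)   -- encode code point U+00B5 (µ) as UTF-8

print(#mu)                                        -- 2 (bytes)
print(string.format("%02X %02X", mu:byte(1, 2)))  -- C2 B5

-- A lone 0xB5 byte is a continuation byte without a lead byte,
-- so utf8.len reports failure plus the position of the bad byte.
print(utf8.len("\xB5"))                           -- nil (fail) and position 1
```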

The current English Wikipedia article on UTF-8 states quite clearly
that an ASCII byte (0x00...0x7F) must NOT be followed by a continuation
byte (0x80...0xBF). (I think this is a fairly new version of the
article; at least I did not notice this statement some months ago when
I looked at the UTF-8 description there in more detail.)
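A quick way to see this rule in action is to run the two quoted example strings through utf8.len. This is only a sketch, assuming Lua 5.3 or later; the helper name check is hypothetical and not part of any library:

```lua
-- Assumes Lua 5.3+ (standard utf8 library).
-- check is a hypothetical helper: it returns true if s is valid UTF-8,
-- otherwise false plus the position of the first invalid byte.
local function check(s)
  local n, pos = utf8.len(s)
  if n then
    return true
  end
  return false, pos
end

print(check('\x61\xbf\x62'))      -- false, 2: 0xBF directly after ASCII 'a'
print(check('\x61\x80\x62'))      -- false, 2: 0x80 directly after ASCII 'a'
print(check('\x61\xc2\xbf\x62'))  -- true: 0xC2 0xBF encodes U+00BF
```

In both failing cases the reported position is 2, that is, the continuation byte that has no lead byte in front of it.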

I would just be a bit worried that many UTF-8 encoders "running around
on the web" somehow ignore this rule and simply use the bytes
0xA0...0xBF as single characters for their Unicode equivalents (like
µ, ...).

On Sun, 18 Sept 2022 at 22:31, Christian Ludwig <cl@exomail.to> wrote:
> Examples:
> s = '\x61\xbf\x62'
> s = '\x61\x80\x62'