- Subject: Re: UTF-8 patterns in Lua 5.3
- From: Roberto Ierusalimschy <roberto@...>
- Date: Wed, 16 Apr 2014 10:49:30 -0300
> * the current code works for UTF-8 characters up to 4-bytes long; IIRC
> UTF-8 sequences are up to 6-bytes long; this is easily expandable
> changing the variables used to store characters to 64-bit integers, or
> rewriting the comparison code in a few places.
UTF-8 was originally designed for up to six bytes (to encode any 31-bit
number), but the current definition limits byte sequences to four
bytes. (Actually, it limits the maximum value to the Unicode limit of
0x10FFFF.) The new standard library in Lua follows these limits.
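As an illustration of those limits, here is a minimal sketch in C of a decoder that accepts at most four bytes and rejects anything above 0x10FFFF. It is not the code from the Lua standard library; the names utf8_decode_strict, MAX_UNICODE and min_for_len are hypothetical, chosen only for this example. It also rejects overlong encodings but, as a simplification, does not reject surrogates (0xD800..0xDFFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_UNICODE 0x10FFFFu  /* highest code point Unicode defines */

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
   Returns the number of bytes consumed (1..4) and stores the code point in
   *cp, or returns 0 if the sequence is malformed or out of range. */
size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* smallest code point that legitimately needs 1, 2, 3 or 4 bytes
       (index 0 is unused) */
    static const uint32_t min_for_len[] = {0, 0x0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (n == 0) return 0;

    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;  /* stray continuation byte, or an old 5/6-byte lead */

    if (len > n) return 0;                      /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;    /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len]) return 0;         /* overlong encoding */
    if (c > MAX_UNICODE) return 0;              /* beyond the Unicode limit */
    *cp = c;
    return len;
}
```

With these limits a decoded character needs only 21 bits, so a 32-bit integer is enough to hold it. (Even the original six-byte form encoded at most 31 bits.)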
(Sorry for not commenting on the real substance of your message. I need
some time to have a look at that.)
-- Roberto