- Subject: Re: Could Lua itself become UTF8-aware?
- From: Jay Carlson <nop@...>
- Date: Mon, 1 May 2017 16:34:18 -0400
> On May 1, 2017, at 9:05 AM, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
> 
>> I'd like to have Lua better support checking whether something is RFC-legal UTF-8.
> 
> What is wrong with 'utf8.len'?
Nothing that I can see now. Considering that you fixed it three years ago[1], I am embarrassed; I had written my own check and kept using it.
The assert-heavy style for is_utf8 is O(n*m): each of the m call points rescans a string of length n. If you know the rules for UTF-8 manipulation in Lua, the number of call points, m, can stay small.
I have been saved several times by failed is_utf8 assertions, usually on strings that did not come from my code. OK, my definition of "saved" includes "not producing results outside the domain that might, maybe, crash the next program." This level of obsession is probably not for everyone.
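For concreteness, here is a minimal sketch of that assert style built on 'utf8.len', assuming Lua 5.3 or later. The names is_utf8 and assert_utf8 are only illustrative, not part of the standard library, and how closely the check tracks the RFC (surrogates, for example) depends on the Lua version in use.

```lua
-- Sketch only: is_utf8 and assert_utf8 are hypothetical helper names.
-- utf8.len returns the number of characters on success, or a false
-- value plus the byte position of the first invalid sequence.

local function is_utf8(s)
  -- Normalize the result to a plain boolean.
  return utf8.len(s) ~= nil
end

local function assert_utf8(s, where)
  local len, badpos = utf8.len(s)
  if not len then
    -- Level 2 blames the caller, which is where the bad string came from.
    error(string.format("invalid UTF-8 at byte %d (%s)",
                        badpos, where or "unknown source"), 2)
  end
  return s  -- pass the string through so the check can wrap an expression
end
```

Each call scans the whole string once, which is where the O(n*m) comes from; placing assert_utf8 only where strings enter from outside keeps m small.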
Jay
[1]: https://github.com/lua/lua/commit/3a044de5a1df82ed5d76f2c5afdf79677c92800f