- Subject: Re: UTF-8 validation
- From: Jonathan Goble <jcgoble3@...>
- Date: Wed, 9 Dec 2015 21:35:25 -0500
On Wed, Dec 9, 2015 at 9:32 PM, Jonathan Goble <jcgoble3@gmail.com> wrote:
> On Wed, Dec 9, 2015 at 9:29 PM, Jay Carlson <nop@nop.com> wrote:
>> Given a string where is_utf8(s) is false, it might be nice to be able to find the byte offset of the first non-UTF-8 sequence.
>
> utf8.len() already does this. On an invalid sequence, it returns two
> values: nil plus the byte position of the first invalid sequence. I
> believe this was also mentioned earlier in the thread.
Clarification: on an invalid sequence, the first return value is "a
false value" [1], i.e. either nil or false.
[1] http://www.lua.org/manual/5.3/manual.html#pdf-utf8.len
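A minimal sketch of the idiom described above, assuming Lua 5.3's utf8 library. The helper name first_invalid_utf8 is hypothetical and not part of the standard library. It tests the first return value for truthiness rather than comparing it against nil, because the manual only promises "a false value" on failure.

```lua
-- Hypothetical helper (not part of the standard library), assuming Lua 5.3+.
-- Returns nil if s is entirely valid UTF-8; otherwise returns the byte
-- position of the first invalid sequence, as reported by utf8.len.
local function first_invalid_utf8(s)
  local len, pos = utf8.len(s)
  -- The manual only guarantees "a false value" on failure, so test
  -- truthiness instead of comparing against nil.
  if len then
    return nil
  end
  return pos
end

print(first_invalid_utf8("caf\u{E9}"))  --> nil (valid: 'c','a','f',U+00E9)
print(first_invalid_utf8("ab\xffcd"))   --> 3   (0xFF is never valid UTF-8)
```

With this, a check like the is_utf8(s) mentioned above is just first_invalid_utf8(s) == nil.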