Re: UTF-8 validation

lua-l archive


Subject: Re: UTF-8 validation
From: Jonathan Goble <jcgoble3@...>
Date: Wed, 9 Dec 2015 21:35:25 -0500

On Wed, Dec 9, 2015 at 9:32 PM, Jonathan Goble <jcgoble3@gmail.com> wrote:
> On Wed, Dec 9, 2015 at 9:29 PM, Jay Carlson <nop@nop.com> wrote:
>> Given a string where is_utf8(s) is false, it might be nice to be able to find the byte offset of the first non-UTF-8 sequence.
>
> utf8.len() already does this. On an invalid sequence, it returns two
> values: nil plus the byte position of the first invalid sequence. I
> believe this was also mentioned earlier in the thread.

Clarification: on an invalid sequence, the first return value is documented
only as "a false value" [1], so it may be either nil or false.
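Here is a minimal sketch of the approach, assuming Lua 5.3 or later (where the utf8 library exists). The helper names first_invalid and is_utf8 are hypothetical and chosen for illustration; is_utf8 mirrors the function in Jay's question. The check uses `not` on the first result rather than `== nil`, because the manual only promises a false value.

```lua
-- Sketch (Lua 5.3+): locate the first invalid UTF-8 byte via utf8.len.
-- `first_invalid` and `is_utf8` are hypothetical helper names.

-- Returns nil if s is valid UTF-8, otherwise the byte position
-- that utf8.len reports as the start of the first invalid sequence.
local function first_invalid(s)
  local len, pos = utf8.len(s)
  -- On failure the manual only guarantees "a false value", so test
  -- truthiness with `not` instead of comparing against nil.
  if not len then
    return pos
  end
  return nil
end

-- True when the whole string decodes as UTF-8.
local function is_utf8(s)
  return first_invalid(s) == nil
end

print(is_utf8("h\u{E9}llo"))     --> true
print(first_invalid("ab\xFFcd"))  --> 3
```

Returning nil for "no error" follows the usual Lua convention, so callers can write `local pos = first_invalid(s); if pos then ... end`.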

[1] http://www.lua.org/manual/5.3/manual.html#pdf-utf8.len

