lua-l archive

On 10 December 2015 at 04:22, Cezary H. Noweta <chn@poczta.onet.pl> wrote:
>
> On 2015-12-10 04:49, Cezary H. Noweta wrote:
>
>>>> On Wed, Dec 9, 2015 at 9:29 PM, Jay Carlson <nop@nop.com> wrote:
>>>>>
>>>>> Given a string where is_utf8(s) is false, it might be nice to be
>>>>> able to find the byte offset of the first non-UTF-8 sequence.
>
>
>> For Jay's idea: (1) let utf8.validate() return:
>>
>> str false --> if there was no ill-formed sequence (number 0 is not false)
>>
>> str number --> position of the first invalid byte (in the source string)
>>             --> if there was an ill-formed sequence
>>
>> and/or (2) a third parameter (flags in one integer parameter?), stoponerror:
>> if somebody wants to write their own make_safe, then it is a good idea to
>> have the first well-formed part of the string instead of a whole
>> validated string.
>
>
> OK - now the function returns:
>
> str false --> if everything is OK
>
> str number --> if there was an error;
>                number is the position in the source string
>                of the invalid character

Minor detail, but using "false" to indicate no error is something I've
never seen in Lua APIs. (In C it's common for 0 to mean OK, but `false`
as OK in Lua strikes me as strange.) I think nil would be more
idiomatic there, to indicate the "absence" of an error position. And you
could still do:

local s, errpos = utf8.validate("bla")
if errpos then
   -- got an error position
end
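For what it's worth, the standard `utf8.len` in Lua 5.3 already follows this convention: on an invalid byte sequence it returns nil plus the position of the first invalid byte (Lua 5.4 documents this as "fail", which is nil). A minimal sketch of Jay's "offset of the first non-UTF-8 sequence", combined with the stoponerror idea of keeping the well-formed prefix, can be built on it. `check_utf8` is a hypothetical name, not part of any library, and exactly which sequences count as invalid depends on the Lua version (5.4 also rejects surrogates by default).

```lua
-- Hypothetical helper, not part of the standard utf8 library.
-- Returns s and nil when s is well-formed UTF-8; otherwise returns the
-- well-formed prefix and the byte position of the first invalid byte.
local function check_utf8(s)
  local n, errpos = utf8.len(s)  -- nil plus position on failure
  if n then                      -- n is a number (0 is truthy in Lua)
    return s, nil
  end
  return s:sub(1, errpos - 1), errpos
end

local prefix, errpos = check_utf8("abc\xffdef")
if errpos then
  -- here errpos is 4 and prefix is "abc"
  print("invalid byte at", errpos)
end
```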

> http://lua.chncc.eu/utf8/201512100653/lutf8lib.c

It would be nice if this function were implemented as part of a
standalone module that could be deployed separately, instead of being
injected into the standard `utf8` table. (That would also make it
easier to distribute with LuaRocks.) utf8check, maybe?

Also, "sanitize" sounds like a good name for the function itself.
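Along these lines, a minimal sketch in plain Lua 5.3+ of what a standalone `utf8check` module with a `sanitize` function might look like. Only the two names come from the suggestion above; the rest is an assumption, in particular that each invalid byte is replaced by U+FFFD and that the return values follow the (string, nil or error position) convention discussed above. It is not the C implementation linked above.

```lua
-- Hypothetical standalone module (sketch). In its own file, e.g.
-- utf8check.lua, it would end with `return utf8check`.
-- It uses the standard utf8 library but does not modify it.
local utf8check = {}

-- UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER (bytes EF BF BD).
-- Assumption: one replacement character per invalid byte.
local REPLACEMENT = "\u{FFFD}"

-- Returns the sanitized string plus nil if s was already valid UTF-8,
-- or plus the byte position of the first invalid byte in s otherwise.
function utf8check.sanitize(s)
  local parts = {}
  local firsterr = nil
  local i = 1
  while true do
    -- utf8.len returns nil plus the position of the first invalid byte
    local n, bad = utf8.len(s, i)
    if n then
      parts[#parts + 1] = s:sub(i)           -- remainder is well-formed
      break
    end
    firsterr = firsterr or bad
    parts[#parts + 1] = s:sub(i, bad - 1)    -- well-formed run before it
    parts[#parts + 1] = REPLACEMENT
    i = bad + 1                              -- resume after the bad byte
  end
  return table.concat(parts), firsterr
end

local fixed, errpos = utf8check.sanitize("caf\xc3")
```

Here the truncated two-byte sequence at the end yields `fixed == "caf\u{FFFD}"` and `errpos == 4`. Replacing byte by byte keeps the loop simple; a real module might instead follow a different replacement policy.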

-- Hisham