On Apr 10, 2014, at 11:25 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:

> The manual says:
> 
> ---
> utf8.offset (s, n [, i])
> 
...
> This function assumes that s is a valid UTF-8 string.
> ---
> 
> Actually, the routine seems always to return something, even if s is not valid.
> The result when n>0 seems to be correct if there are n-1 valid UTF-8 characters.
> 
>> s='voilà'
>> #s
> 6
>> utf8.offset(s,6)
> 7
>> s=s:sub(1,-2).."\xFC"
>> s
> voil�
>> utf8.offset(s,5)
> 5
> 

Which to me hints that perhaps we need utf8.isvalid() as well? Assuming GIGO, most Lua apps should be validating input before passing it to the other utf8 functions. And I’ve seen rather too many incorrect UTF-8 validators to suggest that everyone (anyone?) should roll their own.
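
A rough sketch of what such a check might look like, assuming a released Lua 5.3 or later, where utf8.len is documented to return a false value plus the byte position of the first invalid sequence. The name isvalid is hypothetical and not part of the standard library, and how strict the check is (surrogates, for example) depends on the Lua version:

```lua
-- Hypothetical helper: NOT part of the standard utf8 library.
-- Assumption: utf8.len returns nil/false plus the byte position of the
-- first invalid sequence (documented behaviour in released Lua 5.3+).
local function isvalid(s)
  local len, badpos = utf8.len(s)
  if len then
    return true
  end
  return false, badpos
end

-- "voilà" written with byte escapes so the example does not depend on
-- the encoding of the source file.
local good = "voil\xC3\xA0"
-- Same construction as in the quoted session: drop the last byte and
-- append 0xFC, leaving a lead byte with no valid continuation byte.
local bad = good:sub(1, -2) .. "\xFC"

print(isvalid(good))  --> true
print(isvalid(bad))   --> false   5
```

With something like this, an app could reject the input up front instead of relying on what utf8.offset happens to return for invalid strings.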

—Tim