The preliminary manual states that utf8.offset() assumes it is called
on valid UTF-8 strings.

In a few places the underlying bytesoffset() function has loops that
keep iterating forward or backward, with no upper bound, for as long
as the byte they are looking at is a continuation byte.  It might be a
good idea to limit these.  If it's called only on valid UTF-8 it only
has to iterate forward or backward 2 times (from a starting
continuation byte) at most.  With the current code it will do just
that, but if called on invalid UTF-8 it might return unpredictable
results.
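
To make the idea of limiting those loops concrete, here is a minimal
sketch of a bounded backward scan.  It is not the code from the Lua
sources: find_lead_bounded(), IS_CONT and MAX_CONT are hypothetical
names, and the bound of 3 assumes sequences of at most four bytes
(adjust it to whatever limit the library really uses).

```c
#include <stddef.h>

/* True when c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
#define IS_CONT(c) (((unsigned char)(c) & 0xC0) == 0x80)

/* Assumption: a sequence is at most 4 bytes long, so a lead byte is
   followed by at most 3 continuation bytes. */
#define MAX_CONT 3

/* Hypothetical helper, not part of Lua.  Starting at byte index pos,
   step backward to the first non-continuation byte.  Returns 1 and
   stores that index in *lead on success.  Returns 0 if pos is out of
   range, if more than MAX_CONT continuation bytes are seen, or if the
   start of the string is reached while still on a continuation byte. */
static int find_lead_bounded(const char *s, size_t len, size_t pos,
                             size_t *lead) {
  int steps = 0;
  if (pos >= len) return 0;
  while (IS_CONT(s[pos])) {
    if (pos == 0 || steps == MAX_CONT) return 0;  /* give up: invalid */
    pos--;
    steps++;
  }
  *lead = pos;
  return 1;
}
```

On valid input this lands on the same byte as an unbounded loop would;
on a long run of stray continuation bytes it reports failure instead of
walking an arbitrary distance.  It does not check that the byte it
stops on is a lead byte whose declared length matches, so it is not a
substitute for full validation.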

I'm not sure what the most correct behavior would be, and I'm not sure
this needs changing.  I just thought I'd mention this grey area; I'm
pretty sure it falls within 'undefined behavior'.

We're supposed to use utf8.len() to validate the string, yes?

I lost the thread about 'work talk' so I started this one :s