On Tue, May 13, 2014 at 6:31 AM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:

> Utf8.offset does not decode anything. All functions in the library
> that decode sequences do protect against decoding invalid sequences.

The manual says we should only be feeding utf8.offset() valid UTF8 --
so on that premise alone, what I'm talking about shouldn't merit any
changes.  I just thought it might be useful to limit those loops so they
don't iterate beyond what would be considered a valid UTF8 byte
sequence -- the result can't be trusted to be correct because we're
passing offset() invalid UTF8, but it can be "less incorrect".  (heh)
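
To make the idea concrete, here is a minimal sketch in C of a bounded
backward scan. It is not the code from Lua's utf8 library: is_cont and
bounded_seq_start are hypothetical names, indices are 0-based rather than
Lua's 1-based, and the cap of 3 assumes the 4-byte maximum sequence
length of RFC 3629.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_cont(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper (not part of Lua): starting at 0-based byte index
   `pos`, step backward over continuation bytes, but give up after 3
   steps, since a valid sequence has at most 3 continuation bytes
   (assumption: 4-byte maximum per RFC 3629).
   Returns the index of the first non-continuation byte found, or -1 if
   `pos` is out of range or no such byte lies within 3 steps. */
static long bounded_seq_start(const char *s, size_t len, size_t pos) {
    size_t steps = 0;
    if (pos >= len) return -1;
    while (is_cont((unsigned char)s[pos])) {
        if (pos == 0 || steps == 3) return -1;  /* ran out of room */
        pos--;
        steps++;
    }
    return (long)pos;
}
```

With a cap like this, garbage input such as a long run of 0x80 bytes
makes the scan fail after a few steps instead of walking back across the
whole string. It still does not validate the sequence: the byte it stops
on is only checked for not being a continuation byte.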

I was thinking some code might someday depend on offset() returning an
index within 3 bytes of where it's called from. People might expect
that, because it operates on valid UTF8, it'll return valid offsets per
that assumption?

Er, I think I'm beating a dead horse -- I just thought I'd point it
out for more consideration :>