It was thus said that the Great Coroutines once stated:
> On Tue, May 13, 2014 at 6:31 AM, Roberto Ierusalimschy
> <roberto@inf.puc-rio.br> wrote:
> 
> > utf8.offset does not decode anything. All functions in the library
> > that decode sequences do protect against decoding invalid sequences.
> 
> The manual says we should only be feeding utf8.offset() valid UTF-8 --
> so on that premise alone what I'm talking about shouldn't merit any
> changes.  I just thought it might be useful to limit those loops so
> they don't iterate beyond what would be considered a valid UTF-8 byte
> sequence -- the result can't be trusted to be correct because we're
> passing offset() invalid UTF-8, but it can be "less incorrect".  (heh)
> 
> I was thinking some code might someday depend on offset() returning an
> index within 3 bytes of where it's called from.  People might expect
> that, because it operates on valid UTF-8, it'll return valid offsets
> per that assumption?
> 
> Er, I think I'm beating a dead horse -- I just thought I'd point it
> out for more consideration :>

  If you are curious, check out the source code for joe (Joe's Editor),
specifically the files i18n.c and utf8.c, to see just how much code is
required to maybe, hopefully, handle UTF-8.  I have no idea how well it
deals with right-to-left languages.
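
  As for the bounded loop suggested above, here is a minimal sketch in C
of what it might look like.  This is not the code in Lua's UTF-8 library;
the names is_cont() and bounded_char_start() are hypothetical, made up for
illustration.  It assumes the caller only wants to back up from a byte
index to the start of the character containing it, and it gives up after 3
continuation bytes, since a valid UTF-8 character has at most that many.

```c
#include <assert.h>
#include <stddef.h>

/* True if c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper (not part of Lua): step back from byte index i
 * to the first byte of the character containing it.  At most 3
 * continuation bytes are skipped.  Returns the index of the first
 * non-continuation byte found, or -1 if i is out of range, the start
 * of the buffer is reached, or more than 3 continuation bytes are seen.
 * The lead byte itself is not validated.
 */
static long bounded_char_start(const char *s, size_t len, size_t i)
{
  size_t steps = 0;

  assert(s != NULL);
  if (i >= len)
    return -1;

  while (is_cont((unsigned char)s[i]))
  {
    if (i == 0 || steps == 3)
      return -1;            /* ran off the front, or sequence too long */
    i--;
    steps++;
  }
  return (long)i;
}
```

  With the 4-byte buffer "a\xE2\x82\xAC" (an 'a' followed by the Euro
sign), calling bounded_char_start(buf, 4, 3) returns 1, while a run of
four or more continuation bytes returns -1 instead of scanning on.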

  -spc