
If we guarantee that "u" always consumes the character, and simply
returns nil when that character turns out to be invalid, then it works.
If the current UTF-8 sequence is invalid (and the read therefore returns
nil), then the byte that you expected to be a continuation byte, but
which in fact was not, is the only one that needs to be pushed back.
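
A minimal sketch of that "always consume" behaviour, written in Python
purely for illustration (it is not the proposed Lua implementation).
PushbackReader, expected_length and read_utf8_char are hypothetical
names. The reader allows exactly one byte of pushback, and the check is
structural only: it does not reject overlong encodings or surrogates,
and it returns nil-equivalent None both at end of file and on an
invalid sequence.

```python
import io


class PushbackReader:
    """Byte reader with a single pushback slot (hypothetical helper)."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)
        self._pushed = None  # at most one byte may be pushed back

    def getc(self):
        """Return the next byte as an int, or None at end of file."""
        if self._pushed is not None:
            b, self._pushed = self._pushed, None
            return b
        chunk = self._stream.read(1)
        return chunk[0] if chunk else None

    def ungetc(self, b):
        """Push one byte back; a second pushback in a row is an error."""
        assert self._pushed is None, "only one byte of pushback is available"
        self._pushed = b


def expected_length(lead):
    """Sequence length implied by a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def read_utf8_char(reader):
    """Read one UTF-8 sequence; return its bytes, or None if invalid or EOF.

    Bytes already read from an invalid sequence stay consumed. Only the
    byte that failed the continuation check is pushed back.
    """
    lead = reader.getc()
    if lead is None:
        return None  # end of file
    n = expected_length(lead)
    if n == 0:
        return None  # invalid lead byte, consumed
    buf = bytearray([lead])
    for _ in range(n - 1):
        b = reader.getc()
        if b is None:
            return None  # sequence truncated by end of file
        if b & 0xC0 != 0x80:
            reader.ungetc(b)  # the single unget this scheme ever needs
            return None
        buf.append(b)
    return bytes(buf)
```

With the input bytes F0 9F 98 41, the first call consumes F0 9F 98,
pushes 41 ("A") back and returns None; the next call returns "A".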

If we have to be able to put the read pointer back where it was before
the read in case the character is invalid (e.g. so a different
function could read out the raw bytes) then that couldn't be done with
a single unget.
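
To make the count concrete, here is a small illustrative Python sketch
(bytes_read_before_rejection is a hypothetical name, and the check is
again structural only). It reports how many bytes have been read from
the stream at the moment the sequence is known to be bad, which is the
number of bytes that would have to be pushed back to restore the
original read position.

```python
def bytes_read_before_rejection(data: bytes) -> int:
    """Bytes read by the time the sequence starting `data` is rejected.

    Returns 0 when the leading sequence is structurally valid (or the
    input is empty), since nothing needs to be restored in that case.
    """
    if not data:
        return 0
    lead = data[0]
    if lead < 0x80:
        return 0  # ASCII: valid single-byte sequence
    if 0xC2 <= lead <= 0xDF:
        n = 2
    elif 0xE0 <= lead <= 0xEF:
        n = 3
    elif 0xF0 <= lead <= 0xF4:
        n = 4
    else:
        return 1  # invalid lead byte: rejected after one byte
    for i in range(1, n):
        if i >= len(data):
            return i  # end of file reached after reading i bytes
        if data[i] & 0xC0 != 0x80:
            return i + 1  # lead, good continuations, and the offending byte
    return 0
```

For F0 9F 98 41 the result is 4: three bytes of the would-be character
plus the byte that broke it, so restoring the position takes four
ungets, while the consuming approach needs only one.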

/s/ Adam

On Fri, Feb 23, 2018 at 8:48 AM, Charles Heywood <> wrote:
> No, because a valid UTF-8 sequence can be invalidated multiple bytes in.
> On Fri, Feb 23, 2018 at 8:46 AM Luiz Henrique de Figueiredo
> <> wrote:
>> > "u": reads one or more bytes forming one UTF-8 character, and returns
>> > that character as a string. Returns nil if the file at the current
>> > position does not start with a valid UTF-8 sequence.
>> Can this be done without having to unget more than one byte from the
>> stream?
> --
> Ryan | Charles <>
> Software Developer / System Administrator