2018-02-23 18:00 GMT+02:00 Dirk Laurie <dirk.laurie@gmail.com>:
> 2018-02-23 16:46 GMT+02:00 Luiz Henrique de Figueiredo <lhf@tecgraf.puc-rio.br>:
>>> "u": reads one or more bytes forming one UTF-8 character, and returns
>>> that character as a string. Returns nil if the file at the current
>>> position does not start with a valid UTF-8 sequence.
>>
>> Can this be done without having to unget more than one byte from the stream?
>
> My rough workaround in Lua is this:
>
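> function readutf8(f)
>   local pos = f:seek()                       -- remember the start position
>   local len = math.min(4,f:seek"end"-pos)    -- a UTF-8 character is at most 4 bytes
>   f:seek("set",pos)
>   local buf = f:read(len)
>   if not buf then return nil end             -- f:read(0) returns nil at end of file
>   -- drop trailing bytes until exactly one character remains
>   while utf8.len(buf)~=1 and #buf>1 do buf = buf:sub(1,-2) end
>   if utf8.len(buf)==1 then
>     f:seek("set",pos+#buf)                   -- leave the file just past the character
>     return buf
>   end
>   f:seek("set",pos)                          -- invalid sequence: restore the position
> end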
>
> This has only one unget, at the expense of always reading four bytes
> except when the file is shorter.

I have just read "man getchar" and saw that one can't unget the result
of "fgets".
You have to push bytes back one at a time with "ungetc", and only one
byte of pushback is guaranteed.
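
For illustration, here is a minimal C sketch of what a byte-at-a-time
reader restricted to a single ungetc could look like. The name
read_utf8_char is hypothetical; it is not part of Lua or of the C
library. It validates the lead byte and the continuation bytes against
the usual well-formed UTF-8 ranges, and it pushes back at most the one
byte that broke the sequence. Bytes read before that one stay consumed,
so a failed read cannot in general be undone this way.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Lua or the C library.
 * Reads one UTF-8 character from fp into buf (room for 5 bytes needed)
 * and returns its length in bytes, or 0 on EOF or a malformed sequence.
 * Bytes are fetched one at a time with getc(); at most one byte is
 * ever pushed back with ungetc(). */
int read_utf8_char(FILE *fp, char *buf)
{
    int c = getc(fp);
    int len, i;
    int lo = 0x80, hi = 0xBF;   /* allowed range for the second byte */

    if (c == EOF)
        return 0;

    if (c < 0x80) {
        len = 1;                          /* ASCII */
    } else if (c >= 0xC2 && c <= 0xDF) {
        len = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        len = 3;
        if (c == 0xE0) lo = 0xA0;         /* reject overlong forms */
        if (c == 0xED) hi = 0x9F;         /* reject surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
        len = 4;
        if (c == 0xF0) lo = 0x90;         /* reject overlong forms */
        if (c == 0xF4) hi = 0x8F;         /* reject values above U+10FFFF */
    } else {
        ungetc(c, fp);                    /* invalid lead byte: restore it */
        return 0;
    }
    buf[0] = (char)c;

    for (i = 1; i < len; i++) {
        c = getc(fp);
        if (c == EOF)
            return 0;                     /* truncated sequence */
        if (c < lo || c > hi) {
            ungetc(c, fp);                /* the single guaranteed pushback */
            return 0;                     /* earlier bytes stay consumed */
        }
        buf[i] = (char)c;
        lo = 0x80;                        /* later bytes use the plain range */
        hi = 0xBF;
    }
    buf[len] = '\0';
    return len;
}
```

On an invalid lead byte the stream is left where it started. On a bad
continuation byte only that byte is restored, which is the case the
seek-based workaround handles and this one cannot.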

Sorry for the noise.