2018-02-23 23:33 GMT+02:00 Sean Conner <sean@conman.org>:
> It was thus said that the Great Dirk Laurie once stated:
>> 2018-02-23 16:46 GMT+02:00 Luiz Henrique de Figueiredo <lhf@tecgraf.puc-rio.br>:
>> >> "u": reads one or more bytes forming one UTF-8 character, and returns
>> >> that character as a string. Returns nil if the file at the current
>> >> position does not start with a valid UTF-8 sequence.
>> >
>> > Can this be done without having to unget more than one byte from the stream?
>>
>> My rough workaround in Lua is this:
>>
>> function readutf8(f)
>>   local pos = f:seek()
>>   local len = math.min(4,f:seek"end"-pos)
>>   f:seek("set",pos)
>>   local buf = f:read(len) or ""  -- f:read(0) returns nil at end of file
>>   while utf8.len(buf)~=1 and #buf>1 do buf = buf:sub(1,-2) end
>>   if utf8.len(buf)==1 then
>>     f:seek("set",pos+#buf)
>>     return buf
>>   end
>>   f:seek("set",pos)
>> end
>
>   Just a note---you cannot seek on non-files like stdin (that has not been
> redirected) or a network connection.  This function can therefore lose data
> on such files.

"rough workaround"
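
For streams that cannot seek, here is a minimal sketch of a reader that
never calls f:seek and keeps at most one byte of pushback, which is also
what the "unget more than one byte" question above asks for. The names
utf8reader and getbyte are hypothetical, not part of any Lua library. The
byte ranges follow RFC 3629, so overlong forms, surrogates and values
above U+10FFFF are all rejected at the second byte. Unlike the seek-based
version, it does not restore the position on failure: the bytes of a bad
sequence are consumed, and only the byte that ended the sequence is kept
for the next call.

```lua
-- Hypothetical helper: wraps a file handle with a one-byte pushback slot,
-- so the underlying stream never needs to seek.
local function utf8reader(f)
  local pending   -- at most one byte that was read but not yet consumed

  local function getbyte()
    local c = pending
    pending = nil
    return c or f:read(1)
  end

  -- Returns the next UTF-8 character as a string, or nil at end of file
  -- or when the input does not start with a valid UTF-8 sequence.
  return function()
    local lead = getbyte()
    if not lead then return nil end
    local b = lead:byte()
    if b < 0x80 then return lead end          -- plain ASCII
    -- n: number of continuation bytes; lo..hi: range allowed for the
    -- first continuation byte (this is where overlongs etc. are caught)
    local n, lo, hi
    if b >= 0xC2 and b <= 0xDF then n, lo, hi = 1, 0x80, 0xBF
    elseif b == 0xE0 then n, lo, hi = 2, 0xA0, 0xBF
    elseif b == 0xED then n, lo, hi = 2, 0x80, 0x9F
    elseif b >= 0xE1 and b <= 0xEF then n, lo, hi = 2, 0x80, 0xBF
    elseif b == 0xF0 then n, lo, hi = 3, 0x90, 0xBF
    elseif b >= 0xF1 and b <= 0xF3 then n, lo, hi = 3, 0x80, 0xBF
    elseif b == 0xF4 then n, lo, hi = 3, 0x80, 0x8F
    else return nil end   -- stray continuation byte or invalid lead byte
    local buf = lead
    for _ = 1, n do
      local c = getbyte()
      local cb = c and c:byte()
      if not cb or cb < lo or cb > hi then
        pending = c       -- push back the offending byte (nil at EOF)
        return nil
      end
      buf = buf .. c
      lo, hi = 0x80, 0xBF -- later continuation bytes use the full range
    end
    return buf
  end
end
```

Usage would be something like "local nextchar = utf8reader(io.stdin)"
followed by repeated calls to nextchar(). Since the pushback byte lives in
the closure, any other read on the same handle in between would miss it.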