- Subject: Re: Feature request: "u" option to file:read
- From: Sean Conner <sean@...>
- Date: Fri, 23 Feb 2018 16:33:45 -0500
It was thus said that the Great Dirk Laurie once stated:
> 2018-02-23 16:46 GMT+02:00 Luiz Henrique de Figueiredo <lhf@tecgraf.puc-rio.br>:
> >> "u": reads one or more bytes forming one UTF-8 character, and returns
> >> that character as a string. Returns nil if the file at the current
> >> position does not start with a valid UTF-8 sequence.
> >
> > Can this be done without having to unget more than one byte from the stream?
>
> My rough workaround in Lua is this:
>
> function readutf8(f)
>   local pos = f:seek()
>   local len = math.min(4,f:seek"end"-pos)
>   f:seek("set",pos)
>   local buf = f:read(len)
>   while utf8.len(buf)~=1 and #buf>1 do buf = buf:sub(1,-2) end
>   if utf8.len(buf)==1 then
>     f:seek("set",pos+#buf)
>     return buf
>   end
>   f:seek("set",pos)
> end
Just a note: you cannot seek on non-files such as stdin (when it has not
been redirected) or a network connection. This function can therefore lose
data on such streams.
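
For illustration, here is a minimal sketch of a reader that never calls
f:seek, so it should also work on pipes and sockets. It assumes Lua 5.3 or
later (for the utf8 library) and a handle whose read(1) returns one byte or
nil at end of input. The name readutf8_noseek and the second return value are
hypothetical, not part of the proposed "u" option. It reads the lead byte,
derives the expected length from it, then reads continuation bytes one at a
time and stops at the first byte that does not fit.

```lua
-- Sketch only: read one UTF-8 character without seeking.
-- Returns the character on success.
-- Returns nil alone at end of input.
-- Returns nil plus the bytes already consumed on a malformed sequence,
-- so the caller can decide what to do with them (nothing is silently lost).
function readutf8_noseek(f)
  local lead = f:read(1)
  if not lead then return nil end            -- end of input

  -- Number of continuation bytes implied by the lead byte.
  local b = lead:byte()
  local need
  if b < 0x80 then need = 0                  -- ASCII
  elseif b >= 0xC2 and b <= 0xDF then need = 1
  elseif b >= 0xE0 and b <= 0xEF then need = 2
  elseif b >= 0xF0 and b <= 0xF4 then need = 3
  else return nil, lead end                  -- not a valid lead byte

  local buf = lead
  for _ = 1, need do
    local c = f:read(1)
    if not c then return nil, buf end        -- truncated sequence
    buf = buf .. c
    local cb = c:byte()
    -- Continuation bytes must be in 0x80..0xBF; stop at the first offender.
    if cb < 0x80 or cb > 0xBF then return nil, buf end
  end

  -- Let utf8.len reject overlong forms and out-of-range code points.
  if utf8.len(buf) == 1 then return buf end
  return nil, buf
end
```

On a malformed sequence this consumes at most one byte past the bad prefix,
and that byte is handed back in the second return value rather than pushed
back onto the stream.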
-spc