- Subject: Re: io:lines() and \0
- From: William Ahern <william@...>
- Date: Mon, 17 Feb 2014 13:05:49 -0800
On Mon, Feb 17, 2014 at 05:16:29PM +0100, René Rebe wrote:
> Hi,
>
> On Feb 17, 2014, at 16:55 , steve donovan wrote:
>
> > On Mon, Feb 17, 2014 at 5:51 PM, René Rebe <rene@exactcode.de> wrote:
> >> I just noticed that io:lines() does not cope with \0 in the lines, and thus
> >> just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).
> >
> > This is not surprising. The whole idea of 'lines' only really applies
> > to text files, at least in my head ;)
>
> well, in my opinion library foundations should just work, and not silently
> discard some bits and bytes. A line is a line, no matter how many \0 are
> in there until the next \n-newline. And the Lua manual points out Lua
> strings are \0-safe.
>
> I already provided patches a year or two ago for other pattern matching \0
> fixes, which were merged into 5.2.
>
> One quite simple and obvious use of lines with \0 binary data is parsing
> MIME, CGI data.

Well, in MIME a line ends in \r\n. So if you want to be 8-bit clean you
technically shouldn't be treating a line as simply ending in \n, anyhow.
OTOH, in MIME even "8-bit" encoded entities shouldn't have bare \0 or \n
characters. The "binary" transfer encoding allows those. But even in binary
transfer encoding a line is \r\n.

So there's no simple answer, really.

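To make the \r\n point concrete, here is a minimal sketch of splitting a block of bytes on \r\n only, so that embedded \0 and bare \n bytes stay inside the line. The helper name split_crlf is made up for illustration, and the sketch assumes the whole block is already in memory:

```lua
-- Hypothetical helper: split a byte string into lines terminated by "\r\n".
-- Returns a table of complete lines plus any trailing partial line.
local function split_crlf(data)
  local lines, pos = {}, 1
  while true do
    -- 4th argument true = plain find, no pattern matching involved
    local s, e = string.find(data, "\r\n", pos, true)
    if not s then break end
    lines[#lines + 1] = string.sub(data, pos, s - 1)
    pos = e + 1
  end
  return lines, string.sub(data, pos)
end
```

Bare \n and \0 are treated as ordinary data here. Whether that is the right policy depends on which transfer encoding you are actually parsing.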
The sockets implementation in my cqueues library has a text-mode translation
feature which translates \r\n sequences to \n, because on Unix (unlike
Windows) this is not done by the underlying stdio implementation. This
allows a simple (and in practice mostly correct) implementation of MIME-like
protocols. But of course I had to implement all of the buffering myself
because you simply cannot reliably depend on the underlying implementation
if you want dependable behavior.

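As a rough illustration of that kind of translation (this is not the cqueues code, just a sketch with a made-up name, new_crlf_translator), the main subtlety is a \r\n pair that is split across two reads. One way to handle it is to hold back a trailing \r until the next chunk arrives:

```lua
-- Hypothetical sketch: translate "\r\n" to "\n" over a stream of chunks.
-- A trailing "\r" is held back because its "\n" may be in the next chunk.
local function new_crlf_translator()
  local pending_cr = false
  -- Pass final = true on the last call so a held-back "\r" is flushed.
  return function(chunk, final)
    if pending_cr then
      chunk = "\r" .. chunk
      pending_cr = false
    end
    if not final and chunk:sub(-1) == "\r" then
      pending_cr = true
      chunk = chunk:sub(1, -2)
    end
    -- Parentheses drop gsub's second return value (the match count).
    return (chunk:gsub("\r\n", "\n"))
  end
end
```

A bare \r that is not followed by \n passes through unchanged, and \0 bytes in the chunk are left alone.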
For example, what's your maximum line length? MIME specifies 998, but in
practice lots of implementations allow much larger limits because of broken
clients (like brain-dead PHP scripts). Lua's internal limit is also probably
too small to be production-quality reliable on the open internet (unless you
want endless support calls), and in any event it's not configurable.

Basically, if you want to be serious about this stuff you have to do your
own buffering.
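
For what "your own buffering" might look like in plain Lua, here is a minimal sketch of a line reader with a configurable maximum line length. The names line_reader, read and maxlen are invented for this example. It assumes read is a function returning the next chunk of bytes, or nil at end of input, and it splits on \n only:

```lua
-- Hypothetical sketch: \0-safe line iterator with a configurable limit.
-- read:   function returning the next chunk of bytes, or nil at EOF
-- maxlen: maximum line length in bytes, excluding the terminator
local function line_reader(read, maxlen)
  local buf, eof = "", false
  return function()
    while true do
      local s = string.find(buf, "\n", 1, true)  -- plain find
      if s then
        if s - 1 > maxlen then return nil, "line too long" end
        local line = string.sub(buf, 1, s - 1)
        buf = string.sub(buf, s + 1)
        return line
      end
      -- No terminator yet: refuse to buffer more than maxlen bytes.
      if #buf > maxlen then return nil, "line too long" end
      if eof then
        if buf == "" then return nil end
        local line = buf  -- final line without a terminator
        buf = ""
        return line
      end
      local chunk = read()
      if chunk == nil or chunk == "" then
        eof = true
      else
        buf = buf .. chunk
      end
    end
  end
end
```

With a file opened in binary mode this could be used as `line_reader(function() return f:read(4096) end, 998)`, since f:read with a byte count returns the data as-is, \0 included. Once the limit is hit the sketch keeps returning the error; a real implementation would have to decide whether to skip the oversized line or drop the connection.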