Re: io:lines() and \0

lua-l archive


Subject: Re: io:lines() and \0
From: Sean Conner <sean@...>
Date: Fri, 21 Feb 2014 04:00:45 -0500

It was thus said that the Great René Rebe once stated:
> 
> On Feb 20, 2014, at 23:03, Sean Conner wrote:
> 
> > It was thus said that the Great René Rebe once stated:
> >> 
> >> The next time you parse a text file which accidentally has a \0 somewhere
> >> you probably want this bug fix, too ;-) Especially after you spend hours
> >> figuring out what is going on, …
> > 
> >  You would have had the same trouble with C if you used fgets().
> 
> I have no trouble using fgets, but then if I weren't using Lua, I would actually use C++, ...
> 
> >  And could you tell me what tool created a text file with embedded NULs in
> > it?  I want to avoid using said tool ...
> 
> As mentioned, I hit this while implementing a CGI upload, so I was parsing MIME data.

  Well, MIME is also used in email, which, by definition, is *not* 8-bit
clean.  That is why MIME was created in the first place: to stuff binary
data into a 7-bit ASCII data stream, where NUL bytes are most assuredly not
allowed.  That's why I asked.
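For the curious: MIME's base64 Content-Transfer-Encoding (RFC 2045) is the usual way binary data, NUL bytes included, gets squeezed into 7-bit ASCII.  Below is a minimal sketch of an encoder in plain Lua, written with arithmetic instead of bit operators so it should run on Lua 5.1 through 5.4.  `base64_encode` is a name chosen for illustration, not a library function, and there is no line wrapping at 76 characters as RFC 2045 requires for mail:

```lua
-- Minimal base64 encoder (RFC 2045 alphabet), using plain arithmetic so
-- no bit library is needed.  Hypothetical helper, for illustration only.
local ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

local function b64char(n)
  -- n is in 0..63; string.sub is 1-based
  return ALPHABET:sub(n + 1, n + 1)
end

local function base64_encode(data)
  local out = {}
  for i = 1, #data, 3 do
    local a, b, c = data:byte(i, i + 2)            -- b and/or c are nil at the tail
    local n = a * 65536 + (b or 0) * 256 + (c or 0) -- pack up to 24 bits
    local c1 = math.floor(n / 262144) % 64
    local c2 = math.floor(n / 4096) % 64
    local c3 = math.floor(n / 64) % 64
    local c4 = n % 64
    -- missing input bytes become "=" padding
    out[#out + 1] = b64char(c1) .. b64char(c2)
      .. (b and b64char(c3) or "=")
      .. (c and b64char(c4) or "=")
  end
  return table.concat(out)
end
```

For example, the three bytes "a\0b" encode to "YQBi": the NUL disappears into ordinary printable characters.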

  I, too, wrote code to parse CGI data and even used it for a personal
project to upload pictures from my iPhone via a webpage.  I never had an
issue with NUL bytes there, so I decided to check the implementation to see
how I avoided problems with them, because I certainly don't remember there
being any issues in the first place.

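  Well, I apparently sidestepped the issue entirely by using LPeg:

local lpeg = require "lpeg"

-- core, mime and contentdisp are helper modules not shown in this post
local function multipart(separator,data)
  -- each part is introduced by "--" followed by the boundary string
  local boundary = lpeg.P("--" .. separator)
  -- header parser built from the header sets in the mime/contentdisp modules
  local hdrs     = core.parse_headers(mime._HEADERS,contentdisp._HEADERS)
  -- the body is every byte up to the next boundary; LPeg matches against a
  -- Lua string, so a NUL byte is just another byte
  local body     = lpeg.C((lpeg.P(1) - boundary)^0)
  local section  = boundary
                 * core.CRLF
                 * lpeg.Ct(lpeg.Cg(hdrs,"headers") * lpeg.Cg(body,"body"))
  -- one or more sections, then the closing "--boundary--" line
  local sections = lpeg.Ct(section^1) * boundary * lpeg.P"--" * core.CRLF

  local tmp = sections:match(data)
  ...
end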
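LPeg sidesteps the problem because it matches against a Lua string, and Lua strings can hold any byte, NUL included.  The same is true of plain (non-pattern) string.find, so a rough LPeg-free sketch of splitting a multipart body might look like the following.  It assumes the whole body is already in `data`; `split_parts` is a hypothetical helper that does no header parsing and, like the LPeg version, leaves the CRLF that precedes each boundary on the end of each part:

```lua
-- Split a multipart body into the raw text of each part (headers + body).
-- Uses string.find with plain = true, so embedded NUL bytes need no special
-- handling.  Hypothetical helper; assumes each boundary line ends in CRLF.
local function split_parts(separator, data)
  local delim = "--" .. separator
  local parts = {}
  local _, e = data:find(delim, 1, true)           -- first boundary
  while e do
    if data:sub(e + 1, e + 2) == "--" then break end -- closing "--boundary--"
    local start = e + 3                              -- skip CRLF after boundary
    local ns, ne = data:find(delim, start, true)
    if not ns then break end                         -- malformed: no next boundary
    parts[#parts + 1] = data:sub(start, ns - 1)
    e = ne
  end
  return parts
end
```

With `split_parts("B", "--B\r\nx\r\n--B\r\ny\0z\r\n--B--\r\n")` the result is the two strings "x\r\n" and "y\0z\r\n", the NUL intact.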

  But, LPeg aside, there are other ways of reading in the data.  One: if
Content-Length: exists, convert its value to an integer and pass that to
f:read(), which will read that many bytes of data (using the C function
fread()), NUL bytes included.  Two: if the Content-Length: header doesn't
exist and the Content-Transfer-Encoding: header indicates 8bit, then yes,
you have an issue, but one that can be solved without patching Lua, by
writing a C module to Do The Right Thing.  That route is worth taking
because even *if* the Lua team accepts your proposal, at best it'll be
placed on the Lua bugs page and won't become part of Lua until the next
official release X years from now.  That might be fine if you are not
planning on releasing your code and have a locally patched Lua, but anyone
else who wants to use your code will need a patched Lua too.  And don't
forget about LuaJIT (different team); it probably suffers from the same
issue.
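A sketch of both routes in Lua, under stated assumptions.  `read_body` covers the Content-Length case; in a CGI script the length arrives in the CONTENT_LENGTH environment variable (RFC 3875).  `lines_of` is not the C module suggested above but a pure-Lua stand-in that assumes the whole input fits in memory: read it all with f:read("*a") and split it on newlines yourself.  Both names are hypothetical:

```lua
-- Read up to `len` bytes from handle `f`.  f:read(n) asks for a byte
-- count, not a line, so embedded NUL bytes come back intact.
-- Returns "" at end of file instead of nil.
local function read_body(f, len)
  return f:read(len) or ""
end

-- NUL-safe line iterator over an in-memory string.  Splits on "\n" with a
-- plain find, so a "\0" inside a line is kept.  Hypothetical helper.
local function lines_of(s)
  local pos = 1
  return function()
    if pos > #s then return nil end
    local nl = s:find("\n", pos, true)
    local line
    if nl then
      line, pos = s:sub(pos, nl - 1), nl + 1
    else
      line, pos = s:sub(pos), #s + 1   -- last line without a newline
    end
    return line
  end
end

-- Hypothetical CGI usage:
-- local len  = tonumber(os.getenv("CONTENT_LENGTH"))
-- local body = len and read_body(io.stdin, len) or io.stdin:read("*a")
-- for line in lines_of(body) do ... end
```

Reading everything first trades memory for simplicity; for large uploads the C module route in the post is the more robust choice.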

  If you can find a copy of _The Standard C Library_ by P.J. Plauger, read
chapter 12.  It covers the history of <stdio.h>, the issues around handling
I/O in C that went into the standards process (started in 1983!), why text
was so problematic back then (for different reasons than now, and not all
related to different end-of-line markers), and the compromises the ANSI
committee made.

  -spc
