Re: io:lines() and \0

lua-l archive

Subject: Re: io:lines() and \0
From: Philipp Janda <siffiejoe@...>
Date: Fri, 21 Feb 2014 12:16:59 +0100

On 21.02.2014 at 09:06, Tom N Harris wrote:

On Friday, February 21, 2014 12:29:37 AM Philipp Janda wrote:

I may be wrong, but isn't there some 16-bit encoding where every other
byte is zero for ASCII characters (UCS-2, UTF-16, or something)?


That may be the case, but trying to read such an encoding will get you in
trouble because the CR+NL is represented in 16-bit characters also. So a line
would be terminated with the bytes (in little-endian mode) 0D 00 0A 00. fgets
will stop at the 0A which is a malformed character, then the next read will
start with 00 and the rest of your text is garbled.

In no text encoding (that I know of) where an end-of-line is just 0A, and thus
can be read by fgets, does a valid string contain 00. Anything else must be
treated as not-text even if it is an encoding of text. Otherwise you'd break
the encoding like shown above.
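
To make the byte arithmetic above concrete, here is a minimal sketch in C. It does not call `fgets` itself; `first_line_len` is a hypothetical helper that delimits a line at the first 0A byte, the way a byte-oriented reader would. The sample bytes are "A", CR, NL, "B" in little-endian UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the first "line" in buf, counting bytes up
   to and including the first 0x0A byte, which is where a byte-oriented
   reader would stop. Returns len if there is no 0x0A byte. */
size_t first_line_len(const unsigned char *buf, size_t len)
{
    const unsigned char *lf;
    assert(buf != NULL);
    lf = memchr(buf, 0x0A, len);
    return lf ? (size_t)(lf - buf) + 1 : len;
}

/* "A\r\nB" encoded as UTF-16LE: 41 00 0D 00 0A 00 42 00 */
static const unsigned char utf16le_sample[] = {
    0x41, 0x00, 0x0D, 0x00, 0x0A, 0x00, 0x42, 0x00
};
```

For this sample the first line is 5 bytes long, an odd count, so its last 16-bit unit is cut in half, and the next read starts at the stray 00 byte.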

I agree. It was only meant as an example of a "text file" containing NUL bytes where the concept of lines may still be relevant. (Although, if you are prepared to handle some trailing/leading NUL bytes on each line, splitting at '\n' bytes should still "work" ...)


The "fix", as was mentioned some days ago, is to add a note to the manual that
the line reading functions don't work if the line to be read may contain non-
text characters such as NULL, CR, or Ctrl+Z. In other words:

     A man said to the doctor, "It hurts when I move my arm like this."
     Said the doctor, "Then don't do that."

I'm not sure whether that joke is supposed to prove your point or mine, but whatever ... What if somebody else moves your arm? You often don't have control over the files your programs open (and the people who do have control might not read the Lua reference manual). Say for example, you process a text file (letters and whitespace only) but somehow a single NUL character is in it (via cosmic rays, or the unfortunate combination of keyboard shortcuts and big fingers). If `file:lines()` returns all data including the NUL, I can throw a parse error (will probably happen automatically if I use pattern matching or LPeg on the lines). With the current approach I can only detect that case if the missing data makes my line malformed or if I scan the file using some other method.
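
A rough C sketch of the difference, assuming a reader that calls `fgets` and then measures the result with `strlen` (the behaviour this thread is about), compared with one that keeps an explicit byte count. Both function names are made up for illustration; this is not the code in Lua's I/O library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line with fgets and measure it with strlen, as code that treats
   the buffer as a C string would. Returns the apparent length. */
size_t apparent_line_len(FILE *f, char *buf, int cap)
{
    assert(f != NULL && buf != NULL && cap > 0);
    if (fgets(buf, cap, f) == NULL)
        return 0;
    return strlen(buf);   /* stops at the first NUL, not at the newline */
}

/* Read one line byte by byte with getc, keeping an explicit count so that
   embedded NUL bytes survive. The newline is included in the count.
   Returns the number of bytes stored (at most cap). */
size_t counted_line_len(FILE *f, char *buf, size_t cap)
{
    size_t n = 0;
    int c;
    assert(f != NULL && buf != NULL);
    while (n < cap && (c = getc(f)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    return n;
}
```

For a line consisting of the six bytes `a b NUL c d \n`, the first function reports a length of 2 and the caller never sees `cd`. The second reports 6, so a parser gets the NUL and can reject the line.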

Unless we can all agree that `file:lines` is for text files in toy programs only, where detecting invalid input is not that important.

Btw., there was a related security hole[1] with certificate requests where the Common Name (a data+length string in the spec) contains a NUL byte and is compared via C functions (stopping at the first NUL).
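
The general shape of that bug, as a small C sketch. This only illustrates the class of mistake and is not the code from any real certificate library. `struct counted_str` and both functions are hypothetical stand-ins for a data+length field and the two ways of comparing it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string, standing in for a data+length field. */
struct counted_str {
    const char *data;
    size_t len;
};

/* Flawed: treats the data as a C string, so comparison stops at the first
   NUL. (Assumes a NUL occurs somewhere within the readable data.) */
int matches_cstr(struct counted_str name, const char *expected)
{
    assert(name.data != NULL && expected != NULL);
    return strcmp(name.data, expected) == 0;
}

/* Length-aware: the lengths must agree and every byte must match. */
int matches_counted(struct counted_str name, const char *expected)
{
    size_t elen;
    assert(name.data != NULL && expected != NULL);
    elen = strlen(expected);
    return name.len == elen && memcmp(name.data, expected, elen) == 0;
}
```

With a 26-byte name `good.example\0.evil.example`, `matches_cstr` accepts it as `good.example`, while `matches_counted` rejects it because the lengths differ.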



Philipp


  [1]: http://youtu.be/ibF36Yyeehw#t=23m24s  (a very nice talk, btw!)
