- Subject: Re: file:read with a maximum line length
- From: Sean Conner <sean@...>
- Date: Sat, 4 May 2013 19:45:26 -0400
It was thus said that the Great Rena once stated:
> On Sat, May 4, 2013 at 6:01 PM, Petite Abeille <petite.abeille@gmail.com> wrote:
> 
> >
> > On May 4, 2013, at 11:55 PM, Coda Highland <chighland@gmail.com> wrote:
> >
> > > The problem with that, little bee, is that it stops reading on EOF,
> > > not EOL, which was the request.
> >
> > Not a problem. Rather an opportunity to read a file in chunks of controlled
> > length, which is the crux of the issue, and simply recompose the lines by
> > oneself. So… instead of complicating the read() API with obscure notations,
> > just turn the problem around.
> >
> >
> >
> That's true. I was just thinking '*l' without a length limit is vulnerable
> to being given a potentially endless line, but I guess that's more a reason
> to not use '*l' in cases where the input can't be trusted than to change
> how it works.
  You could always replace the read function with your own.  Here's an
untested approach that should work:
do
  local MAXSIZ    = 8192 -- max size we accept, adjust to taste
  local BUFSIZ    = 1024 -- increments we read in
  -- -----------------------------------------------------------------------
  -- lineparse will return two strings, the first being a full line (or nil
  -- if there isn't one) and the second being the rest of the input.
  -- -----------------------------------------------------------------------
  local lpeg      = require "lpeg"
  local eoln      = lpeg.P"\r"^-1 * lpeg.P"\n"
  local lineparse = lpeg.C((lpeg.P(1) - eoln)^0) 
                  * eoln 
                  * lpeg.C(lpeg.P(1)^0)
  function my_read(fp)
    local old_read = fp.read -- save original function
    local buffer = ""
    local function r(fp,amount) -- our new read function
      local amount = amount or "*l"
      local data
      -- -------------------------------------------------------------------
      -- if given a number of bytes to read, return whatever we have in the
      -- buffer, plus read more if need be.  
      -- -------------------------------------------------------------------
      if type(amount) == 'number' then
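        if #buffer >= amount then
          data   = buffer:sub(1,amount)
          buffer = buffer:sub(amount+1,-1)
          return data
        else
          -- old_read returns nil at end of file, so guard the concatenation
          local cnt  = amount - #buffer
          data       = buffer .. (old_read(fp,cnt) or "")
          buffer     = ""
          if data == "" then return nil end -- nothing left: EOF, like file:read
          return data
        end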
      
      -- ------------------------------------------------------------------
      -- read the next line.  This attempts to read in the next line, but if
      -- the buffer exceeds some maximum size, just return what's there as
      -- we have too much data already and don't want to potentially run out
      -- of memory.  If the buffer doesn't have enough data (and isn't the
      -- maximum size), read some more (in BUFSIZ increments) and try again.
      -- ------------------------------------------------------------------
      elseif amount == "*l" then
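        local line,rest = lineparse:match(buffer)
        if line == nil then
          if #buffer > MAXSIZ then
            data   = buffer
            buffer = ""
            return data
          end
          local chunk = old_read(fp,BUFSIZ)
          if chunk == nil then -- end of file: flush a final unterminated line
            if buffer == "" then return nil end
            data   = buffer
            buffer = ""
            return data
          end
          buffer = buffer .. chunk
          return r(fp,amount)
        end
        buffer = rest -- keep the remainder for the next call
        return line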
      
      -- -------------------------------------------------------------
      -- return current contents of buffer, plus the rest of the file.
      -- -------------------------------------------------------------
      elseif amount == "*a" then
        data = buffer .. old_read(fp,amount)
        buffer = ""
        return data
      end
    end
    return r
  end
end  
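f = assert(io.open("somefile","r"))
-- file handles are userdata, so "f.read = ..." raises an error;
-- keep the wrapper in a variable and call it directly instead
read = my_read(f)  -- then: line = read(f,"*l")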
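
  For anyone without LPeg handy, the same idea (read in fixed-size chunks,
recompose the lines yourself, and give up on a line once too much has piled
up) can be sketched in plain Lua.  This is only a minimal sketch under a few
assumptions: bounded_reader is a made-up name, the limits are passed in as
arguments instead of the MAXSIZ and BUFSIZ constants, and an overlong line
simply comes back in pieces of at most maxsiz + bufsiz bytes.

```lua
-- Hypothetical helper (name and signature are assumptions, not an existing
-- API): returns a function that yields one line per call, or nil at end of
-- file.  No returned piece is ever longer than maxsiz + bufsiz bytes.
local function bounded_reader(fp, maxsiz, bufsiz)
  maxsiz = maxsiz or 8192   -- max size we accept before giving up on a line
  bufsiz = bufsiz or 1024   -- increments we read in
  local buffer = ""
  return function()
    while true do
      -- look for an end of line (LF or CRLF) in what is buffered so far
      local s, e = buffer:find("\r?\n")
      if s then
        local line = buffer:sub(1, s - 1)
        buffer = buffer:sub(e + 1)
        return line
      end
      -- no terminator yet and too much data: hand back what we have
      if #buffer > maxsiz then
        local data = buffer
        buffer = ""
        return data
      end
      local chunk = fp:read(bufsiz)
      if chunk == nil then              -- end of file
        if buffer == "" then return nil end
        local data = buffer             -- final line without a terminator
        buffer = ""
        return data
      end
      buffer = buffer .. chunk
    end
  end
end

-- usage: every piece printed here is bounded in size
local f = assert(io.tmpfile())
f:write("one\r\ntwo\n", string.rep("x", 50))
f:seek("set")
for piece in bounded_reader(f, 16, 8) do
  print(#piece, piece)
end
f:close()
```

  Since the reader is an ordinary closure rather than a replacement for
f.read, it also sidesteps the question of whether a file handle can have its
methods overridden at all.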
  -spc (The reason I post this is that I have something similar to this code
	for my own use ... )