It was thus said that the Great Geoff Leyland once stated:
> Hi,
> 
> What’s the current best option for a CSV (or tab separated, for that
> matter) file?
> 
> I’ve had a look at http://lua-users.org/wiki/CsvUtils and
> http://lua-users.org/wiki/LuaCsv, searched LuaRocks (nothing came up, but
> perhaps I’m using the wrong search term) and looked at Penlight’s
> data.read.  As far as I can tell, most solutions either:
>  - read the whole file in one go (constructing a table of all the values
>    becomes impractical as files get larger)
>  - read lines with “*l” and so are opinionated about what constitutes a
>    newline
>  - don’t handle embedded newlines in quoted fields
> 
> There’s also an LPeg example, but as I understand it, LPeg works on whole
> strings, not file streams?

  Yes, but you can read a line at a time and use LPeg to break the line
down.  You mentioned that there are issues with what constitutes a newline,
but there are ways around that.  One method I use is:

-- Oh, let's just use the MIT license here.  
--
-- MIT LICENSE HERE

local lpeg = require "lpeg"

-- End of Line Marker.  This matches an optional CR with a mandatory LF.
-- If your system uses different end of line markers, change this.

local eoln = lpeg.P"\r"^-1 * lpeg.P"\n"

-- Parse data.  This will return a "line" (per definition of eoln) and
-- additional data.

local lineparse = lpeg.C((lpeg.P(1) - eoln)^0) * eoln * lpeg.C(lpeg.P(1)^0)

do
  -- start with an empty buffer.  

  local data = ""

  function read_line(file)

    -- -----------------------------------------------------------------
    -- data being nil means we've hit the end of file, so we return nil.
    -- -----------------------------------------------------------------

    if data == nil then
      return nil
    end

    -- ------------------------------------------------------------------
    -- attempt to read a line (per the eoln definition) and any remaining
    -- data.
    -- ------------------------------------------------------------------

    local line,rest = lineparse:match(data)

    -- ---------------------------------------------------------------------
    -- if line is nil, there wasn't a line's worth of data, so we need some
    -- more, in this case 1024 bytes' worth (adjust to taste).  If we get
    -- nil from our stream, we've hit end of file, so we mark that we've hit
    -- end of stream and return what we have buffered.  If the buffer is
    -- empty (the file ended with an end of line marker) there is no final
    -- partial line, so we return nil rather than a spurious "".
    -- ---------------------------------------------------------------------

    if line == nil then
      local more = file:read(1024)
      if more == nil then
        local d = data
        data    = nil
        if d == "" then
          return nil
        end
        return d
      end

      -- ------------------------------------------------------------------
      -- we've read the data.  Append it to our buffer, then call ourselves
      -- again (tail call).
      -- ------------------------------------------------------------------
      
      data = data .. more
      return read_line(file)
    end
    
    -- ---------------------------------------------------------------------
    -- The rest of the data goes into the buffer; we then return the line we
    -- just read.
    -- ---------------------------------------------------------------------

    data = rest
    return line
  end
end

  Now, with that out of the way, you can read the file line-by-line and have
LPeg parse the line for you.
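
  As a minimal sketch of that last step (not a full CSV implementation), a
line could be broken into fields along these lines.  It is adapted from the
CSV example in the LPeg documentation; the names quoted, unquoted, record and
parse_csv_line are made up for illustration, and it assumes comma separators
and double-quoted fields where "" stands for a literal quote:

```lua
local lpeg = require "lpeg"

-- A quoted field: everything between double quotes, with "" collapsed to a
-- single ".  lpeg.Cs captures the text with that substitution applied.
local quoted   = lpeg.P'"'
               * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0)
               * lpeg.P'"'

-- An unquoted field: any run (possibly empty) without a comma or a quote.
local unquoted = lpeg.C((lpeg.P(1) - lpeg.S',"')^0)

local field    = quoted + unquoted

-- A record is one or more comma separated fields, collected into a table,
-- and it must consume the whole line (-1 matches only at end of input).
local record   = lpeg.Ct(field * (lpeg.P"," * field)^0) * -1

-- Hypothetical helper: returns a table of fields, or nil if the line is
-- not well formed under the assumptions above.
local function parse_csv_line(line)
  return record:match(line)
end
```

  Each line returned by read_line(file) can then be handed to
parse_csv_line().  Since read_line() splits on every end of line marker, a
quoted field containing a newline will typically show up as separate lines
that fail to parse; handling that would mean joining lines until the quotes
balance.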

  -spc