On Fri, May 01, 2009 at 10:44:21AM -0600, Joshua Jensen wrote:
> I started with the example in the LPeg manual for CSV parsing and tried 
> adding support to have it parse an entire buffer of multiple CSV lines:
[...]
> local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\r\n' + 
> lpeg.P'\n' + -1)
> local all = lpeg.Ct(record^0)
>
> This fails, of course, with a "loop body may accept empty string" error.
>
> What is the proper solution for making that error go away?

Hi,

Well, you basically just need to rewrite the expression so that
no loop body can match the empty string. :-) I hope the
following examples will help:

local lpeg = require 'lpeg'

local P  = lpeg.P
local C  = lpeg.C
local Cs = lpeg.Cs
local Ct = lpeg.Ct

local a_qt   = '"'
local a_fsep = ','
local a_rsep = '\n'

local qt   = P( a_qt )
local fsep = P( a_fsep )
local rsep = P( a_rsep ) + P'\r\n'

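-- an unquoted field is any run of characters up to a quote or a separator
local unquoted_field = C( (1-(qt + fsep + rsep)) ^0 )
-- inside quotes, a doubled quote stands for one literal quote
local quoted_field   = qt * Cs( ((P(1)-qt) + (qt*qt)/a_qt) ^0) * qt
local field          = quoted_field + unquoted_field
-- each loop body starts with a separator, so it can never match empty
local record         = Ct( field  * (fsep*field)^0  )
local records        = Ct( record * (rsep*record)^0 )
local all            = records * (rsep + -1)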

local txt = [[
1,2,3
a,b,c
"1,a","2""b",""""
]]

local res = lpeg.match( all, txt )

for i, t in ipairs( res ) do
    print( '@@ '..i, t )
    for j, v in ipairs( t ) do print( j, v ) end
end
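
For reference, the error from the question can be reproduced in
isolation. This is only a minimal sketch, assuming a standard lpeg
install; the names 'ok', 'err' and 'fine' are made up for the
illustration:

```lua
local lpeg = require 'lpeg'

-- P('') always succeeds without consuming input, so repeating it could
-- never make progress; lpeg rejects the loop when the pattern is built.
local ok, err = pcall( function() return lpeg.P('')^0 end )
print( ok, err )

-- A loop body that must consume at least one character is accepted.
local fine = ( lpeg.P',' * lpeg.P'x'^0 )^0
print( lpeg.match( fine, ',xx,x' ) )
```

The second pattern is the same shape as (fsep*field)^0 above: the
separator at the front of the body guarantees progress on every
iteration, even when the field itself is empty.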


Note that this might not work quite as expected. If empty
records are allowed, and the string ends with a record
separator, and the last record separator is optional, we don't
know if there should be an intentional empty record between the
last record separator and the end of string, or if the last
record separator just terminates the previous record. So an
alternative is:

local records        = Ct( record * (rsep*#P(1)*record)^0 )
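
The difference between the two readings can be seen by running both
on a string that ends with a record separator. A small sketch; the
grammar is repeated so that it runs on its own, and the names
'with_empty' and 'without_empty' are only illustrative:

```lua
local lpeg = require 'lpeg'
local P, C, Cs, Ct = lpeg.P, lpeg.C, lpeg.Cs, lpeg.Ct

local qt, fsep = P'"', P','
local rsep     = P'\n' + P'\r\n'
local unquoted = C( (1 - (qt + fsep + rsep))^0 )
local quoted   = qt * Cs( ((P(1) - qt) + (qt*qt)/'"')^0 ) * qt
local field    = quoted + unquoted
local record   = Ct( field * (fsep * field)^0 )

-- first version: a trailing separator is followed by one more (empty) record
local with_empty    = Ct( record * (rsep * record)^0 ) * (rsep + -1)
-- alternative: a separator only starts a new record if more input follows
local without_empty = Ct( record * (rsep * #P(1) * record)^0 ) * (rsep + -1)

local txt = '1,2,3\na,b,c\n'
local r1 = lpeg.match( with_empty, txt )
local r2 = lpeg.match( without_empty, txt )
print( #r1, #r2 )   -- number of records found by each version
```

With the first version the result has an extra last record holding
a single empty field; with the alternative it does not.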


Also note that lpeg itself does not report a syntax error in the
string. A pattern that does not have to reach the end of the
string, such as 'records' on its own, just silently leaves the
part after the error unmatched.
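
The silent truncation can be observed by matching with 'records'
alone and appending a position capture. A sketch under that
assumption (lpeg.Cp() is added only to show where the match
stopped; the sample input is made up):

```lua
local lpeg = require 'lpeg'
local P, C, Cs, Ct = lpeg.P, lpeg.C, lpeg.Cs, lpeg.Ct

local qt, fsep = P'"', P','
local rsep     = P'\n' + P'\r\n'
local unquoted = C( (1 - (qt + fsep + rsep))^0 )
local quoted   = qt * Cs( ((P(1) - qt) + (qt*qt)/'"')^0 ) * qt
local field    = quoted + unquoted
local record   = Ct( field * (fsep * field)^0 )
local records  = Ct( record * (rsep * record)^0 )

-- the stray quote inside an unquoted field is a syntax error
local txt = 'a,b"c\nd,e'
local res, stop = lpeg.match( records * lpeg.Cp(), txt )

-- the match succeeds, but 'stop' is well short of the end of the string
print( #res, stop, #txt + 1 )
```

Comparing the returned position with #txt + 1 is one way to notice
that part of the input was left unmatched.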


A way that more clearly expresses the other interpretation, where a
trailing record separator only terminates the previous record, is:

local nonemptyrecord = #P(1-rsep) * record
local records        = Ct( (record*rsep)^0 * nonemptyrecord^-1 )
local all            = records * -1


This last version will return nil on errors, since 'all' must
match the end of string.
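
A quick check of that behaviour, again as a self-contained sketch
with made-up sample strings:

```lua
local lpeg = require 'lpeg'
local P, C, Cs, Ct = lpeg.P, lpeg.C, lpeg.Cs, lpeg.Ct

local qt, fsep = P'"', P','
local rsep     = P'\n' + P'\r\n'
local unquoted = C( (1 - (qt + fsep + rsep))^0 )
local quoted   = qt * Cs( ((P(1) - qt) + (qt*qt)/'"')^0 ) * qt
local field    = quoted + unquoted
local record   = Ct( field * (fsep * field)^0 )

-- terminated records, then optionally one unterminated non-empty record
local nonemptyrecord = #(1 - rsep) * record
local records        = Ct( (record * rsep)^0 * nonemptyrecord^-1 )
local all            = records * -1

local good  = lpeg.match( all, '1,2,3\na,b,c\n' )  -- trailing separator
local open  = lpeg.match( all, 'x,y\nz' )          -- no trailing separator
local bad   = lpeg.match( all, 'a,b"c\nd,e' )      -- stray quote
print( #good, #open, bad )
```

Both well-formed inputs give two records each, and the malformed
one gives nil rather than a partial result.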



-- 
Tommy Pettersson <ptp@lysator.liu.se>