- Subject: Re: parsing improvement
- From: Sean Conner <sean@...>
- Date: Fri, 29 May 2015 16:39:30 -0400
It was thus said that the Great Lionel Duboeuf once stated:
> hello you all,
>
> Just in case I'm not doing it efficiently, and to learn best practices:
> I have a character stream that is formatted like this one:
>
> ...<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> <16 orange
> mechanics/> <2 25/>...
[ snip ]
> I did some benchmarks and found that using gmatch and iterating through
> the captures is more efficient, but it is not usable when we need to
> specify a starting offset (as with string.find), and I don't want to
> split my string, to avoid copies.
>
> Any advice will be very much appreciated.
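What you want to do is possible with LPeg. A pattern's match method takes
an optional starting position (pattern:match(subject, init)), and an
lpeg.Cp() capture returns the position where the match stopped, so you can
resume from there without copying the string.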
local lpeg = require "lpeg"
-- To make the following code a bit easier on the eyes
local Ct = lpeg.Ct -- collect captures in a table
local Cg = lpeg.Cg -- assign capture to a field in a table (how I'm using this)
local Cp = lpeg.Cp -- capture the current position
local C = lpeg.C -- a generic capture of text
local S = lpeg.S -- match a set of characters
local P = lpeg.P -- match a literal string (see LPeg documentation for more on this)
local token   = (P(1) - P"/")^1       -- match everything up to a '/'

local bracket = S" \t\n"^0            -- skip 0 or more space characters
              * P"<"                  -- skip start marker
              * lpeg.locale().digit^1 -- skip digits; looks like you ignore them
              * S" \t\n"^1            -- skip one or more space characters
              * C(token)              -- capture the text we want
              * P"/>"                 -- skip the end marker

local pair = Ct(                      -- collect captures in a table
               Cg(bracket,"col1")     -- assign this to col1
             * Cg(bracket,"col2")     -- assign this to col2
             )
           * Cp()                     -- include the current position
local test = [[<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> <16 orange
mechanics/> <2 25/>]]
local pos = 1
while true do
  local x,npos = pair:match(test,pos)
  if not x then break end
  print(x.col1,x.col2)
  pos = npos
end
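For comparison, since the question mentions string.find's starting offset: stock Lua patterns can do the same resumable scan without LPeg. This is a swapped-in technique, not the LPeg approach above, and the names ELEMENT, PAIR and next_pair are hypothetical. It relies on two documented behaviours: string.find accepts an init position, and a pattern that begins with '^' is anchored at that position, so each call only examines the text starting at pos. The subject is never split; only the captured values become new strings.

```lua
-- Plain-Lua sketch (hypothetical names): resume matching at an offset
-- using string.find's init argument and a '^'-anchored pattern.

-- one "<digits whitespace text/>" element, allowing leading whitespace;
-- [^/]+ mirrors the LPeg token: everything up to a '/', newlines included
local ELEMENT = "%s*<%d+%s+([^/]+)/>"
local PAIR = "^" .. ELEMENT .. ELEMENT  -- '^' anchors the match at init

-- Returns col1, col2 and the position just past the match,
-- or nil if no pair starts at pos.
local function next_pair(subject, pos)
  local _, last, col1, col2 = subject:find(PAIR, pos)
  if not last then return nil end
  return col1, col2, last + 1
end

local test = [[<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> <16 orange
mechanics/> <2 25/>]]

local pos = 1
while true do
  local col1, col2, npos = next_pair(test, pos)
  if not col1 then break end
  print(col1, col2)
  pos = npos
end
```

Lua 5.4 also added an optional init argument to string.gmatch, which may be enough if you are on that version.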
-spc (From there, it should be easy to wrap it up into an iterator ... )
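One way to wrap the loop into an iterator, as a minimal sketch: scan is a hypothetical helper that turns any "match at position" function into a generic-for iterator. With LPeg and the pair pattern from the message, the matcher could be function(s, p) return pair:match(s, p) end. The demo below uses a plain string.find matcher (find_pair, also hypothetical) so it runs without LPeg installed.

```lua
-- Hypothetical helper: wrap a matcher into a generic-for iterator.
-- matcher(subject, pos) must return a value followed by the position
-- to resume from, or nil when nothing matches at pos.
local function scan(matcher, subject, init)
  local pos = init or 1
  return function()
    local value, npos = matcher(subject, pos)
    -- stop on no match, or on a match that consumed nothing
    -- (which would otherwise loop forever)
    if value == nil or npos <= pos then return nil end
    pos = npos
    return value
  end
end

-- Plain-Lua matcher returning a table shaped like the LPeg result.
-- With LPeg: scan(function(s, p) return pair:match(s, p) end, test)
local PAIR = "^%s*<%d+%s+([^/]+)/>%s*<%d+%s+([^/]+)/>"
local function find_pair(subject, pos)
  local _, last, col1, col2 = subject:find(PAIR, pos)
  if not last then return nil end
  return { col1 = col1, col2 = col2 }, last + 1
end

local test = [[<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> <16 orange
mechanics/> <2 25/>]]

for x in scan(find_pair, test) do
  print(x.col1, x.col2)
end
```

Keeping the matcher separate from the iterator means the same loop works whether the matching is done by LPeg or by string.find.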