On Tue, Jun 13, 2017 at 1:59 PM, Sen Han <firstname.lastname@example.org> wrote:
> the DSL I want to parse is just a subset of lua language,
> and the goal is to implement a source-to-source translator.
One of my hobby projects has been creating an alternative syntax for
Lua. I use LPeg to transpile from my custom syntax into Lua.
I started by getting LPeg to parse the entire Lua grammar, and then I
started making adjustments to the grammar and syntax. Many
adjustments can be made with substitution captures. I also use table
captures, so that LPeg outputs an abstract syntax tree. I can then
make additional modifications to the tree, and then flatten the tree
to a string of Lua source code. Comments are preserved. Line numbers
are preserved. Source code layout and formatting are preserved as far
as possible, given the inherent differences between my language and
Lua. Many of the changes I am making are cosmetic. My language is
much closer to Lua than Moonscript is, for example.
Your goal (parsing "just a subset of lua") should be simpler than
mine, possibly much simpler.
Problems you may encounter:
1. The Lua grammar is left-recursive. LPeg cannot parse
left-recursive grammars, so I needed to refactor Lua's grammar to
remove the left recursion. If you have never refactored a grammar
before, you might find it challenging.
Aside: There are known techniques to extend PEG parsers so that they
can handle left recursion. There is a fork of LPeg that may do
left recursion for you. I have not tried to use it.
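To illustrate the refactoring in item 1, here is a small sketch that uses a simplified, hypothetical rule for dotted names (not the full Lua "var" rule). The left-recursive form is rejected when the grammar is built, at least in the LPeg versions I am aware of; the refactored form replaces the recursion with a head followed by a repetition.

```lua
local lpeg = require("lpeg")
local P, R, S, C, Ct, V = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Ct, lpeg.V

local ws = S(" \t\n")^0
local Name = C(R("az", "AZ", "__") * R("az", "AZ", "09", "__")^0) * ws

-- Left-recursive form:  var <- var '.' Name / Name
-- Building this grammar is expected to raise an error, so wrap it in pcall.
local ok = pcall(lpeg.P, {
  "var",
  var = V("var") * P(".") * ws * Name + Name,
})
-- ok is expected to be false

-- Refactored form:  var <- Name ('.' Name)*
-- Same language, but every alternative now consumes input before repeating.
local var = Ct(Name * (P(".") * ws * Name)^0)

local parts = var:match("a.b.c")
-- parts is { "a", "b", "c" }
```

The refactored rule yields a flat list rather than a left-nested tree; if the nesting matters, it can be rebuilt afterwards from the list.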
2. My grammar also uses back-references. Back-references (in the re
module) are implemented via LPeg back captures (lpeg.Cb) in combination
with LPeg match-time captures (lpeg.Cmt) that call a Lua function, which
may create a new string. Recently, I approximately tripled the number of
back-references in the grammar, and performance dropped by roughly a
factor of 3. This leads me to suspect that my heavy use of
back-references is the primary bottleneck in my parser. I have been
pondering implementing back-references directly in LPeg to avoid the
overhead of the match-time capture, the call to Lua, and (for failed
matches) the likely creation of a string in Lua. I would, of course,
have to extend the C language LPeg source code to implement
back-references inside LPeg itself.
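The mechanism described in item 2 can be sketched in plain LPeg as follows. This is adapted from the long-string example in the LPeg manual, not from the grammar discussed here: a named group capture records the opening run of "=" characters, and a match-time capture calls a Lua function to compare it with the closing run.

```lua
local lpeg = require("lpeg")
local P, C, Cg, Cb, Cmt = lpeg.P, lpeg.C, lpeg.Cg, lpeg.Cb, lpeg.Cmt

local equals = P("=")^0
-- Remember the opening "=" run under the name "init".
local open = "[" * Cg(equals, "init") * "[" * P("\n")^-1
local close = "]" * C(equals) * "]"
-- Back-reference: the function runs at match time, once per candidate
-- closing bracket, which is where the per-attempt overhead comes from.
local closeeq = Cmt(close * Cb("init"), function(subject, pos, a, b)
  return a == b
end)
-- "/ 1" keeps only the first capture, the string contents.
local longstring = open * C((P(1) - closeeq)^0) * close / 1

local body = longstring:match("[==[hello ]] world]==]")
-- body == "hello ]] world"
```

Note that the function is called for the failed candidate "]]" as well as for the successful "]==]", so a subject with many near-misses pays for a Lua call each time.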
Aside: If anyone has any tips for profiling LPeg patterns, I would be
happy to hear them. It would be nice to confirm that my suspicions
are correct before investing time and energy modifying the LPeg C
language source code.
3. At one point, I needed to increase LPeg's internal stack limit.
This is trivially done via lpeg.setmaxstack, but I did have to guess
as to what value to use, as the default value was not documented at
that time. (The docs now say the default is 400.)
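A small sketch of item 3, assuming the documented default limit of 400. The nested-parentheses grammar and the value 5000 are arbitrary choices for illustration; each nesting level occupies backtrack-stack entries, so deep nesting is one way to run into the limit.

```lua
local lpeg = require("lpeg")
local P, V = lpeg.P, lpeg.V

-- Balanced, singly nested parentheses: "(" ( nested )? ")"
local nested = P({ "(" * V(1)^-1 * ")" })
local subject = string.rep("(", 1000) .. string.rep(")", 1000)

-- Under the default limit this match is expected to raise a
-- backtrack stack overflow error, so call it through pcall.
local ok_default = pcall(nested.match, nested, subject)

-- Raise the limit (5000 is a guess with headroom, not a tuned value).
lpeg.setmaxstack(5000)
local endpos = nested:match(subject)
-- endpos == 2001, the position just past the last ")"
```

The limit only caps how far the stack may grow, so a generous value is cheap unless the grammar actually needs it.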
In general, LPeg has worked very well and has performed as documented
without crashes. Perhaps best of all, working with LPeg has been fun!