On Thu, Aug 31, 2017 at 10:16:46AM +0800, KHMan wrote:
> On 8/31/2017 2:00 AM, Peter Melnichenko wrote:
> > On Wed, Aug 30, 2017 at 6:07 PM, Benas Vaitkevičius wrote:
> > > Dear Lua community,
> > > 
> > > I am glad to announce parser-gen, a parser generator that I created together
> > > with LabLua this summer. The code can be found on GitHub:
> > > https://github.com/vsbenas/parser-gen
> > > [snip snip snip]
> > > 
> > > This is a beta release, please report any bugs when found.
> > 
> > [snip snip snip]
> > Additionally, the memory consumption of the parser seemed very high on
> > these larger files
> 
> Last time I looked a few years ago, often academic papers on PEG mentions
> memory consumption as an issue. PEG on Wikipedia [1] puts memory usage as
> the first item on Disadvantages. So I guess nothing much has changed.

Memoizing PEG engines (aka packrat parsers) inherently use a lot of memory,
and that's what the Wikipedia article is discussing. But LPeg isn't a
packrat parser.

PEG-based parsers do tend to require relatively many matching rules because
the lexing and parsing phases are combined. IIRC, a pure PEG to match an
IPv6 address precisely requires a very large number of matching rules. One
way to decrease memory usage is to split parsing into separate phases, just
as with traditional parsing methods. So, for example, only fuzzily match
IPv6 addresses in your PEG (enough to disambiguate from other syntax, but
not enough to reject every invalid IPv6 address), then in a separate pass
check IPv6 address nodes using other methods. A hand-written IPv6 parser in
Lua, using a splitting function and a loop, is just a dozen or so lines of
code.
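
To illustrate the separate validation pass, here is a minimal sketch in plain
Lua. It is only one possible shape for such a checker, not code from
parser-gen or LPeg. The names split and is_ipv6 are made up for this example,
and it assumes addresses made of hex groups with an optional "::" only (no
embedded IPv4 dotted-quad suffix, no zone IDs).

```lua
-- Split s on a single-character separator, keeping empty fields.
local function split(s, sep)
  local fields = {}
  -- Appending sep lets one pattern capture every field, including a
  -- trailing empty one.
  for field in (s .. sep):gmatch("([^" .. sep .. "]*)" .. sep) do
    fields[#fields + 1] = field
  end
  return fields
end

-- Strict check, meant to run on strings a looser grammar already accepted.
local function is_ipv6(addr)
  -- Only hex digits and colons may appear at all.
  if addr:find("[^%x:]") then return false end

  -- Locate "::" (plain find); a second occurrence makes the address invalid.
  local first, last = addr:find("::", 1, true)
  local head, tail, compressed = addr, "", false
  if first then
    if addr:find("::", last, true) then return false end
    head, tail, compressed = addr:sub(1, first - 1), addr:sub(last + 1), true
  end

  -- Count the groups on both sides of "::"; each needs 1 to 4 hex digits.
  local count = 0
  for _, part in ipairs({head, tail}) do
    if part ~= "" then
      for _, group in ipairs(split(part, ":")) do
        if not group:match("^%x%x?%x?%x?$") then return false end
        count = count + 1
      end
    end
  end

  -- "::" stands for at least one zero group, so at most 7 may be written out.
  if compressed then return count <= 7 end
  return count == 8
end
```

The grammar then only needs something as loose as "a run of hex digits and
colons" for the address node, and is_ipv6 is called on the captured text
afterwards.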

Also, LPeg in particular makes it incredibly easy to programmatically
generate PEGs, which means it's just as easy to generate PEGs with too
many, possibly unnecessary, matching rules.