On Fri, Feb 3, 2017 at 1:56 AM, Martin wrote:
> On 02/01/2017 01:26 AM, Luiz Henrique de Figueiredo wrote:
>> Since you remove comments, you can reuse the Lua lexer.
>> This has the clear benefit that the lexing will be exactly what Lua does.
>> To reuse the Lua lexer, consider using my ltokenp
> [snip]
>> Thanks for sharing lcf.
> Thank you for appreciating it.
> lcf is a fairly long-running experimental project, started in 2013 with
> the idea of a parser whose grammar is presented as a Lua table, not as
> ad-hoc code or write-only regexps. So there is no lexing stage in this
> parser; tokens are just subgrammars. (Finite nodes are strings to compare
> with the input stream, or functions which return true and the new stream
> position if they successfully ate a token.)
> (I read about the lexing stage only relatively recently, in the dragon
> compilers book, and frankly speaking I see not much need for it in
> general.)
> -- Martin

A lexing stage buys you performance, abstraction, and a reduction in the
power of the grammar. (Keep in mind that powerful grammars are a bad
thing, because a more powerful grammar means that the parser must also
be more powerful.)

Yes, in theory, any language can be parsed at the character level, with
each character treated as a token. However, doing so dramatically limits
the tools and techniques
available to you. Constructing your parser at the character level
means that you can't use LL- or LR-type parsers, for example, unless
you put an arbitrary cap on the length of identifiers. Allowing
arbitrary-length identifiers in a character-level grammar means you
MUST use a backtracking-based parser construction; while these are
more powerful, they're also substantially slower.
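
To make the lookahead point concrete, here is a rough sketch in Python
(used here only for brevity). It assumes a hypothetical grammar that is
not left-factored, in which a statement is either `name = ...` or
`name(...)`, and a predictive parser that has to pick one of the two
before consuming anything. The regular expression and the function name
are made up for illustration; this is not how Lua or lcf does it.

```python
import re

# Hypothetical split of a statement into tokens: identifiers, or any
# single non-space character. Whitespace is dropped.
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")

def lookahead_needed(symbols):
    """Count the symbols read up to and including the first one that
    tells `name = ...` apart from `name(...)`."""
    for count, symbol in enumerate(symbols, 1):
        if symbol in ("=", "("):
            return count
    raise ValueError("neither an assignment nor a call")

def compare(statement):
    """Return (lookahead in characters, lookahead in tokens)."""
    by_char = lookahead_needed(list(statement))
    by_token = lookahead_needed(TOKEN.findall(statement))
    return by_char, by_token
```

Under these assumptions the token-level lookahead stays at 2 however
long the identifier is, while the character-level lookahead grows with
the identifier, so no fixed k covers every input.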

A lexer can be written as a very compact bit of code. Lexers are simple,
generally elegant, and efficient. No matter what parser construction
you use, a lexer can be run as a forward-only stream. By doing so, you
hand the parser a token stream that is far shorter than the character
stream, and the simpler grammar you can construct on top of it
effectively has an exponent knocked off of its runtime. And if the
resulting grammar can be processed with finite lookahead, you don't even
need to use a backtracking recursive-descent parser (like a PEG parser)
-- LR parsers run in LINEAR time!
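
As a minimal sketch of how compact a forward-only lexer can be, here is
one in Python (again only for brevity). The keyword set, the token
classes, and the name `lex` are assumptions for a tiny Lua-like subset;
this is not Lua's real lexer and it skips strings, long comments, and
most operators.

```python
import re

# Hypothetical keyword set for a small Lua-like subset.
KEYWORDS = {"local", "function", "end", "return", "if", "then", "else"}

# Order matters: "skip" comes first so that "--" starts a comment
# instead of being read as two minus operators.
TOKEN_SPEC = [
    ("skip",   r"\s+|--[^\n]*"),
    ("number", r"\d+(?:\.\d+)?"),
    ("name",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("op",     r"==|~=|<=|>=|\.\.|[-+*/%^<>=(){}\[\];:,.]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))

def lex(source):
    """Yield (kind, text) pairs in one left-to-right pass, never backing up."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "skip":
            continue  # whitespace and comments never reach the parser
        text = m.group()
        if kind == "name" and text in KEYWORDS:
            kind = "keyword"
        yield kind, text
```

For example, `list(lex("local x = 10 -- answer\nreturn x + 1"))` turns
35 characters into 8 tokens, and the parser on top never has to think
about whitespace, comments, or where an identifier ends.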

Use a lexer. Your parser will be better.

/s/ Adam