On Mon, 15 Mar 2010 21:55:24 -0700
Wesley Smith <> wrote:

> Let's say I have descriptions of the tokens a lexer generates and a
> grammar written in terms of the tokens.  What I'd like to do is write
> some simple LPEG patterns to generate the tokens and then write the
> grammar in terms of those tokens without having to convert the token
> list into a giant string.  Is there any way to set things up such that
> I can effectively pass lpeg.match an iterator over the tokens?
> Here's a mock up:
> -- token_rules is a small grammar defining the tokens
> local lexer = lpeg.P(token_rules)
> -- tokens is a table with the tokens in array form; each token is
> -- itself a table of the form
> -- { [1] = token_string, type = TOKEN_NAME } where TOKEN_NAME is a
> -- string name of the token
> local tokens = lexer:match(code)
> -- grammar_rules is a grammar written in terms of the possible token
> -- names (i.e. the TOKEN_NAME values in the type field of each token)
> local grammar = lpeg.P(grammar_rules)
> -- ast is a tree of token tables
> local ast = lpeg.match(grammar, tokens)
> Clearly this last line won't work since tokens is a table not a
> string.  Another idea would be to do:
> function iter(tokens)
>    local i = 1
>    return function()
>       local t = tokens[i]
>       i = i+1
>       if(t) then
>          return t.type
>       end
>    end
> end
> -- ast is a tree of token tables
> local ast = lpeg.match(grammar, iter(tokens))
> where the subject argument of lpeg.match is called until it returns nil
> Is there some way to seduce LPEG into doing something like this as is
> or am I out of luck?
> thanks!
> wes

I don't know how LPeg works in detail, especially the form of the nodes it generates (in particular, whether they are tagged or not). But typically, PEG parsing does not separate the scanner and parser levels. Low-level patterns simply yield low-level nodes (say, tokens), while higher-level patterns build on top of the lower-level ones, yielding nodes that represent more complex expressions (say, syntax).
I'm not sure that recreating a scanner/parser distinction on top of a PEG-based tool is the best approach. Why don't you want to write the syntactic patterns in PEG directly? (I'm curious.) For example, in generic PEG notation:

_		: [ \t]*	-- opt spacing
EOL		: [;\n]
_EOL		: _ EOL
ASSIGN		: '='
name		: nameChar+
value		: <whatever>
assignment	: _ name _ ASSIGN _ value _EOL
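
For what it's worth, here is a rough sketch of how the grammar above might look in LPeg itself. It assumes the LPeg module is installed and loadable as "lpeg". The definitions of nameChar and value are my own guesses (the sketch leaves value open), and the tagged node layout { type, name, value } is only one possible choice, not anything LPeg prescribes.

```lua
-- Assumes the LPeg library is installed and loadable as "lpeg".
local lpeg = require "lpeg"
local P, S, R = lpeg.P, lpeg.S, lpeg.R
local C, Cc, Cg, Ct = lpeg.C, lpeg.Cc, lpeg.Cg, lpeg.Ct

-- Token-level patterns: what a separate scanner would otherwise produce.
local sp     = S(" \t")^0          -- "_" in the sketch: optional spacing
local EOL    = S(";\n")            -- statement terminator
local ASSIGN = P("=")

-- Assumption: a name is a run of letters, digits and underscores,
-- and a value is a run of digits. Replace with the real token rules.
local name  = C((R("az", "AZ", "09") + P("_"))^1)
local value = C(R("09")^1)

-- Syntax-level pattern built directly on the token-level ones.
-- Each match yields a tagged node table:
--   { type = "assignment", name = <string>, value = <string> }
local assignment = Ct(
  Cg(Cc("assignment"), "type")
  * sp * Cg(name, "name")
  * sp * ASSIGN
  * sp * Cg(value, "value")
  * sp * EOL
)

-- A program is a sequence of assignments that consumes the whole input.
local program = Ct(assignment^0) * -1

local ast = program:match("x = 1\ny_2=42;")
for i, node in ipairs(ast) do
  print(i, node.type, node.name, node.value)
end
```

The tokens never exist as a separate list here: the low-level patterns are matched against the source string, and the captures build the tree as a side effect of matching. If the input does not parse, program:match returns nil.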


vit e estrany