Purely as an exercise I have been writing a small Lua 5.2
script using lpeg 0.10 to remove comments from scripts. This
has started me thinking about aspects of syntax, and I suspect
that on the list there may be those who can point me toward
useful material.

It is fairly obvious, for example, that if a language permits
single-line comments that do not have to start at the beginning
of a line, and strings of some sort, then it is impossible to
recognize comments without also recognizing strings. Consider a
line:

 x = "--" -- finally we have a comment

for example. In fact, Lua comments can be recognized by a small
subgrammar whose nonterminals correspond to strings and comments.
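
As a rough illustration of such a subgrammar, here is a minimal
sketch in plain Lua (not the LPeg script mentioned above). The names
strip_comments and long_bracket_end are made up for this example. It
assumes only three lexical forms matter: short strings, long-bracket
strings, and comments, and it treats unterminated strings and
comments as running to the end of the input.

```lua
-- Strip Lua comments from `src`, leaving string literals untouched.
-- Hypothetical helper for illustration; not taken from any library.
local function strip_comments(src)
  local out = {}            -- pieces of the result, joined at the end
  local i, n = 1, #src

  -- If a long bracket ("[[", "[=[", ...) opens at position p, return the
  -- position just past its matching close; otherwise return nil.
  local function long_bracket_end(p)
    local eqs = src:match("^%[(=*)%[", p)
    if not eqs then return nil end
    local _, close_end = src:find("]" .. eqs .. "]", p + #eqs + 2, true)
    return (close_end or n) + 1   -- unterminated: run to end of input
  end

  while i <= n do
    local c = src:sub(i, i)
    local long_end = (c == "[") and long_bracket_end(i)
    if c == '"' or c == "'" then
      -- Short string: scan to the matching quote, skipping escapes.
      local j = i + 1
      while j <= n and src:sub(j, j) ~= c do
        if src:sub(j, j) == "\\" then j = j + 1 end
        j = j + 1
      end
      out[#out + 1] = src:sub(i, j)
      i = j + 1
    elseif long_end then
      -- Long string: copy it verbatim, whatever it contains.
      out[#out + 1] = src:sub(i, long_end - 1)
      i = long_end
    elseif src:sub(i, i + 1) == "--" then
      -- Comment: long form if a long bracket follows, else to end of line.
      local e = long_bracket_end(i + 2)
      if e then
        i = e
      else
        i = src:find("\n", i, true) or n + 1   -- the newline itself is kept
      end
    else
      out[#out + 1] = c
      i = i + 1
    end
  end
  return table.concat(out)
end

print(strip_comments('x = "--" -- finally we have a comment\n'))
```

The point of the sketch is that the scanner never needs to know
anything about statements or expressions: it only has to decide, at
each position, whether a string or a comment starts there.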

For much the same reason, you cannot recognize chunks in Lua source code
without parsing expressions, because expressions may contain anonymous
functions, and function bodies are chunks.
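
A small example of that nesting, with made-up names (handlers,
classify): the table constructor is an expression, yet one of its
fields is an anonymous function whose body contains statements of
its own.

```lua
-- The right-hand side is a single expression (a table constructor).
local handlers = {
  -- The field value is an anonymous function; its body is a nested
  -- chunk containing an 'if' statement and two 'return' statements.
  classify = function(n)
    if n < 0 then return "negative" end
    return "non-negative"
  end,
}

print(handlers.classify(-1))
```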

I would never argue that ease of parsing should ever take precedence
over ease of writing code. Nevertheless, maybe it is interesting to
ask whether parsing can be simplified, or decomposed into simpler
parses, with only minor changes of syntax. In the syntax of Lua 5.2
it is very nearly the case that all chunks are initiated by 'do' and
terminated by 'end'. Changing 'then' to 'do' would allow conditional
statements to have the form 
   'if' exp chunk 'else' chunk
The repeat statement could have the form
    'repeat' chunk 'until' exp
with only a mild amount of redundancy. 
Requiring all statements to be terminated by a semicolon would also
simplify parsing. 
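
To make the intended decomposition concrete, here is a sketch in
plain Lua for a hypothetical dialect in which every chunk, including
function bodies, is written 'do' ... 'end' (this is not Lua 5.2
syntax). It assumes the input has already had its strings and
comments removed, and the name chunk_spans is made up. Under those
assumptions chunk boundaries can be found by matching keywords
alone, without parsing any expression.

```lua
-- Return a list of {start, finish} positions, one per 'do' ... 'end'
-- chunk in `src`, in the order in which the chunks are closed.
-- Assumes a hypothetical dialect where 'do' and 'end' delimit every
-- chunk, and that `src` contains no strings or comments.
local function chunk_spans(src)
  local spans, stack = {}, {}
  -- Visit each identifier-like word together with its start position
  -- and the position just past its last character.
  for s, word, e in src:gmatch("()([%a_][%w_]*)()") do
    if word == "do" then
      stack[#stack + 1] = s               -- remember where the chunk opens
    elseif word == "end" then
      local open = table.remove(stack)    -- most recent unmatched 'do'
      assert(open, "'end' without matching 'do'")
      spans[#spans + 1] = { open, e - 1 }
    end
  end
  assert(#stack == 0, "'do' without matching 'end'")
  return spans
end

local src = "if x do a end else do repeat do b end until c end"
for _, span in ipairs(chunk_spans(src)) do
  print(src:sub(span[1], span[2]))
end
```

Expressions could then be parsed in a second pass over the text
between chunk boundaries.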

Can anybody point me to material that makes precise the idea of
decomposing grammars?

-- 
Gavin Wraith (gavin@wra1th.plus.com)
Home page: http://www.wra1th.plus.com/