(tl;dr: I am complaining about the lack of a proper formal spec, asking for feedback on the parts I'm attempting to formalize from prose, and asking for help with the last couple of open items.)

I'm attempting to write a full parser for Lua for CodeRay. Section 9 of the Reference Manual claims to contain "the complete syntax of Lua in extended BNF", but it has a few holes:

* There is no definition of the Name production. Based on section 3.1, I understand that, expressed as a regex, it would be:
/[a-z_]\w*/i
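
A quick way to exercise that pattern, sketched in Python. Two things here are assumptions rather than anything the EBNF states: that letters are ASCII only (hence `re.ASCII`), and that the reserved-word list is the one given in section 3.1 for the Lua version that has `goto`. The helper name `is_name` is made up for this example.

```python
import re

# Reserved words, assumed to match the list in section 3.1 of the manual.
# A Name may not be one of these, which the bare regex does not capture.
KEYWORDS = frozenset("""
    and break do else elseif end false for function goto if in
    local nil not or repeat return then true until while
""".split())

# Same pattern as above; re.ASCII keeps \w from matching non-ASCII letters.
NAME = re.compile(r"[a-z_]\w*", re.IGNORECASE | re.ASCII)

def is_name(text):
    """True if text is a whole identifier and not a reserved word."""
    return NAME.fullmatch(text) is not None and text not in KEYWORDS

for candidate in ["_x1", "1x", "while", "While", "foo_bar"]:
    print(candidate, is_name(candidate))
```

Because Lua keywords are case-sensitive, `While` passes the check while `while` does not.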

* There is no definition of the formal syntax of a String in the EBNF, though there is one in the prose of section 3.1. I believe this regex should match, though I have not tested it extensively. Any refinements are welcome:
/(?:"(?:[^\\"]|\\[abfnrtvz\\"']|\\\n|\\\d{1,3}|\\x[\da-fA-F]{2})*")|(?:'(?:[^\\']|\\[abfnrtvz\\"']|\\\n|\\\d{1,3}|\\x[\da-fA-F]{2})*')|(?:\[(=*)\[.*?\]\1\])/m
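
A sketch for testing the three alternatives, transliterated to Python (`re.DOTALL` stands in for the `/m` flag). The long-bracket branch is written here as `\[(=*)\[.*?\]\1\]`, which is an assumption about the intended form: an opening `[`, some number of `=`, another `[`, then a close with the same number of `=`. The helper name `is_string_literal` is hypothetical.

```python
import re

# Escape sequences allowed inside short strings, as in the pattern above.
ESCAPE = r"""\\[abfnrtvz\\"']|\\\n|\\\d{1,3}|\\x[\da-fA-F]{2}"""

DOUBLE = r'"(?:[^\\"]|' + ESCAPE + r')*"'
SINGLE = r"'(?:[^\\']|" + ESCAPE + r")*'"
# Assumed form of the long-bracket branch; (=*) is the only capturing
# group in the combined pattern, so \1 refers to it.
LONG = r"\[(=*)\[.*?\]\1\]"

LUA_STRING = re.compile(
    "|".join("(?:%s)" % part for part in (DOUBLE, SINGLE, LONG)),
    re.DOTALL,
)

def is_string_literal(text):
    """True if the whole of text matches one of the three string forms."""
    return LUA_STRING.fullmatch(text) is not None

samples = [
    '"hello"',          # plain double-quoted string
    r"'it\'s'",         # escaped quote inside single quotes
    r'"\x41\065"',      # hex and decimal escapes
    "[==[a ]] b]==]",   # level-2 long bracket containing "]]"
    "[=[x]]",           # mismatched closing level
    r'"a\qb"',          # escape not in the list
    '"a\nb"',           # raw newline inside a short string
]
for sample in samples:
    print(repr(sample), is_string_literal(sample))
```

One possible refinement this exposes: the last sample matches because `[^\\"]` accepts a raw newline, whereas Lua reports an unfinished string there. Adding `\n` to the negated classes would close that gap.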

* There is no definition of the formal syntax of a Number. Based on experimentation, it looks like this might be a valid regex for matching a Lua number:
/-?\d*\.?\d+(e[-+]?\d+)?/i
Anyone see anything wrong with that?
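
One way to probe that pattern is to run it over the sample constants from section 3.1, as in this Python sketch. The sample list is reproduced from memory of the manual, and the helper name `matches` is hypothetical. The trailing-dot case `3.` is included on the assumption that the reference interpreter accepts it.

```python
import re

# The pattern from above; re.ASCII keeps \d limited to 0-9.
NUMBER = re.compile(r"-?\d*\.?\d+(e[-+]?\d+)?", re.IGNORECASE | re.ASCII)

def matches(text):
    """True if the pattern consumes the whole of text."""
    return NUMBER.fullmatch(text) is not None

# Constants recalled from section 3.1 of the manual.
manual_samples = [
    "3", "3.0", "3.1416", "314.16e-2", "0.31416E1",
    "0xff", "0x0.1E", "0xA23p-4", "0X1.921FB54442D18P+1",
]
# Extra edge cases; "3." is assumed (not verified here) to be valid Lua.
edge_cases = [".5", "3.", "-3"]

for sample in manual_samples + edge_cases:
    print(sample, matches(sample))
```

Running this shows the decimal forms matching and every hexadecimal form failing, so a hex branch is missing. The leading `-?` may also be worth dropping: the grammar in section 9 lists `-` under unop, so in a tokenizer the pattern would read `a-3` as `a` followed by `-3` instead of three tokens.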

* Many productions write literal terminals without proper quoting. For example, the "stat" production should probably look like this, with 25 terminal strings denoted:

stat ::=  ‘;’ | 
     varlist ‘=’ explist | 
     functioncall | 
     label | 
     ‘break’ | 
     ‘goto’ Name | 
     ‘do’ block ‘end’ | 
     ‘while’ exp ‘do’ block ‘end’ | 
     ‘repeat’ block ‘until’ exp | 
     ‘if’ exp ‘then’ block {‘elseif’ exp ‘then’ block} [‘else’ block] ‘end’ | 
     ‘for’ Name ‘=’ exp ‘,’ exp [‘,’ exp] ‘do’ block ‘end’ | 
     ‘for’ namelist ‘in’ explist ‘do’ block ‘end’ | 
     ‘function’ funcname funcbody | 
     ‘local’ ‘function’ Name funcbody | 
     ‘local’ namelist [‘=’ explist] 

* Section 9 does not cover whitespace at all. Section 3.1 simply says,
> "[Lua] ignores spaces (including new lines) and comments between lexical elements (tokens), except as delimiters between names and keywords."

a) I know that "spaces" above means more than just ASCII 0x20 and includes at least \t. Does it cover all 26 Unicode whitespace characters listed at http://en.wikipedia.org/wiki/Whitespace_character ? If not, which characters does Lua consider whitespace?

b) It sure would be nice if the formal syntax stated where whitespace is required vs. optional. Yes, it makes it uglier. It also makes it actually useful, as opposed to a rough sketch open to interpretation.

I would appreciate help as to where whitespace is required versus optional.
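
For what it is worth, one common reading of the sentence quoted from section 3.1 is "longest match": whitespace is only required where two adjacent tokens would otherwise lex as a single token. The Python sketch below illustrates that reading with a deliberately tiny token set. Everything in it is an assumption for illustration, in particular the whitespace class (the six ASCII characters space, \t, \n, \r, \f, \v), which is a guess and not an answer to (a). The function name `tokens` is hypothetical.

```python
import re

# A deliberately small token set: whitespace, words (names and keywords),
# digit runs, and a few symbols. Not a full Lua lexer.
TOKEN = re.compile(r"""
    (?P<space>[ \t\n\r\f\v]+)            # assumed whitespace set
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)     # names and keywords
  | (?P<digits>[0-9]+)
  | (?P<symbol>\.\.\.|\.\.|==|~=|<=|>=|::|[-+*/%^\#<>=(){}\[\];:,.])
""", re.VERBOSE)

def tokens(source):
    """Split source into tokens by longest match, discarding whitespace."""
    out = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        if m.lastgroup != "space":
            out.append(m.group())
        pos = m.end()
    return out

print(tokens("local x=1"))      # ['local', 'x', '=', '1']
print(tokens("localx=1"))       # ['localx', '=', '1']
print(tokens("local\tx\n=\n1")) # ['local', 'x', '=', '1']
```

Under this reading, the space between `local` and `x` is required because both are word tokens, while the spaces around `=` are optional.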