On May 29, 2013, at 9:32 AM, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
>> * There is no definition of the formal syntax of a String in the EBNF,
>> though there is in the prose of section 3.1. I suppose that this regex
>> should match, though I've not tested extensively. Any refinements are
>> welcome:
>> /(?:"(?:[^\\"]|\\[abfnrtvz\\"']|\\\n|\\\d{1,3}|\\x[\da-fA-F]{2})*")|(?:'(?:[^\\']|\\[abfnrtvz\\"']|\\\n|\\\d{1,3}|\\x[\da-fA-F]{2})*')|(?:--\[(=*)\[\.+?\]\1\])/m
> 
> I am afraid I cannot read this very well. But I think the '[^\\"]'
> in the beginning should be at least '[^\\"\n\r]' (or do classes, by
> default, exclude these characters?).

Ah, right you are, thank you.

> Also, the last part should not include '--' in the beginning.

Oops! Copy/paste from the long comment. Thanks.
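
For anyone following along, here is a minimal sketch of how the string pattern might look with both corrections applied: line breaks are excluded from the short-string character classes, and the long-bracket alternative no longer starts with '--'. It is written for Python's `re` module, where a negated class does match line breaks unless they are excluded explicitly. All names (`ESCAPE`, `LUA_STRING`, and so on) are illustrative and not taken from any existing scanner. Two further changes are my own assumptions rather than anything stated in this thread: the long-bracket body uses `.*?` instead of `\.+?` (which would only match literal dots and would reject an empty `[[]]`), and `\z` is allowed to swallow the whitespace that follows it, as Lua 5.2 does.

```python
import re

# Escapes inside a short string, following the list in the original pattern:
# single-character escapes, \z (which also skips following whitespace),
# backslash + line break, up to three decimal digits, \x + two hex digits.
ESCAPE = r"""\\(?:[abfnrtv\\"']|z\s*|\r?\n|\d{1,3}|x[\da-fA-F]{2})"""

# Short strings: a raw backslash, the closing quote, or an unescaped line
# break may not appear in the body.
DOUBLE_QUOTED = r'"(?:[^\\"\r\n]|' + ESCAPE + r')*"'
SINGLE_QUOTED = r"'(?:[^\\'\r\n]|" + ESCAPE + r")*'"

# Long strings: [[...]], [=[...]=], ... with matching '=' counts and no
# leading "--" (that prefix belongs to long comments only).
LONG_BRACKET = r"\[(?P<eq>=*)\[.*?\](?P=eq)\]"

LUA_STRING = re.compile(
    "(?:%s)|(?:%s)|(?:%s)" % (DOUBLE_QUOTED, SINGLE_QUOTED, LONG_BRACKET),
    re.DOTALL | re.ASCII,  # DOTALL lets long strings span several lines
)

if __name__ == "__main__":
    for sample in ['"a\\"b"', "'it\\'s'", "[==[a]]b]==]", '"broken\nstring"']:
        print(repr(sample), bool(LUA_STRING.fullmatch(sample)))
```

This is still only a highlighting aid. A regex like this cannot report unterminated strings the way the real lexer does.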

>> * There is no definition of the formal syntax of a Number. Based on experimentation, it looks like this might be a valid regex for matching a Lua number:
>> /-?\d*\.?\d+(e[-+]?\d+)?/i
>> Anyone see anything wrong with that?
> 
> - A '-' is not part of a number (for the lexer). Otherwise, x-3 would
> be read as 'x' followed by '-3'.
> 
> - The definition of a numeral in Lua follows C (which, in retrospect,
> may not have been a very good idea). So, things like "3." are correct,
> too.

Interesting and helpful.

> - Lua accepts hexadecimal numerals. (In 5.2, that includes hexadecimal
> floating-point numerals too.)

Ah, yes, I had those covered in a separate section.
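
Putting the three points above together, a numeral pattern might look roughly like the sketch below, again in Python's `re` syntax. The sign is left out so that `x-3` lexes as three tokens, a trailing dot as in `3.` is accepted, and hexadecimal numerals may carry a fraction and a binary `p` exponent as in Lua 5.2. The names `HEX`, `DEC`, `LUA_NUMBER`, `TOKEN` and `tokens` are hypothetical, and the tiny tokenizer exists only to demonstrate the `x-3` case; it is not how any real Lua lexer is structured.

```python
import re

# Hexadecimal numerals: 0x/0X, at least one hex digit somewhere, an optional
# fraction, and an optional binary exponent introduced by p/P.
HEX = r"0[xX](?:[\da-fA-F]+\.?[\da-fA-F]*|\.[\da-fA-F]+)(?:[pP][-+]?\d+)?"

# Decimal numerals, C style: "3", "3.", "3.14", ".5", each with an optional
# e/E exponent. No leading '-': the sign is a separate operator token.
DEC = r"(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

# HEX goes first so that "0x10" is not cut short after the leading "0".
LUA_NUMBER = re.compile("(?:%s)|(?:%s)" % (HEX, DEC), re.ASCII)

# Toy tokenizer (illustration only): numbers, names, single-char operators.
TOKEN = re.compile(
    r"(?P<number>%s)|(?P<name>[A-Za-z_]\w*)|(?P<op>\S)" % LUA_NUMBER.pattern,
    re.ASCII,
)

def tokens(src):
    """Return (kind, text) pairs; whitespace between tokens is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(src)]

if __name__ == "__main__":
    print(tokens("x-3"))
    for sample in ["3.", ".5", "3.14e-2", "0xff", "0xA23p-4", "0x"]:
        print(sample, bool(LUA_NUMBER.fullmatch(sample)))
```

The sketch only recognizes well-formed numerals. It makes no attempt to reproduce how the real lexer reports malformed ones.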


FWIW, I ultimately abandoned my recursive descent parser: the grammar contains both direct and indirect left recursion, and removing it would have meant rewriting much of the grammar. (The more I rewrite the grammar, the less chance there is of it being correct.) I ended up with one hella big regex for simple-but-effective syntax highlighting :)

https://github.com/Phrogz/coderay/blob/master/lib/coderay/scanners/lua.rb#L31