For a long time, I didn't use regular expressions (REs), except perhaps in simple cases with Unix tools. Then I (re-)discovered them in Lua, and some simple examples showed me that they are useful and not so hard to use. Then SciTE, my favorite editor, added them, and I found myself using them more and more. I started using them in programming, with PHP, JavaScript, Java, and recently AutoHotkey (which added PCRE support). I even looked at various engines: SciTE's (simple and easy to follow), Henry Spencer's (more complicated, less readable), GNU's (Java, bloated!), PCRE's (very complex), etc. I also hacked SciTE's engine a bit to add support for the \d \s \w \xHH notations (to be submitted).

I know that Lua's RE engine was hand-written by Roberto (IIRC) and intentionally kept simple and small. That's why it has no alternation (foo|bar), which is annoying but something we can live without, and no advanced bells and whistles like lookaround assertions.

I recently wrote a program in AutoHotkey to parse a script in that language and extract a list of function definitions. It is a simple automaton, using simple REs to match the expected syntax on each line. So I thought it would be easy to rewrite it in Lua, so that I can use it in SciTE. Alas, it did not work, as I forgot an important limitation: repetition symbols apply only to single character classes, not to sub-patterns! So, to take a classic stupid example, I can't write (ab)* to match ababab...
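To make the limitation concrete, here is a minimal sketch in plain Lua. The first two lines show what the stock pattern matcher does with (ab)*: the quantifier after the closing parenthesis is taken as a literal "*" character. The helper `matches_repeated` is a hypothetical name, not part of the string library; it emulates ^(unit)*$ with a loop, assuming that each repetition can be matched greedily without backtracking into the previous one.

```lua
-- "(ab)*" is read as: capture "ab", then a literal "*" character.
local literal  = ("ab*"):match("^(ab)*$")      -- the capture "ab"
local repeated = ("ababab"):match("^(ab)*$")   -- nil, no repetition happens

-- Hypothetical helper: true if s is made of zero or more consecutive
-- matches of the Lua pattern `unit`, i.e. an emulation of ^(unit)*$.
local function matches_repeated(s, unit)
  local pos = 1
  while pos <= #s do
    -- "^" anchors the match at the start position given by pos.
    local _, last = s:find("^" .. unit, pos)
    -- Fail on no match, and on an empty match (which would loop forever).
    if not last or last < pos then return false end
    pos = last + 1
  end
  return true
end
```

For example, `matches_repeated("ababab", "ab")` succeeds while `matches_repeated("ababa", "ab")` does not. This is only an emulation for simple cases: a unit pattern that needs backtracking across repetitions would require more work.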

And, more practically, I can't write something like: %s*(%s;.*)?$
Nor: (%)%s*(%{)?)?
The (%{)? can be rewritten as ({?) of course, but I still can't write the complete expression.

I can, and will, work around this by searching after the last match and so on. Still, it is frustrating, and it adds complexity to the script.
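A workaround of this kind could be sketched as follows. This is only an illustration under assumptions: the line grammar (Name(params), an optional "{", an optional trailing "; comment") is a simplified, hypothetical stand-in for the real AutoHotkey syntax, and `parse_definition` is an invented name. The idea is to match the mandatory part first, use an empty capture () to get the position where it ended, and then resume matching from that position for each optional group.

```lua
-- Hypothetical, simplified grammar for one definition line:
--   Name(params) [{] [ ; comment]
local function parse_definition(line)
  -- Stage 1: the mandatory part. The empty capture () returns the
  -- position just after the first closing parenthesis.
  local name, pos = line:match("^%s*([%a_][%w_]*)%(.-%)()")
  if not name then return nil end

  -- Stage 2: optional opening brace, emulating (%s*{)? by resuming at pos.
  local has_brace = false
  local after_brace = line:match("^%s*{()", pos)
  if after_brace then
    has_brace = true
    pos = after_brace
  end

  -- Stage 3: optional trailing comment, emulating %s*(%s;.*)?$ with
  -- two separate tests on the remainder of the line.
  local rest = line:sub(pos)
  if rest:match("^%s*$") or rest:match("^%s+;") then
    return name, has_brace
  end
  return nil
end
```

Called on "MyFunc(a, b) { ; start" this returns the name and true; on a line with trailing text that is not a comment, it returns nil. It works, but it takes three stages where a single expression would do with group quantifiers.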

So, the point of my message is: is there a compelling reason for such a limitation? I can understand reasons like lack of time to make a better engine, or that such a feature would have grown the engine too much.

And the second point is: would the Lua team accept a patch in this area for the future v5.2? If the answer is 'no', I won't even try. Instead of hacking, I would write a wrapper for PCRE, for example, but that wouldn't be usable in SciTE. If the answer is 'maybe', I can take a look, no promises, but I believe I have some time before 5.2 is out anyway... I can accept that my patch is rejected if it is badly written, buggy or too big, but I won't spend time on it if it would be rejected on principle.

Well, if patches are accepted, I might as well attempt to implement repetition ranges, like {m,n}. Alternation is out of the question, as it would require completely rewriting the engine...


BTW, I found myself wishing for a continue keyword to use in the parsing loop, to avoid excessive 'if' nesting and indenting. I recall having read debates on its usefulness many times, but I still can't recall an official reason why it isn't in the language. I am too lazy (read 'no time for that') to search the mailing list archive.

Perhaps somebody will be courageous / bored enough to dig out the reasons / arguments and put them in the FAQ on the Wiki. That would be useful too for the global-by-default / local-by-default discussions, and perhaps for the ++ -- += -= and bitwise operators... :-)

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --