- Subject: Argh! Regular expressions in Lua
- From: Philippe Lhoste <PhiLho@...>
- Date: Wed, 22 Nov 2006 13:33:53 +0100
For a long time, I didn't use regular expressions (REs), except perhaps
in simple cases with Unix tools.
Then I (re-)discovered them in Lua, and some simple examples showed me
they are useful and not so hard to use.
Then SciTE, my favorite editor, added them and I found myself using them 
more and more.
And I started using them in programming, with PHP, JavaScript, Java, and 
recently AutoHotkey (which added PCRE support).
I even looked at various engines: SciTE's (simple and easy to follow),
Henry Spencer's (more complicated, less readable), GNU's Java one
(bloated!), PCRE's (very complex), etc. I also hacked SciTE's engine a
bit to add support for the \d, \s, \w and \xHH notations (to be
submitted).
I know that Lua's RE engine was hand-made by Roberto (IIRC) and
intentionally kept simple and small. That's why it has no alternation
(foo|bar), which is annoying but we can live without it, nor advanced
bells and whistles like lookaround assertions.
I recently wrote a program in AutoHotkey to parse a script in that
language and extract a list of its function definitions.
It is a simple automaton, using simple REs to match the expected syntax
on each line. So I thought it would be easy to rewrite it in Lua, so I
could use it in SciTE.
Alas, it did not work, as I had forgotten an important limitation:
repetition symbols apply only to single character classes, not to
sub-patterns!
To take a classic, silly example, I can't write (ab)* to match
ababab...
And, more practically, I can't write something like: %s*(%s;.*)?$
Nor: (%)%s*(%{)?)?
The (%{)? can be rewritten as ({?) of course, but I still can't write
the complete expression.
I can, and will, work around this by searching after the last match and
so on. Still, it is frustrating and adds complexity to the script.
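A minimal sketch of that kind of workaround in plain Lua 5.1. The helper names `repeat_prefix` and `trailing_comment` are hypothetical. The first helper loops with an anchored `string.find` to emulate (ab)*. The second emulates the optional group in %s*(%s;.*)?$ by trying the "no comment" and "comment" branches one after the other.

```lua
-- Emulates the regex (unit)* anchored at position `init`:
-- Lua patterns cannot apply * to a sub-pattern, so loop instead.
-- Returns the number of repetitions and the position just after them.
local function repeat_prefix(s, unit, init)
  local pos = init or 1
  local count = 0
  while true do
    -- A pattern starting with "^" anchors the match at `pos`.
    local _, e = string.find(s, "^" .. unit, pos)
    if not e or e < pos then break end  -- no match, or an empty match
    count = count + 1
    pos = e + 1
  end
  return count, pos
end

-- Emulates %s*(%s;.*)?$ starting at `pos`: Lua has no optional
-- sub-pattern, so try each branch of the optional group in turn.
-- Returns true plus the comment text (or nil), or false if neither fits.
local function trailing_comment(s, pos)
  if string.find(s, "^%s*$", pos) then
    return true, nil
  end
  local comment = string.match(s, "^%s*%s(;.*)$", pos)
  if comment then
    return true, comment
  end
  return false
end

print(repeat_prefix("ababab!", "ab"))       -- prints 3 and 7
print(trailing_comment("Foo() ; note", 6))  -- prints true and "; note"
```

Each optional sub-pattern doubles the number of branches to try, which is exactly the added complexity complained about above.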
So, the point of my message is: is there a compelling reason for such a
limitation?
I can understand reasons like lack of time to make a better engine, or
that such a feature would have made the engine grow too much.
And the second point is: would the Lua team accept a patch in this
area for the future v5.2?
If that's 'no', I won't even try. Instead of hacking, I would write a
wrapper for PCRE, for example. But that wouldn't be usable in SciTE.
If that's 'maybe', I can take a look, no promises, but I believe I
have some time before 5.2 is out anyway...
I can accept my patch being rejected if it is badly written, buggy or
too big, but I won't spend time on it if it would be rejected on
principle.
Well, if patches are accepted, I might also attempt to implement
repetition ranges, like {m,n}.
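Until then, a range on a single character class can already be emulated by expanding it with `string.rep`. Below is a minimal sketch, where the helper `range` is hypothetical. It does nothing for ranges on sub-patterns, which remain the real limitation.

```lua
-- Expands class{m,n} into an equivalent Lua pattern: m mandatory
-- copies followed by (n - m) optional ones. Only valid when `class`
-- is a single character class (e.g. "%d", "[%w_]"), because "?"
-- applies to one item only.
local function range(class, m, n)
  assert(m >= 0 and n >= m, "invalid repetition range")
  return string.rep(class, m) .. string.rep(class .. "?", n - m)
end

local p = range("%d", 2, 4)
print(p)                           -- %d%d%d?%d?
print(string.match("x12345y", p))  -- 1234
```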
Alternation is out of the question, as it would require rewriting the
engine completely...
BTW, I found myself wishing for a continue keyword to use in the
parsing loop, to avoid excessive 'if' nesting and indentation.
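Meanwhile, a common Lua 5.1 idiom gives the same effect. You wrap the loop body in a one-shot `repeat ... until true` block, so `break` leaves only that block and the outer loop moves on to the next line. This is a minimal sketch. `collect_function_names` and its naive "identifier followed by (" test are hypothetical and are not the actual parser.

```lua
-- Emulates `continue` in Lua 5.1: `break` inside the one-shot
-- `repeat ... until true` block skips to the next outer iteration.
local function collect_function_names(lines)
  local names = {}
  for _, line in ipairs(lines) do
    repeat
      if string.find(line, "^%s*;") then break end  -- comment line: skip
      -- Hypothetical, naive test: an identifier followed by "(".
      local name = string.match(line, "^%s*([%w_]+)%(")
      if not name then break end                     -- no match: skip
      names[#names + 1] = name
    until true
  end
  return names
end

local found = collect_function_names({ "; helpers", "Foo(a, b) {", "x := 1", "Bar(c)" })
print(table.concat(found, ", "))  -- Foo, Bar
```

The drawback is that a real `break` out of the outer loop is no longer possible from inside the wrapped body.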
I recall reading debates on its usefulness many times, yet I can't
recall an official reason why it isn't in the language. I am too lazy
(read 'no time for that') to search the mailing list archive.
Perhaps somebody will be courageous / bored enough to dig out the
reasons / arguments and put them in the FAQ on the Wiki.
It would be useful for the global-by-default / local-by-default
discussions too, and perhaps for the ++ -- += -= bitwise-operators... :-)
--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --