On 26/07/15 09:13 PM, Jonathan Goble wrote:
On Sun, Jul 26, 2015 at 7:24 PM, Soni L. wrote:
Lua patterns are too complex.


^ was repeated twice, with 2 different meanings
$ was repeated twice, again with 2 different meanings
[] was repeated twice, although with the same meaning
All of this would be precisely the same in a normal regular
expression. Keeping them the same benefits programmers coming from
other languages who are used to full regex instead of Lua patterns.

( was repeated twice, with 2 different meanings
While the first meaning is unique to Lua, many regex flavors provide
means to access the start and end positions of a text capture, so the
empty capture producing a string position both fills that hole and is
arguably more powerful, since arbitrary positions can be easily
captured without having to capture possibly unneeded text also.
The "first meaning" is actually just a literal (, because ( and ) inside sets are treated as literals (and are thus special-cased).

Unless you have an alternative idea for how to capture a string
position, the use of an empty but otherwise ordinary capture works
fine for me.
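For anyone following along, here is a small sketch of the two uses of parentheses discussed above. The sample strings and variable names are illustrative assumptions and do not come from the thread.

```lua
-- An empty capture "()" yields the current string position as a number.
s = "hello world"
first, after = string.match(s, "()wor()")
print(first, after)   --> 7   10

-- Inside a set, "(" and ")" are ordinary characters, not capture delimiters.
paren = string.match("f(x)", "[()]")
print(paren)          --> (
```

The two positions bracket the matched text without copying it, which is the "capture a position without capturing text" case mentioned above.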

- was repeated twice, with 2 different meanings
In the absence of PCRE's ? to make quantifiers lazy, the added second
meaning here is needed. Consider this for example: [1]
We don't need a zero-or-more. We could use : instead of - for ranges and it'd still make sense.

Instead of having a separate zero-or-more, we could let ? modify + and - to mean "0 or 1 of that": e.g. +? would first try to match with a +, and if that fails it would match nothing, which together gives zero-or-more.

     test = "int x; /* x */  int y; /* y */"
     -- extra parentheses drop gsub's second result (the substitution count)
     print((string.gsub(test, "/%*.*%*/", "<COMMENT>")))
       --> int x; <COMMENT>


     test = "int x; /* x */  int y; /* y */"
     print((string.gsub(test, "/%*.-%*/", "<COMMENT>")))
       --> int x; <COMMENT>  int y; <COMMENT>

So a lazy quantifier is needed in some form. And IMHO it's pretty
clear what is meant each time. I've never looked at a Lua pattern and
been confused about which meaning to ascribe to a - symbol.
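As a rough illustration of the two meanings of - sitting side by side, here is a minimal sketch. The inputs and variable names are made up for this example.

```lua
-- Inside a set, "-" denotes a range; after a class, it is the lazy quantifier.
letters, digits = string.match("abc123", "^([a-c]-)(%d+)$")
print(letters, digits)   --> abc   123

-- A literal hyphen is written "%-".
y, m, d = string.match("2015-07-26", "^(%d+)%-(%d+)%-(%d+)$")
print(y, m, d)           --> 2015  07  26
```

In the first pattern the lazy [a-c]- only grows as far as it must for %d+ to match, so the split falls exactly between the letters and the digits.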

And both this use of - and the use of () are clearly and concisely
explained in the reference manual, so it's not like it's hard for a
newbie to find out what these mean when they encounter them.

+ - * and ? can be simplified to + - and ?
Without *, how would you write a pattern to match a character class
zero or more times, but as many times as possible? Consider: [2]

     function trim (s)
       return (string.gsub(s, "^%s*(.-)%s*$", "%1"))
     end

%s* must match ALL leading whitespace, so that no whitespace is
captured by (.-).
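To make that concrete, here is a sketch comparing the greedy leading %s* with a lazy %s- in the same position. The name trim_lazy is a hypothetical variant introduced only for this comparison.

```lua
-- Greedy leading %s*: consumes all leading whitespace before the capture.
function trim(s)
  return (string.gsub(s, "^%s*(.-)%s*$", "%1"))
end

-- Hypothetical variant with a lazy leading %s-: it matches as little as
-- possible, so the leading whitespace ends up inside the (.-) capture.
function trim_lazy(s)
  return (string.gsub(s, "^%s-(.-)%s*$", "%1"))
end

print("[" .. trim("  hi  ") .. "]")       --> [hi]
print("[" .. trim_lazy("  hi  ") .. "]")  --> [  hi]
```

Both patterns match the whole string, but only the greedy form keeps the leading whitespace out of the capture.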
That would be %s+? (where ? is the good old ? you're used to, applied to the whole of %s+, and + is the good old + you're used to).

We also don't need ? if we allow empty alternations: the regex (t|) would be equivalent to t?, and (|t) would be a non-greedy t?


Look at ClEx[1] and you'll see most of the things you're used to can be coded in a much simpler way, removing parser complexity.


Disclaimer: these emails are public and can be accessed from <TODO: get a non-DHCP IP and put it here>. If you do not agree with this, DO NOT REPLY.