On Sun, Jul 26, 2015 at 7:24 PM, Soni L. <fakedme@gmail.com> wrote:
> Lua patterns are too complex.
>
> ^[^()]?([a-z]-$.*)$
>
> ^ was repeated twice, with 2 different meanings
> $ was repeated twice, again with 2 different meanings
> [] was repeated twice, although with the same meaning

All of this would be precisely the same in a normal regular
expression. Keeping them the same benefits programmers coming from
other languages who are used to full regex instead of Lua patterns.


> ( was repeated twice, with 2 different meanings

While the first meaning is unique to Lua, many regex flavors provide
a way to get the start and end positions of a captured group. The
empty capture, which produces a string position, fills that hole and
is arguably more powerful: arbitrary positions can be captured easily,
without also capturing text you may not need.

Unless you have an alternative idea for how to capture a string
position, the use of an empty but otherwise ordinary capture works
fine for me.
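
To make the position capture concrete, here is a minimal sketch. The
sample string and variable names are made up for illustration; only
string.match and string.sub from the standard library are used.

```lua
-- Hypothetical sample string, chosen only for illustration.
local s = "hello world"

-- An empty capture () yields the current string position as a number,
-- not a substring. Here the two empty captures bracket "world" without
-- capturing its text: first is 7, after is 12 (one past the match).
local first, after = string.match(s, "()world()")
print(first, after)

-- The positions can be fed to string.sub if the text is wanted after all.
print(string.sub(s, first, after - 1))
```

The second position is one past the end of the match, which is why the
sketch subtracts 1 before calling string.sub.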


> - was repeated twice, with 2 different meanings

In the absence of PCRE's ? to make quantifiers lazy, the added second
meaning here is needed. Consider this for example: [1]

    test = "int x; /* x */  int y; /* y */"
    -- extra parentheses drop gsub's second result (the match count)
    print((string.gsub(test, "/%*.*%*/", "<COMMENT>")))
      --> int x; <COMMENT>

versus:

    test = "int x; /* x */  int y; /* y */"
    -- extra parentheses drop gsub's second result (the match count)
    print((string.gsub(test, "/%*.-%*/", "<COMMENT>")))
      --> int x; <COMMENT>  int y; <COMMENT>

So a lazy quantifier is needed in some form. And IMHO it's pretty
clear what is meant each time. I've never looked at a Lua pattern and
been confused about which meaning to ascribe to a - symbol.

And both this use of - and the use of () are clearly and concisely
explained in the reference manual, so it's not like it's hard for a
newbie to find out what these mean when they encounter them.
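
As a small illustration of the two roles of - discussed above, the
following sketch uses both in one pattern. The input string is a
made-up example; the escaped %- for a literal hyphen is standard Lua
pattern syntax.

```lua
-- Hypothetical input, for illustration only.
local s = "abc-def"

-- Inside a set, - denotes a range: [a-z] matches one lowercase letter.
-- After the set, - is the lazy quantifier: as few repetitions as possible.
-- The trailing %- (a literal hyphen) forces the lazy part to grow just
-- far enough to reach it, so word is "abc".
local word = string.match(s, "^([a-z]-)%-")
print(word)

-- With nothing after it, the lazy quantifier is satisfied by zero
-- repetitions, so the capture is the empty string.
local empty = string.match(s, "^([a-z]-)")
print(#empty)
```

Position tells the two apart: between two characters inside [] it is a
range, and directly after a single character class it is a quantifier.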


> + - * and ? can be simplified to + - and ?

Without *, how would you write a pattern to match a character class
zero or more times, but as many times as possible? Consider: [2]

    function trim (s)
      return (string.gsub(s, "^%s*(.-)%s*$", "%1"))
    end

%s* must match ALL leading whitespace, so that no whitespace is
captured by (.-).
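
To see the difference in practice, here is a minimal sketch comparing
trim with a variant that uses the lazy %s- for the leading whitespace.
The name trim_lazy and the test string are hypothetical, introduced
only for this comparison.

```lua
-- trim as given above, using greedy %s* for the leading whitespace.
local function trim(s)
  return (string.gsub(s, "^%s*(.-)%s*$", "%1"))
end

-- Hypothetical variant with a lazy %s- in front, for comparison only.
local function trim_lazy(s)
  return (string.gsub(s, "^%s-(.-)%s*$", "%1"))
end

-- Greedy %s* consumes both leading spaces before the capture starts.
print("[" .. trim("  hi  ") .. "]")       --> [hi]

-- Lazy %s- matches zero spaces, so (.-) ends up capturing them.
print("[" .. trim_lazy("  hi  ") .. "]")  --> [  hi]
```

The lazy variant still matches the whole string, but the leading
whitespace lands inside the capture instead of being discarded.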

[1] http://www.lua.org/pil/20.2.html
[2] http://www.lua.org/pil/20.3.html