

On Thu, Jul 16, 2015 at 3:04 AM, John Hind <john.hind@zen.co.uk> wrote:
> I suggest adding '%^' and '%$' as character classes.
>
> So, for example, a parameter list can be patterned with:
>
> "%s*(%w+)%s*[,%$]"
>
> I think this is both conceptually and in implementation simpler and cleaner
> than the current definition, not an "advanced feature" at all.

What is the purpose of the initial '%s*'?

Or, to be precise, on what subject will John's suggested pattern
behave differently from:

"(%w+)%s*[,%$]"
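
Here is a quick empirical check of that question. It is not a proof. The proposed '%$' class does not exist in Lua, so this sketch uses the existing frontier '%f[,\0]' as a stand-in for "comma or end of subject". That substitution is an assumption, and it needs Lua 5.2 or later, where patterns may contain '\0'. The helper name 'captures' is hypothetical. On these sample subjects the leading '%s*' changes nothing, because gmatch is unanchored and '%w+' can never start on whitespace.

```lua
-- Hypothetical helper: collect the captures that string.gmatch yields,
-- joined with single spaces.
local function captures(subject, pattern)
  local out = {}
  for w in subject:gmatch(pattern) do
    out[#out + 1] = w
  end
  return table.concat(out, " ")
end

-- "%f[,\0]" stands in for the proposed "[,%$]" (comma or end of subject).
local with_ws    = "%s*(%w+)%s*%f[,\0]"
local without_ws = "(%w+)%s*%f[,\0]"

for _, s in ipairs({ "a, b, c d", "  x ,y", "one", "" }) do
  -- Prints true for each of these sample subjects.
  print(s, captures(s, with_ws) == captures(s, without_ws))
end
```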

As Roberto mentioned, why not use:

for s in ('a, b, c d'):gmatch '(%w+)%s*%f[,\0]' do print(s) end

The above will print a, b and d (but not c).  Exactly the same result
as John's suggested pattern.  (Although perhaps not the result John
wants.)
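
Another commonly used workaround is to append a sentinel comma to the subject, so that "comma or end of subject" reduces to a plain ','. This is an alternative technique and was not proposed in this thread. The function name 'items' is hypothetical. The pattern contains neither '\0' nor a frontier, so it should also run on Lua 5.1.

```lua
-- Hypothetical helper: split a comma-separated list of %w+ words.
-- Appending a sentinel "," means every item, including the last one,
-- is followed by a comma, so no "or end of subject" branch is needed.
local function items(subject)
  local out = {}
  for w in (subject .. ","):gmatch("(%w+)%s*,") do
    out[#out + 1] = w
  end
  return out
end

print(table.concat(items("a, b, c d"), " "))  --> a b d
```

Like the frontier version, this keeps a, b and d and silently skips c.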

In general, it seems to me that John is proposing that Lua's pattern
system be extended to incorporate a form of branching.  In PCRE,
John's suggestion could be written as '(\w+)\s*(,|$)'.  This uses
PCRE's branching feature (also called "alternation").

I seem to remember reading somewhere that by design Lua's (minimalist)
patterns intentionally do not support branching.  So it feels to me
that what John is suggesting goes against the intentions behind the
design of Lua's pattern matching.

Additionally, as I see it, the only difference (when used with gmatch)
between "%s*(%w+)%s*[,%$]" and the very simple and clear '%w+' is that
the former will fail to match some parts of (invalid?) input strings
that the latter will (incorrectly) match, such as the 'c' in
'a, b, c d'.  But as these (desirable?) failures are silent, they
cannot be used to robustly validate untrusted input.
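
To show how silent these failures are, this small sketch counts the matches each pattern produces on the same subject. It needs Lua 5.2 or later for the '\0' in the frontier, and 'count_matches' is a hypothetical helper. string.gmatch skips any text that does not fit the pattern. The only symptom is a lower count, and no error is raised.

```lua
-- Hypothetical helper: count how many matches string.gmatch produces.
-- gmatch reports nothing about the parts of the subject it skipped.
local function count_matches(s, pattern)
  local n = 0
  for _ in s:gmatch(pattern) do
    n = n + 1
  end
  return n
end

local subject = "a, b, c d"   -- arguably invalid: "c d" has no comma
print(count_matches(subject, "%w+"))               --> 4 (a, b, c, d)
print(count_matches(subject, "(%w+)%s*%f[,\0]"))   --> 3 ("c" silently dropped)
```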

I would argue that validating while parsing is indeed an "advanced
feature", and I would use LPeg to do it.
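
The post recommends LPeg for this. For comparison, here is a minimal plain-Lua sketch of what "validating while parsing" involves even for this tiny grammar. The function name 'parse_list' is an assumption, as is the grammar it accepts: comma-separated '%w+' words with optional surrounding whitespace and no trailing comma. The sketch relies on string.find anchoring a '^' pattern at its 'init' position. It also makes the comma-or-end branch explicit, so bad input produces an error message instead of being skipped.

```lua
-- Hypothetical helper: parse a comma-separated list of %w+ words, allowing
-- optional whitespace around each word. Returns a table of words, or nil
-- plus a message giving the position where the input stopped fitting.
local function parse_list(s)
  local words, pos = {}, 1
  while true do
    -- A "^" pattern combined with an init argument anchors the match at pos.
    local _, e, word = s:find("^%s*(%w+)%s*", pos)
    if not word then
      return nil, "expected a word at position " .. pos
    end
    words[#words + 1] = word
    pos = e + 1
    if s:sub(pos, pos) == "," then
      -- Branch 1: a comma, so another item must follow.
      pos = pos + 1
    elseif pos > #s then
      -- Branch 2: end of subject, so the list is complete.
      return words
    else
      -- Neither branch applies: report it rather than skipping silently.
      return nil, "expected ',' or end of input at position " .. pos
    end
  end
end
```

With the subject used above, parse_list("a, b, c d") returns nil and "expected ',' or end of input at position 9". parse_list("a, b, c") returns { "a", "b", "c" }, and a trailing comma as in "a," is rejected.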

-Parke