lua-l archive


Hi:

On Tue, Mar 4, 2014 at 11:58 PM, Pierre-Yves Gérardy <pygy79@gmail.com> wrote:
> On Tue, Mar 4, 2014 at 6:33 PM, Francisco Olarte <folarte@peoplecall.com> wrote:
>> Aside from that, what really has left me shocked is the fact that
>> s:find(p,5) gives 5,7.
> It is useful to parse a string piecewise. You can do it with an
> unanchored pattern, by checking that the start of the match is equal
> to the index passed to `.find`. It is wasteful, though, because `find`
> may walk the whole subject if it doesn't match. It is even more
> wasteful if you want to use `match`, since you must first use `find`
> to be sure that the match occurred at the desired place.
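A minimal sketch of the piecewise parsing described in the quote, assuming Lua 5.1 or later. Each token pattern starts with '^', so `s:find(p, pos)` either matches exactly at `pos` or fails immediately instead of scanning the rest of the string. The token set and the `tokenize` name are illustrative assumptions, not something from the thread.

```lua
-- Hypothetical token patterns; each is anchored with a leading '^',
-- which in Lua anchors the match at the init position passed to find.
local token_patterns = {
  { "number", "^%d+" },
  { "name",   "^[%a_][%w_]*" },
  { "space",  "^%s+" },
  { "op",     "^[%+%-%*/=]" },
}

-- Split s into tokens by trying each anchored pattern at the current position.
local function tokenize(s)
  local tokens, pos = {}, 1
  while pos <= #s do
    local matched = false
    for _, tp in ipairs(token_patterns) do
      local i, j = s:find(tp[2], pos)
      if i then
        if tp[1] ~= "space" then  -- drop whitespace tokens
          tokens[#tokens + 1] = { tp[1], s:sub(i, j) }
        end
        pos = j + 1               -- continue right after this token
        matched = true
        break
      end
    end
    if not matched then
      error("unexpected character at position " .. pos)
    end
  end
  return tokens
end

for _, t in ipairs(tokenize("x1 = 42 + y")) do
  print(t[1], t[2])
end
```

With an unanchored pattern the same loop would have to compare the returned start index against `pos`, and a failing `find` could scan to the end of the string first.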

Yes, it may be useful, and I'll assume the behaviour was chosen because
of its usefulness. But the other behaviour is useful too, once you start
thinking of functions which receive their pattern as a parameter and do
not want to preparse it (like my example: write a function which counts
how many non-overlapping times a pattern matches in a line, a single
line to avoid questions about whether internal newlines count). I do
not know how to make this work with Lua patterns without preparsing the
pattern, but doing the opposite is trivial: just match on a sub() of
the string starting after the last match.
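A sketch of both points, under the assumption of Lua 5.1 or later. `count_matches` is a hypothetical version of the counting function from the example: because '^' anchors at `init`, an anchored pattern matches again after every advance. `find_at` (also hypothetical) shows the "match on sub" trick: searching the suffix and shifting the indices back reproduces anchoring at `init`, whichever way '^' is read.

```lua
-- Hypothetical helper: count non-overlapping matches of pattern p in a
-- single line s by repeatedly calling s:find(p, init).
local function count_matches(s, p)
  local n, init = 0, 1
  while init <= #s + 1 do
    local i, j = s:find(p, init)
    if not i then break end
    n = n + 1
    -- Advance past the match; step one char past an empty match (j < i)
    -- so the loop cannot get stuck.
    if j >= i then init = j + 1 else init = i + 1 end
  end
  return n
end

print(count_matches("a-b-c", "%-"))  --> 2
print(count_matches("xxx", "^x"))    --> 3 (a regex reader would expect 1)

-- Hypothetical helper: anchor at init by searching the suffix starting at
-- init and shifting the indices back. Assumes a positive init; captures
-- are not returned.
local function find_at(s, p, init)
  local i, j = s:sub(init):find(p)
  if i then return i + init - 1, j + init - 1 end
  return nil
end

print(find_at("hello", "^l", 3))     --> 3   3
print(("hello"):find("^l", 3))       --> 3   3
```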

> Likewise, LPeg patterns are anchored to the index passed to `lpeg.match(...)`.

Nice to know.

> PCRE has a flag (PCRE_ANCHORED) to trigger that behavior in patterns
> without the caret.

Yes, but my point was that I never imagined 'the subject string' would
mean 'the range between init and the end' without an explicit
definition. To me a call s:find('^',n) should always fail for n>1. I
have always read s:find('^',2) as 'find the start of the string
between the second char and the end'. OK, not found. It is like being
told 'find the start of a day tomorrow between 12:00 and 14:00'. OK,
not found.
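A small sketch of what Lua actually does with `s:find('^', 2)`, assuming Lua 5.1 or later, followed by a hypothetical `find_regex_style` helper giving the reading expected above. The helper relies on the fact that, in Lua patterns, '^' is an anchor only as the first character of the pattern, so the "preparsing" reduces to checking one character.

```lua
local s = "abc"

-- In Lua, a leading '^' anchors the match at init, not at position 1:
print(s:find("^", 2))    --> 2   1  (an empty match at position 2)
print(s:find("^b", 2))   --> 2   2
print(s:find("^a", 2))   --> nil

-- Hypothetical helper: '^' means "start of the whole string", as in
-- most regex engines. Assumes init is nil or positive.
local function find_regex_style(str, p, init)
  init = init or 1
  if p:sub(1, 1) == "^" and init > 1 then
    return nil  -- the start of str lies before init, so it cannot match
  end
  return str:find(p, init)
end

print(find_regex_style(s, "^", 2))   --> nil
print(find_regex_style(s, "^a"))     --> 1   1
```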

> Lua patterns are not regular expressions. They are close, in some
> respects, but you shouldn't expect them to work likewise.

This I know; it's easily noticeable in the lack of alternation. But I
think ^ (and $, ., *, +) have been taken from common regexes, which I
think is great, as many people know them, but this is, IMO, a design
that traps people. I carefully look at every % in a pattern, but I
assumed things copied from regexes were copied faithfully. Anyway,
whenever I need heavy-duty regexp text processing I normally use Perl,
which has them as a primitive type. But I thought I could port some
simple stuff to Lua; it seems that needs rethinking. Anyway, should I
need regexes, I suppose I could find a rock for PCRE or write one; it
doesn't seem particularly hard.

> That being said, I agree that the documentation could be improved.
> You may be interested in these features that AFAIK don't have an
> equivalent in regexen.

> - %bXY, the balanced pair pattern: "%b()" will match "( (a, b) ( (oo)() ) )"

These can be done in Perl, I think, with recursive patterns, but they
are much easier here. Normally whenever I have to parse one of these
things I need to keep track of the nesting levels, so I end up using a
regexp-based tokenizer and feeding the tokens to a state machine (the
last time was for config files similar to nested table dumps). The
point is that all the parsers I've written for balanced stuff have had
a way to escape the bracket, or had quoted strings in the middle, so I
always need an automaton (although these days I lean towards
executable chunks for all this stuff: it seems they are always config
files, and they work passably in Perl and great in Lua).
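A short sketch, assuming Lua 5.1 or later, of both sides of the argument: `%b()` handles the nested example from the quote, but it counts only the two delimiter characters, so a quoted or escaped bracket ends the match early. That limitation is why the balanced-pair pattern alone does not replace a tokenizer plus state machine.

```lua
-- %b() matches from a '(' to its balancing ')', counting nesting.
local s = "f( (a, b) ( (oo)() ) ) rest"
print(s:match("%b()"))   --> ( (a, b) ( (oo)() ) )

-- %bXY only counts the two delimiter characters: it knows nothing about
-- quotes or escapes, so a ')' inside a string literal closes the match.
local q = '(x ")" y)'
print(q:match("%b()"))   --> (x ")
```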


> - %f[charset], the undocumented frontier pattern, which is a bit too
> long to describe here. More here:

Yeah, this is nice. Perl has some similar transition patterns, but
I've used them, say, twice in 20 years, and both times in one-liners.
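For reference, a few frontier-pattern uses, assuming Lua 5.1 or later (the pattern works in 5.1 and is documented from the 5.2 reference manual on). `%f[set]` matches the empty position where the previous character is not in the set and the current one is; the beginning and end of the subject count as '\0'. The last line ties back to the `init` discussion: the frontier looks at the character before `init`, so it does not treat `init` as the start of a word.

```lua
-- Replace every word: %f[%a] matches the transition into a letter.
print(("THE (quick) fox"):gsub("%f[%a]%a+", "X"))  --> X (X) X   3

-- Find "the" only as a whole word, not inside "other".
print(("other the"):find("%f[%a]the%f[%A]"))        --> 7   9

-- The frontier sees the 'T' before init 2, so "HE" is not a word start.
print(("THE (quick) fox"):find("%f[%a]%a+", 2))     --> 6   10
print(("THE (quick) fox"):find("%a+", 2))           --> 2   3
```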

Francisco Olarte.