lua-l archive



On 06.03.2014 09:34, Francisco Olarte wrote:
> Hi:

Hi!



> On Wed, Mar 5, 2014 at 8:25 PM, Philipp Janda <siffiejoe@gmx.net> wrote:
>
>> It seems that you want to write a pattern matching function that looks
>> atomic on the outside but is implemented as a loop on the inside.
>
> I wanted to be able to write Lua pattern code without having to look
> at a manual page and carefully examine the pattern every time.

That's doable: Lua patterns are a lot less complex than, e.g., Perl's regexes. The problem, IME, is figuring out whether something can be done with a Lua pattern at all ...


> And, if you want to do Lua-style anchoring, you have a meta for this:
>
> folarte@paqueton:~$ perl -MData::Dumper -e 'print Dumper [ "12ABD12DEF" =~ /\G12./g ]'
> $VAR1 = [
>            '12A'
>          ];

I didn't know that one. So basically Lua's `^` anchor combines Perl's `^` and `\G` assertions into one.
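For readers who think in Lua rather than Perl, a minimal sketch of that equivalence (assuming Lua 5.1 or later; `match_all_anchored` is a hypothetical helper, not a standard function): passing an explicit `init` to `string.find` moves the `^` anchor to that position, and feeding the end of each match back in as `init` behaves roughly like the Perl `/\G12./g` example above.

```lua
local s = "12ABD12DEF"

-- Without init, '^' anchors at position 1, like Perl's '^'.
local first1, last1 = string.find(s, "^12.")     --> 1, 3
-- With init, '^' anchors at init instead, much like Perl's '\G'.
local first6, last6 = string.find(s, "^12.", 6)  --> 6, 8
-- No "12" starts at position 2, so the anchored search fails there.
local none = string.find(s, "^12.", 2)           --> nil

-- Hypothetical helper: collect back-to-back anchored matches starting at
-- position 1 and stop at the first failure, roughly like Perl's
-- list-context  $s =~ /\G<pat>/g .
local function match_all_anchored(subject, pat)
  local results, pos = {}, 1
  while pos <= #subject do
    local a, b = string.find(subject, "^" .. pat, pos)
    -- Stop on failure, and also on an empty match (b < a), which would
    -- otherwise never advance pos.
    if not a or b < a then break end
    results[#results + 1] = string.sub(subject, a, b)
    pos = b + 1
  end
  return results
end

local all = match_all_anchored(s, "12.")  --> { "12A" }, as in the Perl output above
```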


> It's difficult to beat Perl for a sequence of pattern matches,
> especially the kind coded into the program with lots of metacharacters; this is
> where it really shines. You mainly have problems if you think in a
> C/Java/Lua way and try to work via indexes.

>> If applying an anchored pattern to somewhere other than index 1 always failed,
>> doing so would be foolish in the first place, and it probably only happens
>> by accident.

> Given your previous "always fail" comments and these, I do not know
> if you've read what I wrote, or choose to deliberately ignore it, or
> what. I'll opt for thinking I do not explain myself well and do it
> again.
>
> ANY of the examples we have been putting forward, with constant strings, is
> foolish; we could just substitute the constant return value.

In my code, the pattern is most often a constant, and I also know if I pass an explicit starting index to `string.find`. So the "foolish" also applies to code where the target string isn't a constant. Since we are still arguing, your pattern probably isn't constant ...


> What I was trying to illustrate is that a function which gets an input
> pattern and tries to do an offset find will surprise anyone familiar
> with how patterns work in a lot of other languages, especially on some
> patterns, like '^ *#', which are the same as a regular expression, and
> I fear they are the same by design (I mean it seems like the Lua
> team picked those chars because they knew them from working with
> regexps). And they work in a similar way in a lot of languages for a
> good reason: patterns / regexps are a mini language of their own, and having
> mainly similar but subtly different ones will confuse the hell out of
> users.
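A short Lua illustration of the surprise described here (the subject string is made up for the example): someone used to regexes reads `'^ *#'` as "the text starts with optional blanks and a `#`", but with an explicit `init` Lua anchors the pattern at `init`, so it matches in the middle of the line.

```lua
local line = "x = 1  # note"

-- Anchored at position 1: the line does not start with blanks + '#'.
local from_start = string.find(line, "^ *#")      --> nil

-- Anchored at init = 6: '^' now means "at position 6", so the pattern
-- matches the two blanks and the '#' at positions 6..8.
local first, last = string.find(line, "^ *#", 6)  --> 6, 8
```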

I fear this ship has sailed. You can install five different regex engines via LuaRocks, there are Lua patterns, and on Unixes there are POSIX regular expressions vs. extended POSIX regular expressions, etc.


> I see how these may be useful; it's why the Perl folks put a \G in
> their regexp spec. I do not have a problem with something different
> enough, like LPeg, to force me to look at the docs. But I have a problem
> with surprising behaviour. It's as if someone made a language
> where, addition being more common than subtraction, they used - to add
> and + to subtract because - is statistically easier to type on most
> keyboards; it would surprise the hell out of the rest of the
> programmers (Hint: this is an intentionally exaggerated and ridiculous
> example, not a proposal or a comment on your code or proposals).

No worries ...


>> We also seem to agree that Lua's behavior covers additional use
>> cases. Given all that, the current implementation makes sense. A clarifying
>> note in the reference manual wouldn't hurt, though, and there is a manual
>> update in the pipeline anyway ...

> No, Lua covers different use cases. If you read my texts, I consider,
> unless someone proves me wrong, the other languages' behaviour
> superior for Lua. If I have a Cfind function with the current
> behaviour and an Ffind function with the other languages' behaviour, I can
> obtain s:Cfind(p,i) as s:sub(i):Ffind(p). The other way round is a
> little more complex, as you need a conditional to test the pattern.
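One way to sketch the two functions from this paragraph in plain Lua, assuming `init` is either omitted or a positive integer. `ffind` and `cfind` are hypothetical stand-ins for the Ffind and Cfind named in the message; `cfind` drops any captures, and it has to shift its indices back by `init - 1` because it matches a copy of the suffix.

```lua
-- Hypothetical Ffind: '^' only anchors at position 1 of the subject,
-- as in most regex engines.
local function ffind(s, p, init)
  init = init or 1
  -- The conditional on the pattern: an anchored pattern cannot match
  -- when the search starts after position 1.
  -- ("%^" starts with '%', so a literal caret is not treated as an anchor.)
  if init > 1 and string.sub(p, 1, 1) == "^" then
    return nil
  end
  return string.find(s, p, init)
end

-- Hypothetical Cfind (Lua's current behaviour) rebuilt on top of ffind:
-- search a copy of the suffix, then shift the indices back.
local function cfind(s, p, init)
  init = init or 1
  local i, j = ffind(string.sub(s, init), p)
  if i then
    return i + init - 1, j + init - 1
  end
  return nil
end

local subject = "12ABD12DEF"
local nomatch = ffind(subject, "^12.", 6)  --> nil: '^' means position 1 here
local ci, cj = cfind(subject, "^12.", 6)   --> 6, 8, same as string.find(subject, "^12.", 6)
```

The `string.sub` copy in `cfind` is the kind of suffix copying that Lua's init-anchored `^` avoids.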

Only if the pattern isn't a constant in the first place, which is the case if you use a user-supplied pattern (e.g. when implementing a text editor, or a `grep`-replacement), or you want to implement an additional pattern matching function yourself (which I assumed you were doing). And yes, the starting index thing is mainly an optimization that avoids dissecting the subject string for offset matches.
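A sketch of that optimization using only the standard string library (`tokenize` and its token classes are hypothetical, made up for illustration): a scanner keeps a position into the subject and matches `^`-anchored patterns at that position, so it never builds `s:sub(pos)` for the rest of the input.

```lua
-- Hypothetical scanner: identifiers, integers and single-character symbols.
local function tokenize(s)
  local tokens, pos = {}, 1
  while pos <= #s do
    -- '^' anchors each pattern at pos, so the scanner never copies
    -- the remaining suffix of s.
    local _, ws_end = string.find(s, "^%s+", pos)
    if ws_end then
      pos = ws_end + 1  -- skip a run of whitespace
    else
      local tok = string.match(s, "^[%a_][%w_]*", pos)  -- identifier
        or string.match(s, "^%d+", pos)                -- integer
        or string.sub(s, pos, pos)                     -- any other single char
      tokens[#tokens + 1] = tok
      pos = pos + #tok
    end
  end
  return tokens
end

local toks = tokenize("x = 12+foo")  --> { "x", "=", "12", "+", "foo" }
```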

> Of course, anyone needing this in a serious program will wrap
> everything in a function. (Hint: "I consider" means this is an
> opinion.)
>
> On the "makes sense" stuff: things make sense to some people and do
> not to others. Given the background of the Lua team and my study of
> their work, I've always considered that it makes sense to at least one of
> them.
>
> And this is not going to hit me hard, as, given my background, Lua will
> not even make the top five languages I'd choose if I needed to code
> a matching-heavy thing.

For matching heavy things I skip regexes and go straight for LPeg now.

> The clarifying stuff would be great, as this
> is not the first time I've found the manual slightly underspecified,
> although this makes sense to me (Lua is young, it is not that widely
> used in places where that matters, the target user base can live with this,
> and the manual is improving nicely).


>>> return nil, or that I sanitize every pattern I use to see whether it
>>> starts with ^ or what?
>> Yes, for your current use case (if I guessed correctly) that's one way to
>> go.
> Well, these lines alone prove (to me) that I'm not able to convey
> information to you.

That's unfortunate. I just tried to figure out how Lua's anchor behavior could be a problem for you, and the hints (non-constant pattern, repeated `find` calls) led me to the above-mentioned guess.


> Francisco Olarte.


Philipp