Hi:

On Tue, Mar 4, 2014 at 3:36 PM, Pierre-Yves Gérardy <pygy79@gmail.com> wrote:
>> Would you care to elaborate?
>      s:find(p, -3) --> 5, 7

I was talking about optimizing AFTER the first negative index fold, that
is, AFTER you do "if i < 0 then i = i + length + 1" on the s:find
parameters. After that fold this example is the same as s:find(p,5); it
does not need the clipping to 1.
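
To make the fold concrete, here is a minimal sketch. The subject string and pattern are hypothetical (the quoted example does not show them); they are chosen only so that the string has 7 characters and the match lands on 5..7. The helper normalize_init is illustrative and not part of the string library; it mimics what string.find does with its init argument.

```lua
-- Hypothetical 7-character subject and pattern (not from the thread).
local s, p = "abcdefg", "^efg"

-- Illustrative helper: the index fold applied to a negative init.
local function normalize_init(str, init)
  if init < 0 then init = init + #str + 1 end  -- fold relative to the end
  if init < 1 then init = 1 end                -- clip to the start
  return init
end

print(normalize_init(s, -3))   --> 5
print(s:find(p, -3))           --> 5  7
print(s:find(p, 5))            --> 5  7
```

With a 7-character string, -3 folds to 5 and never reaches the clipping step, which is the point made above.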

Aside from that, what has really shocked me is the fact that
s:find(p,5) gives 5,7.

In fact you've described it quite well:

> "^" triggers a search anchored to the index, not the start of the string.
> You said that skipping negative indices was a correct optimization,
> that's true for "free roaming" patterns, but it doesn't hold if they
> are anchored. There's nothing to find anchored to an index smaller
> than 1.

And Lua seems to work this way, but I've never worked with regexes
which do this. It seems it treats

s:find(p,5) as   (5-1) + s:sub(5):find(p)

Which I would never expect, especially after reading:

Pattern:

A pattern is a sequence of pattern items. A caret '^' at the beginning
of a pattern anchors the match at the beginning of the subject string.
A '$' at the end of a pattern anchors the match at the end of the
subject string. At other positions, '^' and '$' have no special
meaning and represent themselves.

IMO this behaviour needs a note in the manual (perhaps defining
'subject string', which appears three times in the doc: the two I've
quoted and the next paragraph).
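
A short sketch of the behaviour in question, using the same subject string as the Perl examples further down. The helper find_via_sub is illustrative, not a library function, and it assumes a positive init inside the string and a pattern without captures.

```lua
local s = "12ABC12EFG"

-- '^' anchors the match at init (6 here), not at position 1 of s.
print(s:find("^12.", 6))            --> 6  8

-- Illustrative helper: the same search expressed through string.sub,
-- shifting the result back by init - 1.
local function find_via_sub(str, pat, init)
  local from, to = str:sub(init):find(pat)
  if from then
    return from + init - 1, to + init - 1
  end
  return nil
end

print(find_via_sub(s, "^12.", 6))   --> 6  8
print(s:find("^12.", 2))            --> nil
```

The last call returns nil because no "12" starts exactly at position 2; the anchor moves with init instead of staying at the start of the string.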

I'm used to languages with more matching support, and to iterating
over matches. This is especially important when matching unknown
patterns; e.g. in Perl you have this:

folarte@paqueton:~/tmp$ perl -MData::Dumper -e 'print Dumper [
"12ABC12EFG" =~ /^12./g ];'
$VAR1 = [
          '12A'
        ];
folarte@paqueton:~/tmp$ perl -MData::Dumper -e 'print Dumper [
"12ABC12EFG" =~ /12./g ];'
$VAR1 = [
          '12A',
          '12E'
        ];

And I'm not even entering into multiline matches:

folarte@paqueton:~/tmp$ perl -MData::Dumper -e 'print Dumper [
"12ABC12EFG\n12HIJ12KLM" =~ /^12./mg ];'
$VAR1 = [
          '12A',
          '12H'
        ];

And the other languages I've done find/match in give exactly the same
result: ^ means start of string, as I interpreted the Lua doc, and if
you try to match it after the start, it never matches. That is the
correct thing, and it is needed to properly chain matches: if I write a
program which counts the occurrences of a pattern in a file with a
single line L="AAA", I would expect to find "^A" once and "A" thrice,
but it seems this needs careful coding in Lua.


I mean, I would do that with something like this in lua:
folarte@paqueton:~/tmp$ cat find.lua
function f(s,p)
  print(s,p)
  local start=1
  while start <= #s do
     -- a leading '^' in p anchors at start, not at position 1 of s
     local from,fin=s:find(p,start)
     if from==nil then return end
     print(from,fin,s:sub(from,fin))
     start=fin+1
  end
end

f("12ABC12DEF","^12.")
f("12ABC12DEF","^12.")
f("12A12DEF","^12.")
f("AAA","A")
f("AAA","^A")
folarte@paqueton:~/tmp$ lua find.lua
12ABC12DEF    ^12.
1    3    12A
12ABC12DEF    ^12.
1    3    12A
12A12DEF    ^12.
1    3    12A
4    6    12D
AAA    A
1    1    A
2    2    A
3    3    A
AAA    ^A
1    1    A
2    2    A
3    3    A
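
One possible shape for the "careful coding" mentioned above is sketched here. The helper count_matches is hypothetical, not part of the string library. It assumes a leading '^' should mean "start of the whole string", so an anchored pattern is tried only once, at position 1.

```lua
-- Illustrative helper: counts non-overlapping matches of p in s,
-- treating a leading '^' as an anchor to position 1 only.
local function count_matches(s, p)
  local anchored = p:sub(1, 1) == "^"
  local count, start = 0, 1
  while start <= #s do
    local from, to = s:find(p, start)
    if not from then break end
    count = count + 1
    if anchored then break end   -- '^' may only match at position 1
    -- Always advance, even after an empty match, to avoid looping forever.
    start = (to >= from) and to + 1 or from + 1
  end
  return count
end

print(count_matches("AAA", "A"))           --> 3
print(count_matches("AAA", "^A"))          --> 1
print(count_matches("12A12DEF", "^12."))   --> 1
print(count_matches("12A12DEF", "12."))    --> 2
```

This gives "^A" once and "A" thrice on "AAA", at the cost of inspecting the pattern text before searching.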

Well, one more thing to watch out for in Lua; go at least for LPeg for
any serious pattern matching.

Francisco Olarte.