- Subject: Re: Matching sequences of identical characters with lpeg.
- From: Andrew Gierth <andrew@...>
- Date: Tue, 20 Nov 2018 22:51:41 +0000
>>>>> "Magicks" == Magicks M <firstname.lastname@example.org> writes:
Magicks> Hi, I'm having some trouble with this.
Magicks> I want to capture sequences of identical characters in a
Magicks> string: "aaabbbcccd" -> "aaa" "bbb" "ccc" "d"
Magicks> I attempted to use the regular pattern "((.)%1*)" and found
Magicks> that quantifiers didn't work. I then switched to lpeg and
Magicks> attempted to use: Cg(P(1), "char") * Cb'char'^0 Which errors,
Magicks> and thus I cannot think of a good way to match this.
Doing this in lpeg will, I believe, either require Cmt or pre-generating
a pattern for every possible character. (Your attempt above seems to
misunderstand what Cb does - it just fetches and returns the value from
Cg, it does not attempt to match it against the subject string;
backreference matches (as with =foo in the lpeg "re" module, or the "Lua
long strings" example in the lpeg docs) require Cmt.)
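As a rough sketch of what a Cmt-based backreference match could look like here (modelled on the "Lua long strings" example; the names first, same, run and pat are made up for illustration, and this assumes a standard lpeg install):

```lua
local lpeg = require "lpeg"
local C, Cb, Cg, Cmt, Ct = lpeg.C, lpeg.Cb, lpeg.Cg, lpeg.Cmt, lpeg.Ct

-- Remember the first character of a run in a named group.
local first = Cg(C(1), "ch")

-- Match one more character, but only if it equals the remembered one.
-- Cb only fetches the group's value; the comparison itself happens in
-- the Cmt function, which is called at match time.
local same = Cmt(C(1) * Cb("ch"), function(_, _, c, ref)
  return c == ref
end)

-- A run is the first character followed by any number of equal ones.
local run = C(first * same^0)

-- Collect all runs of the subject into a table.
local pat = Ct(run^0)

print(table.concat(pat:match("aaabbbcccd"), " "))
```

Note that the function passed to Cmt runs once for every character after the first one in each run.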
It's not immediately clear that doing it with Cmt would be in any way
better than just open-coding the search in Lua, since you'd be calling
the capture function at every character position. With pre-generated
patterns it might look like this:
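local lpeg = require "lpeg"
local P, C = lpeg.P, lpeg.C
-- start from a pattern that never matches, then add one alternative
-- per possible byte value
local subpat = P(false)
for i = 0,255 do
  subpat = subpat + P(string.char(i))^2 -- 2 or more occurrences
end
-- capture each run; skip over any character that is not part of one
local pat = (C(subpat) + P(1))^0
-- for example, with a subject such as "abcaabbccaaabbbccc":
print(pat:match("abcaabbccaaabbbccc"))
-- output: aa bb cc aaa bbb ccc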
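
For comparison, the open-coded search mentioned above might be sketched in plain Lua roughly as follows. The function name runs is made up for illustration. Unlike the pattern above, this version also returns runs of length one, which is what the original question asked for.

```lua
-- Split a string into maximal runs of one repeated character (byte).
local function runs(s)
  local out = {}
  local i = 1
  while i <= #s do
    local ch = s:sub(i, i)
    local j = i
    -- extend the run while the next character is the same;
    -- past the end of the string, s:sub returns "" and the loop stops
    while s:sub(j + 1, j + 1) == ch do
      j = j + 1
    end
    out[#out + 1] = s:sub(i, j)
    i = j + 1
  end
  return out
end

print(table.concat(runs("aaabbbcccd"), " "))  --> aaa bbb ccc d
```

This needs no lpeg at all and does a single pass over the string.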