• Subject: LPeg can do what Lua patterns can't (was Re: lpeg as a part of lua)
• From: Sean Conner <sean@...>
• Date: Sun, 29 Oct 2017 00:38:05 -0400

```It was thus said that the Great Dirk Laurie once stated:
>
> Put another way: the issue is not what can you do with LPeg, it is
> what can you do that you can't do with Lua patterns, and whether
> that extra is sufficiently common to justify adding LPeg to Lua.

I think I've come across something else that LPeg can easily do (for
various values of "easily") that would be difficult to do with Lua patterns.
The background for the project is to output text to fit the width of a
terminal, and said output contains a mixture of UTF-8 and terminal escape
codes, for example:

"Стоял он, дум\27[31;41m великих полн"

That string is 56 bytes long, and contains 26 printable glyphs.  If I wanted
to print out only 20 glyphs (because that's the width of our terminal, or
all that's left on the current line of our terminal), how do I calculate how
many bytes to write?

So I have a solution I wrote in LPeg.  What I'm presenting below is a
simplified version (the full version covers UTF-8, the C0 [1] and the C1 [2]
control sets) so as to avoid complicated explanations of how it works.

local lpeg = require "lpeg"

local Carg = lpeg.Carg
local Cmt  = lpeg.Cmt
local Cb   = lpeg.Cb
local Cc   = lpeg.Cc
local Cg   = lpeg.Cg
local Cp   = lpeg.Cp
local C    = lpeg.C
local P    = lpeg.P

local function cmt(_,position,sum,max,count)
  if sum < max then
    return position,count
  end
end

local function cf(sum,count)
  return sum + count
end

local char  = Cc(1) * P"<hello>"
            + Cc(0) * P"-" * (P(1) - P"-")^0 * P"-"
            + Cc(1) * P(1)

local char2 = Cmt(Cb'count' * Carg(1) * char,cmt)

local len   = Cg(Cc(0),"count")
            * (Cg((Cb 'count' * char2) / cf,'count'))^0
            * Cb 'count' * Cp()

local test    = "he<hello>llo_-this is a comment-there_<hello>_how"
local max,pos = len:match(test,1,15)

print(max,pos)
print(test)
print(test:sub(1,pos-1))

The 'char' production includes a multibyte sequence we want to treat as a
"single character", a multibyte sequence (of variable length no less) we
want to treat as "zero characters" and your more usual one byte per
character sequence.  I'm using Cc() to return the "size in characters".

Skipping ahead a bit to the 'len' production:

Cg(Cc(0),"count")
Set a "variable" we're calling "count" to 0

(Cg((Cb 'count' * char2) / cf,'count'))^0
Working our way out, we retrieve our variable 'count', and
run a match of 'char2' (more below)---we then pass these two
values through the function cf(), which just adds the
returned count from 'char2' to the running sum in 'count'.
This result (as a capture) is then reassigned back to the
variable 'count'.  This keeps going until "char2" fails (see
below).

Cb 'count' * Cp()
Here we return our count and the current position in the
match.

The middle production, "char2":

Cmt(Cb 'count' * Carg(1) * char,cmt)

Again, working our way out, we retrieve our variable 'count',
the first additional argument to lpeg.match() (which here
is the number of "glyphs" we want), and the next "character",
and pass all three to the cmt() function.  This just checks
that our running sum is less than the maximum and if so, we
keep going.  Otherwise it returns nil, meaning we are at the
end of this pattern.

It works, and I find it quite concise and to the point.  My question to
those of you in the Lua pattern camp: what pattern(s) would you have to
write to solve this problem?  I'm curious.  Is it even possible with Lua
patterns?

-spc

[1]	Character values defined by ANSI, \0 to \31 and \127.

[2]	Defined by ISO/IEC 2022 and most commonly known as ANSI escape
codes.

```
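For comparison, here is a minimal sketch of how the original problem (UTF-8 text mixed with terminal escapes) might be approached with plain Lua patterns. It is not from the post: the name `fit` is hypothetical, and the sketch assumes valid UTF-8 input and that the only escapes present are CSI sequences (ESC, `[`, parameter bytes, one final byte). It is also not a single pattern but a hand-written loop around two anchored ones, with the counting done in Lua code. Like the LPeg version, it returns the glyph count and the position just past the last byte to write.

```lua
-- Hypothetical helper, not from the original post.
-- Assumptions: `s` is valid UTF-8; the only zero-width sequences are
-- CSI escapes (ESC '[' parameter bytes, intermediate bytes, final byte).
local function fit(s, max)
  local pos, count = 1, 0
  while pos <= #s and count < max do
    -- Zero glyphs: a CSI escape sequence starting at pos.
    local _, e = s:find("^\27%[[0-?]*[ -/]*[@-~]", pos)
    if e then
      pos = e + 1
    else
      -- One glyph: a non-continuation byte plus any continuation bytes.
      _, e = s:find("^[^\128-\191][\128-\191]*", pos)
      if not e then break end -- stray continuation byte: stop here
      pos, count = e + 1, count + 1
    end
  end
  return count, pos
end

local text = "Стоял он, дум\27[31;41m великих полн"
local count, pos = fit(text, 20)
print(count, pos)
print(text:sub(1, pos - 1)) -- the bytes that cover at most 20 glyphs
```

As in the LPeg version, an escape sequence that follows the last counted glyph is not consumed, so a caller that needs a trailing colour reset would have to emit it separately.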
