lua-l archive



It was thus said that the Great Dirk Laurie once stated:
> 
> Put another way: the issue is not what can you do with LPeg, it is
> what can you do that you can't do with Lua patterns, and whether
> that extra is sufficiently common to justify adding LPeg to Lua.

  I think I've come across something else that LPeg can easily do (for
various values of "easily") that would be difficult to do with Lua patterns.
The project in question outputs text to fit the width of a terminal, and
that output contains a mixture of UTF-8 and terminal escape codes, for
example:

	"Стоял он, дум\27[31;41m великих полн"

That string is 55 bytes long and contains 26 printable glyphs.  If I want
to print only 20 glyphs (because that's the width of our terminal, or all
that's left on the current line of our terminal), how do I calculate how
many bytes to write?
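To make the mismatch concrete, here is a quick check in plain Lua, measuring the example as a Lua string literal.  It assumes Lua 5.3 or later (for the standard utf8 library) and a UTF-8 encoded source file; the variable names are only for illustration.  utf8.len() counts code points, so each of the eight bytes of the escape sequence counts as one, and cutting after the 20th code point lands in the middle of the escape sequence.

```lua
-- Assumes Lua 5.3+ (standard utf8 library) and a UTF-8 encoded source file.
local s = "Стоял он, дум\27[31;41m великих полн"

local bytes      = #s            -- raw byte length
local codepoints = utf8.len(s)   -- ESC, '[', '3', '1', ... each count as one

print(bytes, codepoints)

-- Naive approach: treat every code point as one glyph and keep 20 of them.
-- utf8.offset(s, 21) is the byte index where the 21st code point starts.
local naive = utf8.offset(s, 21) - 1
print(naive)  -- this cut falls inside the escape sequence
```

So neither the byte count nor the code point count answers the question: the escape sequence has to be recognized and given a width of zero.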

  So I wrote a solution in LPeg.  What I'm presenting below is a simplified
version, so as to avoid complicated explanations of how it works; the full
version covers UTF-8 and the C0 [1] and C1 [2] control sets.

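	local lpeg = require "lpeg"
	
	local Carg = lpeg.Carg
	local Cmt  = lpeg.Cmt
	local Cb   = lpeg.Cb
	local Cc   = lpeg.Cc
	local Cg   = lpeg.Cg
	local Cp   = lpeg.Cp
	local P    = lpeg.P
	
	-- Match-time check: accept the next "character" only while the
	-- running sum is below the limit; returning nil fails the match.
	local function cmt(_,position,sum,max,count)
	  if sum < max then
	    return position,count
	  end
	end
	
	-- Fold step: add a character's width to the running sum.
	local function cf(sum,count)
	  return sum + count
	end
	
	-- One "character", with its width in glyphs captured by Cc():
	--   "<hello>" is several bytes counted as one glyph,
	--   "-...-" is a variable-length run counted as zero glyphs,
	--   any other byte counts as one glyph.
	local char  = Cc(1) * P"<hello>"
	            + Cc(0) * P"-" * (P(1) - P"-")^0 * P"-"
	            + Cc(1) * P(1)
	
	-- Pass the running sum, the limit (first extra argument to match())
	-- and the character's width to cmt().
	local char2 = Cmt(Cb'count' * Carg(1) * char,cmt)
	
	-- Start "count" at 0, fold each accepted width into it, then return
	-- the total and the position just past the last accepted character.
	local len   = Cg(Cc(0),"count")
	            * (Cg((Cb 'count' * char2) / cf,'count'))^0
	            * Cb 'count' * Cp()
	
	local test      = "he<hello>llo_-this is a comment-there_<hello>_how"
	local count,pos = len:match(test,1,15)  -- at most 15 glyphs
	
	print(count,pos)
	print(test)
	print(test:sub(1,pos-1))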

The 'char' production matches one of three things: a multibyte sequence we
want to treat as a "single character" (the role a UTF-8 sequence plays in
the full version), a multibyte sequence (of variable length, no less) we
want to treat as "zero characters" (the role of an escape sequence), and
the more usual one-byte-per-character case.  I'm using Cc() to return the
"size in characters" of each.
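For readers who want to see the three alternatives without LPeg, here is a sketch in plain Lua.  The helper name char_size() is hypothetical and not part of the LPeg solution.  Like LPeg's ordered choice, it tries "<hello>" first, then a "-...-" run, and otherwise takes a single byte, so a lone "-" with no closing "-" counts as one glyph.

```lua
-- Sketch only: a hypothetical plain-Lua helper mirroring the three
-- alternatives of the simplified 'char' production (no LPeg needed).
-- Returns the width in glyphs and the index just past the match, or
-- nil at the end of the subject (where 'char' would fail).
local function char_size(s, i)
  if i > #s then return nil end
  -- "<hello>": seven bytes counted as one glyph
  if s:sub(i, i + 6) == "<hello>" then
    return 1, i + 7
  end
  -- "-...-": a variable-length run counted as zero glyphs
  -- ('^' anchors the match at index i)
  local _, e = s:find("^%-[^%-]*%-", i)
  if e then
    return 0, e + 1
  end
  -- anything else: one byte, one glyph
  return 1, i + 1
end

-- Sum the widths over the whole test string, with no limit applied.
local test = "he<hello>llo_-this is a comment-there_<hello>_how"
local total, i = 0, 1
while true do
  local w, nxt = char_size(test, i)
  if not w then break end
  total, i = total + w, nxt
end
print(total)  -- 18
```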

Skipping ahead a bit to the 'len' production:

	Cg(Cc(0),"count")
		Set a "variable" we're calling "count" to 0

	(Cg((Cb 'count' * char2) / cf,'count'))^0
		Working our way out, we retrieve our variable 'count' and
		run a match of 'char2' (more below); we then pass these two
		values through the function cf(), which just adds the
		count returned by 'char2' to the running sum in 'count'.
		This result (as a capture) is then reassigned back to the
		variable 'count'.  This keeps going until 'char2' fails (see
		below).

	Cb 'count' * Cp()
		Here we return our count and the current position in the
		match.

The middle production, "char2":

	Cmt(Cb 'count' * Carg(1) * char,cmt)

		Again working our way out, we retrieve our variable 'count',
		the first additional argument to lpeg.match() (here the
		number of "glyphs" we want), and the size of the next
		"character", and pass all three to the cmt() function.  This
		just checks that our running sum is less than the maximum
		and, if so, we keep going.  Otherwise it returns nil, which
		makes 'char2' fail and so ends the repetition in 'len'.
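As a cross-check of what the LPeg version should return, here is the same logic as a hand-written loop in plain Lua.  It is a sketch, not part of the LPeg solution, and char_size() and fit() are hypothetical names.  As in cmt(), the limit is tested before a character's width is added, and the results come back in the same order as len:match(): the count, then the position.

```lua
-- Sketch only: plain-Lua cross-check of the simplified LPeg 'len'
-- pattern. char_size() and fit() are hypothetical names.

-- Width of the "character" starting at byte i, plus the index just past
-- it; nil at the end of the subject (where 'char' would fail).
local function char_size(s, i)
  if i > #s then return nil end
  if s:sub(i, i + 6) == "<hello>" then return 1, i + 7 end  -- one glyph
  local _, e = s:find("^%-[^%-]*%-", i)
  if e then return 0, e + 1 end                             -- zero glyphs
  return 1, i + 1                                           -- one glyph
end

-- Mirrors 'len': the running count starts at 0 and, as in cmt(), a
-- character is accepted only while count < max.
local function fit(s, max)
  local count, pos = 0, 1
  while count < max do
    local w, nxt = char_size(s, pos)
    if not w then break end
    count, pos = count + w, nxt
  end
  return count, pos  -- count first, then the position (like Cp())
end

local test = "he<hello>llo_-this is a comment-there_<hello>_how"
local count, pos = fit(test, 15)
print(count, pos)            -- count == 15, pos == 47
print(test:sub(1, pos - 1))  -- he<hello>llo_-this is a comment-there_<hello>_

-- A zero-width run right after the last glyph that fits is not consumed.
local c2, p2 = fit("ab-x-", 2)  -- stops before "-x-"
local c3, p3 = fit("ab-x-", 3)  -- consumes "-x-", then hits the end
print(c2, p2, c3, p3)
```

The last two calls show a consequence of testing the sum before adding: in the full version, an escape sequence sitting exactly at the cut-off point would be left out of the bytes written.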

  It works, and I find it quite concise and to the point.  My question to
those of you in the Lua pattern camp: what pattern(s) would you have to
write to solve this problem?  I'm curious.  Is it even possible with Lua
patterns?

  -spc

[1]	Character values defined by ANSI: \0 to \31 and \127.

[2]	Defined by ISO/IEC 2022 and most commonly known as ANSI escape
	codes.