A few questions about lpeg...

I still don't understand lpeg very well, and I have the (naive?)
impression that patterns-with-captures are implemented on top of
patterns-without-captures, in a way that even allows "projecting" a
pattern-with-captures into the lower level by discarding all the
information about captures. Also, matching a pattern-with-captures
involves some backtracking, and some operations on the captures - like
"patt / function" - should only be performed after the (super)pattern
succeeds. So at first lpeg.match keeps the backtracking information
and the instructions for performing captures; at some point the
pattern is "closed", the backtracking information is dropped, and the
instructions for performing captures are executed...

Is that mental model correct? Is there a way to force a subpattern to
be closed, and its captures performed?
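
One way to probe that mental model is a small sketch in which two
alternatives share a prefix and only the second one succeeds. The names
"calls", "note", "first" and "second" are made up for this illustration.
If captures from a failed alternative are really discarded on
backtracking, only the function attached to the second alternative
should ever be called:

```lua
local lpeg = require "lpeg"

local calls = {}

-- note(tag) builds a capture function that records that it was called
local note = function (tag)
    return function (s)
        calls[#calls + 1] = tag .. ":" .. s
        return s
      end
  end

-- both alternatives match "ab" first; only the second matches the "Y"
local first  = (lpeg.P("ab") / note("first"))  * lpeg.P("X")
local second = (lpeg.P("ab") / note("second")) * lpeg.P("Y")
local patt   = first + second

local result = patt:match("abY")
print(result, table.concat(calls, " "))
```

With a recent LPeg the function tagged "first" should never run, because
the capture it is attached to is thrown away when that alternative fails.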



Now let me show why I stumbled on that question, and why I was
somewhat surprised when I discovered that the execution of the
function in "patt / function" is delayed.

I am trying to htmlize some files that have lots of "Elisp hyperlinks"
embedded in comments. For example, in

 # (info "(bash)Shell Parameter Expansion")

the "(info ...)" can be used as a hyperlink inside Emacs - executing
it as Lisp opens a page of the Bash manual. Not all sexps are
hyperlinks, and only a few of the sexps that work as hyperlinks inside
Emacs can be htmlized in meaningful ways. I have a table whose keys
are the symbols that can be heads of htmlizable hyperlink sexps, and I
was trying to build a pattern that would fail immediately when it
noticed that it was processing a sexp that is not htmlizable.

My first attempts to build patterns that would match only the "head
symbols" were more or less like this (I'm reconstructing that from
memory - it didn't work...):

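   -- a symbol is one or more of these characters
   SSymbol = (lpeg.R("AZ", "az", "09") + lpeg.S("-+_"))^1

   headsymbols = { ["info"]=true, ["man"]=true }

   -- meant to store the matched text in the global "symbol"
   setsymbol = function (str) symbol = str end
   -- meant to succeed only when "symbol" is a known head symbol
   isheadsymbol = function (subj, pos)
       return headsymbols[symbol] and pos
     end

   SHeadSymbol = (SSymbol / setsymbol) * lpeg.P(isheadsymbol)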

but then I discovered that the "/ setsymbol" part was being
executed after the "lpeg.P(isheadsymbol)", not before...
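
The ordering can be made visible with a small logging sketch. The names
"log", "capture_fn" and "check_fn" are invented for this illustration,
and it assumes an LPeg version in which lpeg.P(function) is a check that
runs while the match is still in progress:

```lua
local lpeg = require "lpeg"

local log = {}

local word = lpeg.R("az")^1

-- attached with "/": called only when the captures are evaluated
local capture_fn = function (str)
    log[#log + 1] = "capture:" .. str
  end

-- used as lpeg.P(function): called while the match is still running
local check_fn = function (subj, pos)
    log[#log + 1] = "check"
    return pos
  end

local patt = (word / capture_fn) * lpeg.P(check_fn)
patt:match("info")

local order = table.concat(log, " ")
print(order)
```

Here "check" should be logged before "capture:info", even though the
"/ capture_fn" part comes first in the pattern.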

My current solution (which works!) is like this - again, I'm
reconstructing this from memory; the real implementation is more
complex:

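   -- a symbol is one or more of these characters
   SSymbol = (lpeg.R("AZ", "az", "09") + lpeg.S("-+_"))^1

   headsymbols = { ["info"]=true, ["man"]=true }

   -- remember where the symbol starts (in the global "mark")
   setmark = function (subj, pos)
       mark = pos
       return pos
     end
   -- extract the symbol by hand and succeed only if it is a head symbol
   isheadsymbol = function (subj, pos)
       local symbol = string.sub(subj, mark, pos - 1)
       return headsymbols[symbol] and pos
     end

   SHeadSymbol = lpeg.P(setmark) * SSymbol * lpeg.P(isheadsymbol)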
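
A possible alternative, assuming an LPeg version that provides
match-time captures (lpeg.Cmt, which is not used anywhere above), is to
let lpeg hand the matched text to the checking function directly, so
that no global mark is needed. This is only a sketch of that idea; the
sample subjects at the end are made up:

```lua
local lpeg = require "lpeg"

local SSymbol = (lpeg.R("AZ", "az", "09") + lpeg.S("-+_"))^1

local headsymbols = { ["info"]=true, ["man"]=true }

-- lpeg.Cmt calls the function as soon as SSymbol matches, passing the
-- subject, the position after the match, and the captured symbol
local SHeadSymbol = lpeg.Cmt(lpeg.C(SSymbol), function (subj, pos, symbol)
    if headsymbols[symbol] then
      return pos, symbol   -- succeed and produce the symbol as a capture
    end
    return false           -- fail at once for non-htmlizable heads
  end)

local hit  = SHeadSymbol:match('info "(bash)Shell Parameter Expansion"')
local miss = SHeadSymbol:match('defun foo')
print(hit, miss)
```

The first match should produce the head symbol and the second should
fail, since "defun" is not a key of headsymbols.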



Cheers, more later, thanks in advance, etc,

 Eduardo Ochs
 eduardoochs@gmail.com
 http://angg.twu.net/