- Subject: Re: Lpeg for one who is regexp-poisoned...
- From: Sean Conner <sean@...>
- Date: Wed, 12 Nov 2014 12:48:00 -0500
It was thus said that the Great Andrew Starks once stated:
> On Tue, Nov 11, 2014 at 9:58 PM, <meino.cramer@gmx.de> wrote:
>
> > I watched (and tried to understand...) Roberto's video about the
> > concepts of LPeg/PEG on YouTube, and I read the LPeg tutorial
> > pages.
> >
> > But I am still running into an inner wall... years of using regexps
> > cannot be undone that fast... ;)
> >
> > One sentence from Roberto's video sticks in my head: LPeg does not search.
> >
> > How, then, can I recognize the occurrence of a certain pattern in a
> > string?
> >
> > For example:
> >
> > mystring="CHaskellLispFortranBPCLAssemblerForthLuaSchemePerlCHILLTeXJavaJavascript"
> >
> > Is it possible to check with LPeg, without misusing it, whether "Lua"
> > is in that string, and where? (I know that there are other LPeg-less
> > methods to do that... that's why this is an example ;)
> >
> > If this is not possible, or can only be "tricked"... how can LPeg react to
> > only partly known input formats?
> >
> > Or is that a typical question from someone who is still under the bad
> > influence of regexps?
>
> Yes, you can search. With LPeg, there are at least two methods that I can
> think of. The one that doesn't use grammars can be described thusly:
>
> zero or more characters that do not equal "Lua" followed by "Lua"
>
> or
>
> (untested)
>
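> local lpeg = require "lpeg"
> local P = lpeg.P
>
> -- zero or more characters that do not start "Lua", then "Lua" itself
> local contains_lua_pat = (P(1) - "Lua")^0 * P("Lua")
>
> print(contains_lua_pat:match(your_string))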
> --> position of the match
That actually returns the position *just past* the match, so there's
little indication of what you matched, just that you did.
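
To make that concrete, here is a minimal sketch (not part of the original
exchange) that adds Cp() captures around the same pattern, so the start of the
match is reported as well as the position just past it. The second form is one
way to write the grammar-based search mentioned above. The names
`contains_lua`, `find_lua`, `anywhere` and `subject` are made up for
illustration.

```lua
local lpeg = require "lpeg"
local P, C, Cp, V = lpeg.P, lpeg.C, lpeg.Cp, lpeg.V

local subject = "CHaskellLispFortranBPCLAssemblerForthLuaSchemePerlCHILLTeXJavaJavascript"

-- Without captures, match() returns only the position just past the match.
local contains_lua = (P(1) - "Lua")^0 * P("Lua")

-- Skip ahead as before, then capture start position, text, and end position.
local find_lua = (P(1) - "Lua")^0 * Cp() * C("Lua") * Cp()

-- Grammar form: try p here; failing that, consume one character and retry.
local function anywhere(p)
  return P{ Cp() * C(p) * Cp() + 1 * V(1) }
end

local past = contains_lua:match(subject)
local first, text, after = find_lua:match(subject)
local gfirst, gtext, gafter = anywhere("Lua"):match(subject)

print(past)                  -- position just past "Lua"
print(first, text, after)    -- start, matched text, position just past
print(gfirst, gtext, gafter) -- same three values from the grammar form
```

Both forms return nil when "Lua" does not occur in the subject, so the result
can be used directly in an `if`.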
> If it succeeds, it returns the position of... I believe the first (and
> maybe also last) position of the match. If you use "lpeg.C", you'll get the
> capture.
Here's a sample with captures:
-- ----------------------------------------------------------------
-- load LPeg, and grab some local references to two LPeg functions:
--
-- C() - Capture the substring matched by the pattern [1]
-- Cp() - Capture the current position in the subject string
--
-- [1] It can return more than just the matched text, but for now,
-- this explanation is Good Enough.
-- ----------------------------------------------------------------
local lpeg = require "lpeg"
local Cp = lpeg.Cp
local C = lpeg.C
-- -------------------------------------------------------------------------
-- Try to match against a list of languages. Because LPeg's choice operator
-- (+) is ordered, the match will first try "Haskell", then "Lisp", and so on
-- down the list, stopping at the first alternative that succeeds. So when
-- one name is a prefix of another, list the longer name first. If "Java"
-- came before "Javascript", then matching against "Javascript" would return
-- "Java", since "Java" would be tried (and would succeed) first. To avoid
-- this, use the order "Javascript", "Java".
--
-- It's for this reason that "C" is tried last in the list.
--
-- The pattern also returns the position just past the match so we can
-- resume matching the string past what we've matched.
-- -------------------------------------------------------------------------
local lang = (
      C("Haskell")
    + C("Lisp")
    + C("Fortran")
    + C("BPCL")
    + C("Assembler")
    + C("Forth")
    + C("Lua")
    + C("Scheme")
    + C("Perl")
    + C("CHILL")
    + C("TeX")
    + C("Javascript")
    + C("Java")
    + C("C")
) * Cp()

local test = "CHaskellLispFortranBPCLAssemblerForthLuaSchemePerlCHILLTeXJavaJavascript"
-- -------------------------------------------------------------------------
-- Start at the first position in the string (remember: Lua is 1-based). Get
-- the language at that position, plus the position past the language name.
-- Print the language, then resume searching for other language names.
-- -------------------------------------------------------------------------
local pos = 1
while pos <= #test do
  local name,newpos = lang:match(test,pos)
  if not name then break end
  print(name)
  pos = newpos
end
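
Two points from the listing above can be tried in isolation. This is only a
sketch: the names `wrong`, `right`, `names`, `scan` and the sample subject are
made up for illustration and are not part of the original code. The first part
shows why the longer name has to come first. The second shows one possible way
to cope with input that is only partly known, by skipping a character whenever
no known name matches instead of stopping.

```lua
local lpeg = require "lpeg"
local P, C, Cp, Ct = lpeg.P, lpeg.C, lpeg.Cp, lpeg.Ct

-- Ordered choice: the first alternative that matches wins, even if a
-- later alternative would have matched more of the subject.
local wrong = C("Java") + C("Javascript")
local right = C("Javascript") + C("Java")
print(wrong:match("Javascript"), right:match("Javascript"))

-- Partly known input: record a known name together with its start
-- position, or else skip one character and try again.
local names = C("Lua") + C("Javascript") + C("Java")
local scan  = Ct((Ct(Cp() * names) + P(1))^0)

local hits = scan:match("??Lua--Javascript, Java!")
for _, hit in ipairs(hits) do
  print(hit[1], hit[2])   -- start position, language name
end
```

The scan never fails on unknown text; it simply returns an empty table when
none of the names occur.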
-spc