Re: Any LPEG tutorial for laymen ?

lua-l archive


Subject: Re: Any LPEG tutorial for laymen ?
From: Sean Conner <sean@...>
Date: Wed, 25 Sep 2013 01:28:58 -0400

It was thus said that the Great David Crayford once stated:
> On 24/09/2013 8:51 PM, Luiz Henrique de Figueiredo wrote:
> >>Take the following output from a netstat command.
> >>
> >>Client Name: SMTP                     Client Id: 000000B7
> >[...]
> >>I would love to learn how to write LPeg parser to yank the key->values
> >>from that multi-line report easily.
> >You don't need LPeg for this task. Try
> >	for k,v in T:gmatch("(%u[%w ]-):%s*(.-)%s") do print(k,v) end
> >where T contains the netstat output.
> 
> Thanks. This is how dumbstruck I am WRT pattern matching. I want to 
> parse the following piece of netstat output
> 
> SKRBKDC  00000099 UDP
>   Local Socket:   172.17.69.30..464
>   Foreign Socket: *..*
> 
> The top line is the user, connection id and state. All I want to do is 
> capture three whitespace separated words.
> 
> In REXX I would do this:
> 
> parse var line userid connid state
> 
> What is the most succinct way of doing something similar in Lua?

  Using LPeg:

lpeg = require "lpeg"	-- load up the module

-- This defines whitespace.  Here it's just a single space (ASCII 32).
-- Alternatively, you can define it as:
--
--  SP = lpeg.S" \t"
--
--  which defines whitespace as a set of characters: space (ASCII 32)
--  and tab (ASCII 9).

SP   = lpeg.P" "

-- This defines a word.  A word is at least one character (lpeg.P(1) matches
-- any single character) that is NOT a space (- SP).  The "^1" is LPeg's
-- loop operator and here it means "one or more".  "lpeg.C()" is the capture
-- function, and this is what "captures" (or returns) the text we are
-- interested in.

word = lpeg.C( (lpeg.P(1) - SP)^1 )

-- And our line, which is three space separated words.  In order to account
-- for multiple spaces, we use the loop operator on the whitespace.  The
-- first bit, "SP^0" means "0 or more whitespace characters at the start of
-- the line."  The "*" here can be read as "and", so translated, "optional
-- whitespace and a word and some space and a word and some space and a
-- word."  Anything after the third word is simply left unmatched.

line = SP^0 * word * SP^1 * word * SP^1 * word

-- That's it for the parsing.  This function just takes a line of text and
-- splits it into three separate words.  Right now, we just print them one
-- to a line, but the code could return all three or do whatever.  If the
-- text has fewer than three words, the match fails and returns nil.

function parse(text)
  local w1,w2,w3 = line:match(text)
  print(w1)
  print(w2)
  print(w3)
  print()
end

-- And some tests ... 

parse "SKRBKDC  00000099 UDP" 
parse "  Local Socket:   172.17.69.30..464" 
parse "  Foreign Socket: *..*"
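
  For comparison, here is a rough sketch of the same three-word split using
only Lua's built-in string patterns, no LPeg required.  The name "split3" is
made up for this example and is not part of any library.  It returns the
words instead of printing them, which is closer to the REXX "parse var"
line above.

```lua
-- Hypothetical helper (not from the LPeg code above): return the first
-- three whitespace-separated words of a line, or nil if there are fewer.
local function split3(text)
  -- ^%s*   optional leading whitespace, anchored at the start of the line
  -- (%S+)  a captured run of non-whitespace characters (a "word")
  -- %s+    one or more whitespace characters between words
  return text:match("^%s*(%S+)%s+(%S+)%s+(%S+)")
end

local userid, connid, state = split3("SKRBKDC  00000099 UDP")
print(userid, connid, state)
```

  One difference to keep in mind: "%s" matches any whitespace (tabs and
newlines included), while the SP pattern above matches only the space
character.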

  -spc

