It was thus said that the Great dan young once stated:
> Hey Sean,
> 
> Thanx again for the lpeg lesson....to build on this a bit more, what I
> really have is a tab delimited line where I want to check the given line to
> see if it matches my given pattern.  For example, the line  below would
> pass because of the disease_show/388////
> 
> 9444f850ff0c10a862b4a6a9c4ab0a74 disease_show disease_show/388//// 4.5
> 
> but this line would not
> 
> 9444f850ff0c10a862b4a6a9c4ab0a74 disease_show disease_show/388/description/// 4.5
> 
> What would you recommend the best way to go about this would be? it seems
> that I need to consume all the text/numbers, and tabs prior to the
> disease_show/388//// element.  If the given line does pass the test, I do
> some additional processing of the line, otherwise I skip it and move onto
> the next.....

  I'm assuming that the whitespace in those lines is tabs.  How to proceed
depends on how accurate you want the parsing to be.  Just off the top of my
head, something like:

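local lpeg = require "lpeg"
local P,R  = lpeg.P,lpeg.R
local hex  = R("09","AF","af")

-- x^32^-32 parses as x^(32^-32) in Lua, so use a lookahead for "exactly 32"
dm = #(hex^32) * hex^-32
   * P"\t"
   * P"disease_show" -- or R("AZ","az","__")^1 if this can be other stuff
   * P"\t"
   * P"disease_show"
   * P"/"
   * R"09"^1
   * P"////"         -- or P'/'^0 if you like
   * P"\t"
   * P(1)^0          -- or a pattern that matches what looks like a real number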

  I broke the definition out over several lines to make it easier to follow.
Some productions can be merged or broken up as required.  You can also place
captures around the fields you are interested in, although if you have more
than one, you might want to use named captures within a table capture, for
instance:

local lpeg      = require "lpeg"
local P,R,Ct,Cg = lpeg.P,lpeg.R,lpeg.Ct,lpeg.Cg
local hex       = R("09","AF","af")

dm = Ct(
          #(hex^32) * hex^-32 -- exactly 32 hex digits
        * P"\t"
        * P"disease_show"     -- or R("AZ","az","__")^1 if this can be other stuff
        * P"\t"
        * Cg(P"disease_show","title")
        * P"/"
        * Cg(R"09"^1 / tonumber,"amount") -- or whatever the value represents
        * P"////"             -- or P'/'^0 if you like
        * P"\t"
        * P(1)^0              -- or a pattern that matches what looks like a real number
)

dm:match(line) will return nil if something doesn't match, and otherwise a
table with two fields, "title" (a string) and "amount" (converted to a number
via tonumber()).  I also formatted this to make it easier to see which fields
are captured via a name.
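
  For the skip-or-process loop described in the question, a minimal sketch
might look like the following.  It assumes the fields really are separated by
single tabs and that LPeg is installed; the "lines" table and the "kept" list
are made up for illustration, and real input would more likely come from
something like io.lines().

```lua
local lpeg      = require "lpeg"
local P,R,Ct,Cg = lpeg.P,lpeg.R,lpeg.Ct,lpeg.Cg

local hex = R("09","AF","af")
local dm  = Ct(
      #(hex^32) * hex^-32               -- exactly 32 hex digits
    * P"\t"
    * P"disease_show"
    * P"\t"
    * Cg(P"disease_show","title")
    * P"/"
    * Cg(R"09"^1 / tonumber,"amount")
    * P"////"
    * P"\t"
    * P(1)^0                            -- rest of the line, unchecked
)

-- Hypothetical sample input: the two lines from the question, with real tabs.
local lines = {
  "9444f850ff0c10a862b4a6a9c4ab0a74\tdisease_show\tdisease_show/388////\t4.5",
  "9444f850ff0c10a862b4a6a9c4ab0a74\tdisease_show\tdisease_show/388/description///\t4.5",
}

local kept = {}
for _,line in ipairs(lines) do
  local fields = dm:match(line)  -- nil when the line does not fit the pattern
  if fields then
    kept[#kept + 1] = fields     -- additional processing would go here
    print(fields.title,fields.amount)
  end
end
```

Only the first line should survive the test; the second fails at P"////"
because "/description" follows the digits, so the loop just moves on to the
next line.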

  -spc (LPeg really does make for a better "regex" engine than even regex)