It was thus said that the Great pocomane once stated:
> I am writing a parser for a very simple configuration language. 

  I've used Lua itself as the configuration language (for personal and work
related projects).  It's pretty easy to do:

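	configuration = {}
	-- the third argument makes 'configuration' the environment of the
	-- loaded chunk (Lua 5.2 or later)
	local f, err = loadfile("myconfigfile.txt","t",configuration)
	if f then
	  f()
	else
	  error(err) -- report why the file failed to load or compile
	end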

	if configuration.blah ...

  And if you need access to some data, you can always make it available in
the variable 'configuration':

	configuration = { HOME = os.getenv "HOME" }

then your configuration file can reference HOME.  If the input comes in
piecemeal, check out load(), which also accepts a function that returns the
chunk one piece at a time.  Just a thought.
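
  A minimal sketch of the load() idea, assuming Lua 5.2 or later (where
load() takes a mode and an environment).  The 'pieces' table, the reader
function and the "=config" chunk name are made up for illustration; in real
code the reader would pull data from wherever your input arrives.  Note that
load() keeps calling the reader until it returns nil (or an empty string),
so nothing runs until the whole chunk has been read.

```lua
-- Hypothetical pieces of a configuration file, arriving one at a time.
local pieces = { "name = 'demo'\n", "home = HOME\n", "level = 1 + 2\n" }

-- Data made available to the configuration file ("/tmp" is only a
-- fallback for this example when HOME is not set).
local configuration = { HOME = os.getenv "HOME" or "/tmp" }

local i = 0
local function reader()
  i = i + 1
  return pieces[i]  -- nil after the last piece signals the end of the chunk
end

-- "t" restricts the chunk to text (no precompiled bytecode);
-- 'configuration' becomes the chunk's global environment.
local f, err = load(reader, "=config", "t", configuration)
if not f then
  error(err)
end
f()

print(configuration.name, configuration.home, configuration.level)
```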

> I know
> that LPEG is designed for this purpose, but since it is a very simple
> language, I would like to use the lua patterns only (please, do not
> judge me :) ).

  No judgement here, but be aware that LPeg has the same issues you are
encountering.

> I have a working parser that acts on a single string, but now I would
> like to extend it to get the input splitted among several chunks,
> without waiting for the whole data unless it is necessary.
> 
> We consider, for example, the pattern
> ^{%a+}
> to match against the following input:
> {foobar}
> We suppose it is splitted in two chunks right in the middle.
> 
> On the first chunk the match fails. But it may match if I wait for
> another chunk. How I can check for this? [1]
>
> [1] And, just to be clear, I want to immediately stop the parsing in
> other cases, e.g. when matchin against:
> fo}obar

  So, you are expecting 

	{foobar}

in the input.  The input

	{fo)obar}

is expected to be rejected.  But when starting, all you have is

	{fo

  Tough problem.  
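
  To make the three cases concrete, here is a sketch with plain Lua patterns
that only works for this one pattern.  The classify() helper and the prefix
pattern "^{%a*$" are hand-written assumptions for the example, not anything
Lua derives for you, which is exactly why this approach does not generalize.

```lua
-- classify(buffer): hypothetical helper for the single pattern ^{%a+}
local function classify(buffer)
  if buffer:find("^{%a+}") then
    return "full-match"
  elseif buffer == "" or buffer:find("^{%a*$") then
    -- everything seen so far is still a valid prefix of {letters}
    return "more"
  else
    return "bad-match"
  end
end

print(classify("{foobar}"))   -- complete input
print(classify("{fo"))        -- first chunk only
print(classify("{fo)obar}"))  -- invalid character in the middle
```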

> Obviously I can use another pattern. In the example, on a fail I can
> just check ^[^{] to know if I need another chunk. But in this way,
> each pattern in the application should be treated separately.
> 
> Is there a generic way to solve this issue? For example, if the lua
> API exposed the point where the matching stops on failure, I could
> easly know if I need to wait for another chunk. But that information
> is not avaiable (I think)...

  Reading up on Lua patterns, I don't think so.  And I don't think there's a
generic solution.  I know *of* a solution for LPeg but it requires knowing
up front that you might not have all the data at one go.  Something like:

	local lpeg = require "lpeg"
	local P , R , Cc , Cp = lpeg.P , lpeg.R , lpeg.Cc , lpeg.Cp

	local rest    = R"az"^1 * P"}"  * Cc'full-match' * Cp()
	              + R"az"^1 * P(-1) * Cc'more' * Cp()
	              + Cc'bad-match' * Cp()
	local pattern = P"{" * P(-1) * Cc('more') * Cp()
	              + P"{" * rest

	-- '*' is sequence: pattern followed by pattern
	-- '+' is ordered choice: try the left pattern, else the right one
	-- P"string" will match the given string literal;
	-- P(-1) will match end of string
	-- R"az"^1 matches the range "a" through "z" one or more times
	-- Cc(value) will return (as a capture) the given value
	-- Cp() will return the current position.

  So, if you call pattern:match(string), the following strings will return:

	{foobar}	-> 'full-match' 9
	{fo)bar}	-> 'bad-match' 2
	{fo		-> 'more' 4

If you get back 'more', append the next chunk to the string and follow up
with

	rest:match(string,position)

which will tell you whether you have a full-match, a bad-match, or still
need more input.
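
  Here is a rough sketch of a driver loop around those patterns, assuming
the lpeg module is installed.  The feed() helper is hypothetical.  It takes
a simpler route than resuming at the returned position: it re-runs the whole
pattern on the accumulated buffer, because resuming in the middle of a word
would reject input such as "{fo" followed by "}" (R"az"^1 wants at least one
letter at the resume point).

```lua
local lpeg = require "lpeg"
local P, R, Cc, Cp = lpeg.P, lpeg.R, lpeg.Cc, lpeg.Cp

local rest    = R"az"^1 * P"}"  * Cc'full-match' * Cp()
              + R"az"^1 * P(-1) * Cc'more' * Cp()
              + Cc'bad-match' * Cp()
local pattern = P"{" * P(-1) * Cc'more' * Cp()
              + P"{" * rest

-- feed(chunks): hypothetical helper; 'chunks' is an array of strings
-- standing in for input that arrives piece by piece.
local function feed(chunks)
  local buffer = ""
  for _, chunk in ipairs(chunks) do
    buffer = buffer .. chunk
    local status, pos = pattern:match(buffer)
    if status == nil then
      return 'bad-match', 1   -- input does not even start with "{"
    elseif status ~= 'more' then
      return status, pos      -- decided: stop reading chunks
    end
  end
  return 'more', #buffer + 1  -- ran out of chunks while still undecided
end

print(feed { "{fo", "obar}" })
print(feed { "{fo", ")bar}" })
print(feed { "{fo" })
```

  Rescanning the buffer costs more on long inputs, but it keeps the
patterns unchanged.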

  I'm not saying you have to use LPeg, but it is the tool I reach for when
parsing, because it's more flexible (but more complex) in dealing with
input.

  -spc