- Subject: Re: Pattern matching among multiple chunks
- From: Sean Conner <sean@...>
- Date: Wed, 16 Jan 2019 14:02:09 -0500
It was thus said that the Great pocomane once stated:
> I am writing a parser for a very simple configuration language.
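I've used Lua itself as the configuration language (for personal and
work-related projects). It's pretty easy to do:

  -- Lua 5.2+: the env argument ('configuration') becomes the chunk's
  -- environment, so assignments in the file end up as its fields
  local configuration = {}
  local f, err = loadfile("myconfigfile.txt", "t", configuration)
  if not f then
    error(err) -- missing file or syntax error in the configuration
  end
  f()
  if configuration.blah ...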
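To try the idea without a file on disk, here is a minimal sketch that uses
load() on a string instead of loadfile(). The option names (name, port) are
made up for illustration. As with loadfile(), the env argument (Lua 5.2 and
later) becomes the chunk's environment, mode "t" rejects precompiled
bytecode, and pcall() turns a runtime error inside the configuration into
an ordinary error message.

```lua
-- Hypothetical configuration text; the option names are illustrative only.
local text = [[
name = "example"
port = 8080
]]

local configuration = {}
-- "t" = accept text chunks only; 'configuration' becomes the chunk's _ENV.
local f, err = load(text, "=config", "t", configuration)
if not f then
  error(err)              -- syntax error in the configuration text
end

local ok, msg = pcall(f)  -- run it, catching runtime errors
if not ok then
  error(msg)
end

print(configuration.name, configuration.port)
```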
And if the configuration file needs access to some data, you can always put
it in the 'configuration' table beforehand:

  local configuration = { HOME = os.getenv "HOME" }

Then your configuration file can reference HOME. And if the input comes in
piecemeal, check out load(): instead of a string it can take a function that
returns the chunk one piece at a time.
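A minimal sketch of the piecemeal case, assuming the pieces are already in a
table (the pieces and field names are invented for illustration). load()
calls the reader repeatedly and concatenates what it returns until it gets
nil or an empty string, so a piece may even end in the middle of a token.
Note that load() pulls every piece before it returns, so this suits a reader
that can block for the next piece; it does not hand back a "need more input"
result.

```lua
-- Hypothetical pieces; note the split inside the string literal.
local pieces = { 'greeting = "hel', 'lo"\ncount', ' = 3\n' }
local i = 0
local function reader()
  i = i + 1
  return pieces[i]  -- nil after the last piece signals end of input
end

local configuration = {}
-- Lua 5.2+: load(reader, chunkname, mode, env)
local f, err = load(reader, "=config", "t", configuration)
if not f then
  error(err)
end
f()
```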
Just a thought.
> I know
> that LPeg is designed for this purpose, but since it is a very simple
> language, I would like to use Lua patterns only (please, do not
> judge me :) ).
No judgement here, but be aware that LPeg has the same issues you are
encountering.
> I have a working parser that acts on a single string, but now I would
> like to extend it to handle input split across several chunks,
> without waiting for all the data unless it is necessary.
>
> Consider, for example, the pattern
> ^{%a+}
> to match against the following input:
> {foobar}
> Suppose it is split into two chunks right in the middle.
>
> On the first chunk the match fails. But it may match if I wait for
> another chunk. How can I check for this? [1]
>
> [1] And, just to be clear, I want to immediately stop the parsing in
> other cases, e.g. when matching against:
> fo}obar
So, you are expecting
{foobar}
in the input. The input
{fo)obar}
is expected to be rejected. But when starting, all you have is
{fo
Tough problem.
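For this single pattern, the three cases can be told apart with plain Lua
patterns, provided a second "could this still become a match?" pattern is
written by hand. A minimal sketch (classify is a hypothetical name): any
proper prefix of a match of ^{%a+} is either empty or "{" followed only by
letters up to the end of the buffer.

```lua
-- Classify a buffer against ^{%a+} using plain Lua patterns.
-- Returns "full-match" and the index just past the match,
-- "more" if more input could still produce a match, or "bad-match".
local function classify(buf)
  local _, e = buf:find("^{%a+}")
  if e then
    return "full-match", e + 1
  end
  -- Hand-written prefix test: "" or "{" followed by letters to the end.
  if buf == "" or buf:find("^{%a*$") then
    return "more"
  end
  return "bad-match"
end
```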
> Obviously I can use another pattern. In the example, on a failure I can
> just check ^[^{] to know if I need another chunk. But that way,
> each pattern in the application has to be treated separately.
>
> Is there a generic way to solve this issue? For example, if the Lua
> API exposed the point where the matching stops on failure, I could
> easily know if I need to wait for another chunk. But that information
> is not available (I think)...
Reading up on Lua patterns, I don't think so. And I don't think there's a
generic solution. I know *of* a solution for LPeg, but it requires knowing
up front that you might not have all the data in one go. Something like:
  local lpeg = require "lpeg"
  local P, R, Cc, Cp = lpeg.P, lpeg.R, lpeg.Cc, lpeg.Cp

  local rest = R"az"^1 * P"}" * Cc'full-match' * Cp()
             + R"az"^1 * P(-1) * Cc'more' * Cp()
             + Cc'bad-match' * Cp()

  local pattern = P"{" * P(-1) * Cc'more' * Cp()
                + P"{" * rest

  -- p1 * p2    matches p1 followed by p2 (sequence)
  -- p1 + p2    ordered choice: try p1; if it fails, try p2 from the same spot
  -- P"string"  matches the given string literally
  -- P(-1)      matches only at the end of the subject
  -- R"az"^1    matches one or more characters in the range "a" through "z"
  -- Cc(value)  returns the given value as a capture
  -- Cp()       returns the current position as a capture
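So, if you call pattern:match(string), the following strings will return:

  {foobar}  -> 'full-match' 9
  {fo)bar}  -> 'bad-match'  2
  {fo       -> 'more'       4

(For a bad match, Cp() reports where rest started, not where it failed,
because a failed alternative backtracks. A string that doesn't start with
"{" returns nil, since no alternative matches.)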
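If you get back 'more', then once the next chunk arrives, append it to what
you already have and follow up with

  rest:match(buffer,2)

which will tell you if you have a full-match, bad-match or it needs more
input. (Restart rest just after the "{" rather than at the returned
position: rest needs at least one letter, so resuming at 4 would wrongly
reject "{fo" followed by a chunk starting with "}".)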
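For this particular grammar the same idea can also be expressed without
LPeg. As a rough sketch (a hand translation into Lua patterns, not anything
LPeg provides): rest_at mirrors rest, relying on string.find anchoring a
leading "^" at its init argument, and folds the lone "{" case into its
"more" test by allowing zero letters. The hypothetical new_parser keeps a
buffer, appends each chunk, and re-runs rest_at from just after the "{".

```lua
-- Hand translation of the LPeg 'rest' pattern into Lua patterns.
-- A leading "^" in string.find is anchored at the init position.
local function rest_at(buf, pos)
  local _, e = buf:find("^[a-z]+}", pos)
  if e then return "full-match", e + 1 end
  -- Only letters (possibly none) up to the end: could still match.
  if buf:find("^[a-z]*$", pos) then return "more", #buf + 1 end
  return "bad-match", pos
end

-- Hypothetical incremental driver: returns a feed(chunk) function.
local function new_parser()
  local buf = ""
  return function(chunk)
    buf = buf .. chunk
    if buf == "" then return "more", 1 end
    if buf:sub(1, 1) ~= "{" then return "bad-match", 1 end
    return rest_at(buf, 2)  -- always restart just after the "{"
  end
end
```

Re-scanning from position 2 on every chunk is quadratic in the worst case,
which is fine for short tokens; the real cost is that every rule needs its
own hand-written "could still match" pattern, which is the per-pattern work
the original question hoped to avoid.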
I'm not saying you have to use LPeg, but it is the tool I reach for when
parsing, because it's more flexible (but more complex) in dealing with
input.
-spc