

Hi Sean !

After reading my mail I think I wasn't clear enough.

How do you deal with situations where the closing delimiter also appears inside the text you are matching?

It is particularly problematic for characters such as " and '.

Is there a general LPeg best practice for such cases?

best wishes.




On Tue, Sep 18, 2018 at 6:10 PM joy mondal <joykrishnamondal@gmail.com> wrote:
Hi Sean !

I am still unsure how to deal with situations such as these:

""Hello""

"Hel \"\" lo" -- using \ for escaping

I would like to parse these as (Lua Tables) :

{"string",""Hello""}

{"string","Hel \"\" lo"}

I see it would be similar to the indent logic you have illustrated above, but I cannot seem to piece together the hints.

cheers !

On Sat, Sep 15, 2018 at 11:05 AM Sean Conner <sean@conman.org> wrote:
It was thus said that the Great joy mondal once stated:
> Hi Sean Conner,
>
> Thanks for the quick reply !
>
> You have answered both my questions
>
> ( I need to try and write the first example since I am still unsure about
> the LPEG code for capturing indentation - left curly and right curly are
> simple enough but I am thrown off regarding indentation ).

  Easy enough:

        indent = lpeg.P"\n" * lpeg.P" "^0

Or, if your grammar ends on a newline, then skip the initial lpeg.P"\n".  To
save the current level, you could do something like (untested):

        indent = lpeg.P"\n" * (lpeg.C(lpeg.P" "^0) * lpeg.Carg(1))
               / function(indent,info)
                   -- ------------------
                   -- if the length of indent is larger than the current
                   -- indent level, we have a new indent level
                   -- -----------------

                   if #indent > #info.indentlevels[#info.indentlevels] then
                     table.insert(info.indentlevels,indent)
                     return "{" -- simulate an opening bracket or
                                -- whatever you use to indicate new indent

                   elseif #indent < #info.indentlevels[#info.indentlevels] then
                     table.remove(info.indentlevels)
                     return "}"

                   else
                     return " " -- just return something neutral
                   end
                 end



  This treats indent levels as either an opening brace, closing brace or
space (change to suit your needs).  But you do need to invoke the top-level
parsing rule like this:

        ast = parser:match(text,1,{ indentlevels = { "" } })

Parameters to lpeg.match past the initial position argument are available via
lpeg.Carg(), and I'm using that here to keep track of some additional
information during the parse (you could skip this and keep this info in
globals, but I dislike globals as much as possible).  Here, the indentlevels
array is just a stack of seen indents.  If we get an indent that is longer
than the current one, we have a new level, and if it's shorter, we've ended
a level, and if it matches, we're still in the current level.
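
  To make the stack behaviour concrete, here is a minimal runnable sketch of
the same idea.  The toy grammar (`word`, `lines`) and the sample input are
assumptions made up for illustration, not part of any real parser; only the
indent rule follows the description above.

```lua
local lpeg = require "lpeg"

-- Indent rule as described above: capture the leading spaces of a line and
-- compare them against the top of the stack passed in via lpeg.Carg(1).
local indent = lpeg.P"\n" * lpeg.C(lpeg.P" "^0) * lpeg.Carg(1)
             / function(ws, info)
                 local levels = info.indentlevels
                 if #ws > #levels[#levels] then
                   table.insert(levels, ws)   -- deeper: push a new level
                   return "{"
                 elseif #ws < #levels[#levels] then
                   table.remove(levels)       -- shallower: pop one level
                   return "}"
                 else
                   return " "                 -- same level
                 end
               end

-- Hypothetical toy grammar: every line is an indent marker plus one word.
local word  = lpeg.C(lpeg.R("az", "AZ")^1)
local lines = lpeg.Ct((indent * word)^1) * -1

local info = { indentlevels = { "" } }
local out  = lines:match("\na\n  b\n  c\nd", 1, info)

print(table.concat(out))  --> " a{b c}d"
```

Note that, like the rule above, this sketch pops only one level per line, so a
line that dedents by several levels at once would need a loop over the stack.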

  If you want to handle tabs as spaces, you can do it, but it can get
complicated.
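
  One possible approach, as a rough sketch only: convert the leading
whitespace into a column width and compare widths instead of string lengths.
The tab stop of 8 and the helper name `indent_width` are assumptions for
illustration.

```lua
local lpeg = require "lpeg"

-- Assumption: tab stops every 8 columns.
local TABSTOP = 8

-- Hypothetical helper: width in columns of a run of spaces and tabs.
local function indent_width(ws)
  local col = 0
  for ch in ws:gmatch(".") do
    if ch == "\t" then
      col = col + (TABSTOP - col % TABSTOP)  -- advance to the next tab stop
    else
      col = col + 1
    end
  end
  return col
end

-- The capture is now a number (the column) rather than the raw whitespace.
local indent = lpeg.P"\n" * (lpeg.C(lpeg.S" \t"^0) / indent_width)

print(indent:match("\n\tfoo"))    --> 8
print(indent:match("\n  \tfoo"))  --> 8
print(indent:match("\n\t  foo"))  --> 10
```

With this variant the stack would hold numbers, so the comparisons would be on
the values themselves rather than on their lengths.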

> For the first issue, yes I need to create a hierarchical tree instead of a
> flat output, normally examples of lexer output online show a stream of
> tokens, but LPEG creates an AST directly.

  Oh, LPeg can create a stream of tokens---I've had to do that type of stuff
in certain circumstances.  It's not hard, but you do have to track a bit
more information:

        local parser = lpeg.C( --[[ LPeg code ]] ) * lpeg.Cp()

        local text = " ... code to parse here ... "
        local pos  = 1
        local info = { --[[ additional information used for parsing ]] }

        while pos <= #text do
          local token,newpos = parser:match(text,pos,info)
          if not token then
            error "Error parsing"
          end
          -- process token
          pos = newpos
        end

Basically, the parser bit will return the next logical token and the
position to resume parsing the text for the next token.
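
  As a concrete version of that loop, here is a small runnable sketch.  The
token rules (`number`, `name`, `punct`) and the input string are made up for
illustration; a real lexer would have its own.

```lua
local lpeg = require "lpeg"

-- Hypothetical token rules, for illustration only.
local space  = lpeg.S" \t\n"^0
local number = lpeg.R"09"^1
local name   = (lpeg.R("az", "AZ") + "_") * (lpeg.R("az", "AZ", "09") + "_")^0
local punct  = lpeg.S"+-*/=()"

-- One token per match: skip whitespace, capture the token, skip whitespace,
-- then return the position at which to resume.
local parser = space * lpeg.C(number + name + punct) * space * lpeg.Cp()

local text   = "x = 10 + y2"
local pos    = 1
local tokens = {}

while pos <= #text do
  local token, newpos = parser:match(text, pos)
  if not token then
    error("Error parsing at position " .. pos)
  end
  tokens[#tokens + 1] = token   -- "process" the token by collecting it
  pos = newpos
end

print(table.concat(tokens, "|"))  --> x|=|10|+|y2
```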

  -spc