On 2/23/2023 2:06 PM, Johann ''Myrkraverk'' Oskarsson wrote:
On 2/23/2023 2:36 AM, Sean Conner wrote:
It was thus said that the Great Johann ''Myrkraverk'' Oskarsson once stated:

Parsing is something I did with a line-by-line reader and
string.match(), which works fine for a very rigid syntax.  Later
on, I'm pretty sure I'll have to resort to something more flexible,
like recursive descent.
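A line-by-line reader built on string.match() might look something like the sketch below. The syntax is an assumption for illustration, chosen to mirror the LPEG patterns later in this thread: an optional lowercase label in column one, whitespace, an alphabetic instruction, whitespace, and a decimal operand. Under that assumption an unlabelled line must start with whitespace. The function name parse_line is hypothetical.

```lua
-- Minimal sketch: parse one line of an ASSUMED rigid syntax
--   [label] <whitespace> instruction <whitespace> operand
-- using only string.match().  parse_line is a hypothetical name.
local function parse_line(line)
  -- Optional lowercase label, whitespace, a word, whitespace, digits.
  local lbl, ins, imm = line:match("^(%l*)%s+(%a+)%s+(%d+)%s*$")
  if not ins then
    return nil, "syntax error: " .. line
  end
  return { lbl = lbl, ins = ins, imm = imm }
end

-- Feed it one line at a time (from a file this would be io.lines(name)).
local source = "start  load 35\n       store 35\n"
for line in source:gmatch("[^\n]+") do
  local t, err = parse_line(line)
  print(t and (t.lbl .. "|" .. t.ins .. "|" .. t.imm) or err)
end
```

Because %l* may match nothing, an unlabelled line yields lbl = "" here as well: string.match() returns an empty capture, not nil.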

   I used LPEG for my assembler, only because it made parsing expressions like:


a lot easier.  It also helped with index-based addressing modes.  But even
with LPEG, I still do line-by-line parsing.  The only portion of my
assembler that is recursive is my "include" directive, which to me, is a
"nice to have" rather than a "hard requirement."
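The point that only an include directive needs recursion can be illustrated with a short sketch: every other line is handled one at a time, and an include simply calls the reader again on another source. The directive spelling .include "name", the depth limit, and the in-memory files table (standing in for the file system so the example is self-contained) are all assumptions, not a description of Sean's assembler.

```lua
-- Minimal sketch: line-by-line reading where only an ASSUMED
-- `.include "name"` directive recurses.  The `files` table is a
-- hypothetical stand-in for the file system.
local files = {
  ["main.asm"]   = 'start load 35\n.include "macros.asm"\n      store 35\n',
  ["macros.asm"] = "      nop 0\n",
}

local function read_source(name, out, depth)
  out, depth = out or {}, depth or 0
  assert(depth < 16, "include nesting too deep (cycle?)")
  local text = assert(files[name], "no such file: " .. name)
  for line in text:gmatch("[^\n]+") do
    local inc = line:match('^%.include%s+"([^"]+)"')
    if inc then
      read_source(inc, out, depth + 1) -- the only recursive step
    else
      out[#out + 1] = line             -- everything else stays line by line
    end
  end
  return out
end

for i, line in ipairs(read_source("main.asm")) do
  print(i, line)
end
```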

Nice, I didn't know about LPEG.  Depending on exactly what I end up
doing, I may decide to do recursive descent or something else.  My
personal end goal is a learning experience, so I'll definitely put
LPEG in my toolbox, but it's not necessarily the tool I'll end up using
for my projects.

That said, experimenting with it is probably a worthwhile endeavor for
said learning experience.
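For comparison with LPEG, a hand-written recursive descent parser for the sort of arithmetic an assembler operand might contain could look like the sketch below. The grammar (decimal integers, identifiers looked up in a symbol table, + - * and parentheses) and the names eval_expr and symbols are assumptions for illustration. Each grammar rule becomes one local function, and the loops in term and expr make the operators left-associative.

```lua
-- Minimal recursive descent sketch for an ASSUMED expression grammar:
--   expr   := term { ("+" | "-") term }
--   term   := factor { "*" factor }
--   factor := number | identifier | "(" expr ")"
-- Each rule is one local function; values are computed while parsing.
local function eval_expr(src, symbols)
  local pos = 1

  -- Advance pos past whitespace; return the next character ("" at the end).
  local function peek()
    pos = src:find("%S", pos) or #src + 1
    return src:sub(pos, pos)
  end

  local expr -- forward declaration: factor and expr call each other

  local function factor()
    peek()
    local num = src:match("^%d+", pos)
    if num then
      pos = pos + #num
      return tonumber(num)
    end
    local id = src:match("^[%a_][%w_]*", pos)
    if id then
      pos = pos + #id
      return assert(symbols[id], "undefined symbol: " .. id)
    end
    if peek() == "(" then
      pos = pos + 1
      local v = expr()
      assert(peek() == ")", "expected ')'")
      pos = pos + 1
      return v
    end
    error("unexpected input at position " .. pos)
  end

  local function term()
    local v = factor()
    while peek() == "*" do
      pos = pos + 1
      v = v * factor()
    end
    return v
  end

  function expr()
    local v = term()
    while peek() == "+" or peek() == "-" do
      local op = peek()
      pos = pos + 1
      local rhs = term()
      if op == "+" then v = v + rhs else v = v - rhs end
    end
    return v
  end

  local v = expr()
  assert(peek() == "", "trailing input at position " .. pos)
  return v
end

print(eval_expr("(label + 2) * 4", { label = 10 })) -- prints 48
```

Evaluating while parsing keeps the sketch short; building a tree instead would allow symbols to be resolved in a later pass.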

OK, so I am trying out LPEG.  For a somewhat rigid syntax, I came up with:

   local lpeg = require "lpeg"
   local p = { } -- the patterns, all in a table
   p.space = lpeg.S( " \t\n\r" ) ^ 1 -- whitespace (the name p.space is a guess)
   p.label = lpeg.Cg( lpeg.R( "az" ) ^ 0, "lbl" ) * p.space
   p.instruction = ( lpeg.R( "az" ) + lpeg.R( "AZ" ) ) ^ 1
   p.operand = lpeg.Cg( lpeg.R( "09" ) ^ 1, "imm" )
   p.line = lpeg.Ct(  p.label
          * lpeg.Cg( p.instruction, "ins" ) * lpeg.S( " \t\n\r" ) ^ 0
          * p.operand ) -- lbl, ins, op; in that order.

[I hope the spacing comes out OK; there was a discussion earlier about
issues with Thunderbird, which I'm using.]

This creates a table for each line, which for now I print with
inspect.lua, a module I found somewhere on GitHub.

For lines that don't have a label, it still creates the lbl key, with
the empty string as its value.  That is, the table will be something like:

    {
      imm = "35",
      ins = "store",
      lbl = ""
    }

Is there a way for LPEG to return nil in that case instead?  Or is
this something I'll always have to post-process and check for an
empty string by hand?
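The post-processing route is only a few lines of plain Lua. A minimal sketch, assuming the per-line tables look like the example above; blank_to_nil is a hypothetical helper name. Setting a field to nil removes the key, so afterwards t.lbl is nil for lines without a label.

```lua
-- Minimal sketch: after matching, drop fields whose captured value is "".
-- The Lua manual allows clearing existing fields while traversing with pairs().
local function blank_to_nil(t)
  for k, v in pairs(t) do
    if v == "" then
      t[k] = nil
    end
  end
  return t
end

local t = blank_to_nil({ imm = "35", ins = "store", lbl = "" })
print(t.lbl, t.ins, t.imm)
```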