I'm rolling with this weird idea: is it possible to make a simple parser
out of tables? I tried some stuff but ran into a few issues. I don't
wanna use LPeg or anything more complicated than string.find(foo, bar, 1, true).
e.g.:
local ltk = {} -- lua_tokens
ltk.string = {}
ltk['"'] = "string"
ltk["'"] = "string" -- how to handle end of string correctly?
ltk.string['\\'] = "escape"
ltk.string.escape = {}
ltk.string.escape['z'] = {}
ltk.string.escape['z'].next = ltk.string.escape['z']
for _, v in ipairs({" ", "\t", "\n", "\r"}) do -- whitespace that \z skips
ltk.string.escape['z'][v] = "next"
end
ltk.string.escape['z'][''] = ltk.string -- not sure if there'd be a better way to do this
-- etc
Not a bad start for someone who's never done this formally before.
There are a zillion different ways to define a parser. The Dragon Book recommendation is a good one.
I think the insight you're missing is that the table really represents the structure of a state machine. Your top-level table shouldn't contain entries like ["'"] = "string" directly; rather, you'd want something like ltk.base["'"]. Then you wouldn't have ltk.string.escape, you'd just have ltk.string and ltk.escape. The nesting doesn't belong in the table itself; instead, your parser maintains a stack of states: you push a state when you enter a parsing rule and pop it off when you've finished that rule.
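To make that concrete, here's roughly what I mean -- a toy sketch, where the state names, the push/pop markers, and the driver function are all made up on the spot:

local ltk = {}

ltk.base = {
    ['"'] = { push = "string" },   -- opening quote enters the string state
    ["'"] = { push = "string" },
}

ltk.string = {
    ['\\'] = { push = "escape" },  -- backslash enters the escape state
    ['"'] = { pop = true },        -- closing quote finishes the rule
    ["'"] = { pop = true },
}

ltk.escape = {
    -- in this toy every escape is one character long, so always pop
    default = { pop = true },
}

local function lex(input)
    local stack = { "base" }                 -- stack of state names
    for i = 1, #input do
        local c = input:sub(i, i)
        local state = ltk[stack[#stack]]
        local action = state[c] or state.default
        if action then
            if action.push then stack[#stack + 1] = action.push end
            if action.pop then stack[#stack] = nil end
        end
    end
    return stack  -- anything besides { "base" } means a rule was left open
end

One obvious hole: this toy pops on either quote, so a double-quoted string would end at the first apostrophe inside it. A real version would record which quote opened the string, e.g. by pushing a different state for each quote character.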
The place where this is suddenly going to get complicated is if your grammar contains any prefix ambiguities (e.g. "fork" vs "forward"), because then you have to implement backtracking. For simpler grammars it's best to just break the job into two passes -- one that turns the string of characters into simple lexical tokens, and one that parses the syntax out of the list of tokens.
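For the first pass, trying the longest candidate first usually dodges the prefix problem without any real backtracking. A rough sketch, sticking to plain string.find as you wanted (the keyword list and the token shape here are invented):

local keywords = { "forward", "fork", "for" }      -- longest first

local function next_token(src, pos)
    for _, kw in ipairs(keywords) do
        local s = string.find(src, kw, pos, true) -- plain find, no patterns
        if s == pos then                          -- only accept a match anchored at pos
            return { type = "keyword", text = kw }, pos + #kw
        end
    end
    -- fall through: emit a single character as a catch-all token
    return { type = "char", text = src:sub(pos, pos) }, pos + 1
end

local function tokenize(src)
    local tokens, pos = {}, 1
    while pos <= #src do
        local tok
        tok, pos = next_token(src, pos)
        tokens[#tokens + 1] = tok
    end
    return tokens  -- the second pass walks this list instead of raw characters
end

The second pass then only ever sees whole tokens, so a "for" can never turn out to be half of a "fork".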
I don't mind offering assistance with this. I've actually built a complete standards-compliant parser for C++ from first principles before. I think it's pretty fun.
/s/ Adam