lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]




On Wed, Mar 27, 2019 at 9:06 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:


On 2019-03-27 10:10 p.m., Coda Highland wrote:
> On Wed, Mar 27, 2019 at 6:47 PM Soni "They/Them" L. <fakedme@gmail.com
> <mailto:fakedme@gmail.com>> wrote:
>
>     I'm rolling with this weird idea. is it possible to make a simple
>     parser
>     out of tables? I tried some stuff but ran into a few issues. I don't
>     wanna use LPeg or anything more complicated than string.find(foo,
>     bar,
>     1, true).
>
>     e.g.:
>
>     local ltk = {} -- lua_tokens
>     ltk.string = {}
>     ltk['"'] = "string"
>     ltk["'"] = "string" -- how to handle end of string correctly?
>     ltk.string['\\'] = "escape"
>     ltk.string.escape = {}
>     ltk.string.escape['z'] = {}
>     ltk.string.escape['z'].next = ltk.string.escape['z']
>     for i, v in ipairs({" ", "\t", "\n", etc}) do
>        ltk.string.escape['z'][v] = "next"
>     end
>     ltk.string.escape['z'][''] = ltk.string -- not sure if there'd be a
>     better way to do this
>     -- etc
>
>
> Not a bad start for someone who's never done this formally before.
>
> There are a zillion different ways to define a parser. The Dragon Book
> recommendation is well-given.
>
> I think the insight you're missing is that the table really represents
> the structure of a state machine. Your top-level table shouldn't
> contain things like ["'"] = "string" directly; rather, you'd want
> something like ltk.base["'"]. Then you wouldn't have
> ltk.string.escape, you'd just have ltk.string and ltk.escape. The
> nesting doesn't belong in the table itself; instead, your parser will
> maintain a stack of states, and you push a state when you start into a
> parsing rule and you pop it off when you've finished that rule.

Yeah but then I wouldn't have self-referential tables. Besides I wasn't
planning on having a stack. FSMs (same class as regex) don't have a stack.

FSMs can't handle parentheses. String escapes are about the limit of what they can manage.
 
>
> The place where this is suddenly going to get complicated is if your
> grammar contains any prefix ambiguities (e.g. "fork" vs "forward"),
> because then you have to implement backtracking. For most simpler
> grammars it's best to just break it into two passes -- one that makes
> simple lexical tokens out of a string of characters, and one that
> evaluates the syntax out of a list of lexical tokens.

"fork" vs "forward" is unambiguous and doesn't require backtracking.

Derp. That's my mistake. You're right, that one's good.

/s/ Adam