On 2019-03-28 10:21 a.m., Coda Highland wrote:


On Wed, Mar 27, 2019 at 9:06 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:



    On 2019-03-27 10:10 p.m., Coda Highland wrote:
    > On Wed, Mar 27, 2019 at 6:47 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:
    >
    >     I'm rolling with this weird idea. Is it possible to make a simple
    >     parser out of tables? I tried some stuff but ran into a few issues.
    >     I don't wanna use LPeg or anything more complicated than
    >     string.find(foo, bar, 1, true).
    >
    >     e.g.:
    >
    >     local ltk = {} -- lua_tokens
    >     ltk.string = {}
    >     ltk['"'] = "string"
    >     ltk["'"] = "string" -- how to handle end of string correctly?
    >     ltk.string['\\'] = "escape"
    >     ltk.string.escape = {}
    >     ltk.string.escape['z'] = {}
    >     ltk.string.escape['z'].next = ltk.string.escape['z']
    >     for i, v in ipairs({" ", "\t", "\n", etc}) do
    >        ltk.string.escape['z'][v] = "next"
    >     end
    >     ltk.string.escape['z'][''] = ltk.string -- not sure if there'd be a better way to do this
    >     -- etc
    >
    >
    > Not a bad start for someone who's never done this formally before.
    >
    > There are a zillion different ways to define a parser. The Dragon Book
    > recommendation is well-given.
    >
    > I think the insight you're missing is that the table really represents
    > the structure of a state machine. Your top-level table shouldn't
    > contain things like ["'"] = "string" directly; rather, you'd want
    > something like ltk.base["'"]. Then you wouldn't have
    > ltk.string.escape, you'd just have ltk.string and ltk.escape. The
    > nesting doesn't belong in the table itself; instead, your parser will
    > maintain a stack of states, and you push a state when you start into a
    > parsing rule and you pop it off when you've finished that rule.
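
Concretely, I guess that flat-table-plus-stack shape would be something like
this (untested sketch; the action format and the driver loop are just my
guess at what you mean):

local ltk = {}
ltk.base = {}
ltk.string = {}
ltk.escape = {}

-- each state maps a one-character lookahead to an action:
-- { push = <state> } enters a sub-rule, { pop = true } finishes one,
-- and the empty-string key is the state's default action
ltk.base['"'] = { push = ltk.string }
ltk.base["'"] = { push = ltk.string }
ltk.string['\\'] = { push = ltk.escape }
ltk.string['"'] = { pop = true }
ltk.string["'"] = { pop = true }
ltk.escape[''] = { pop = true }  -- any escaped char: back to the string
-- note: this forgets which quote opened the string, so ' would close a
-- "-string too; you'd need two string states (or to push the quote char
-- alongside the state) to get that right

local function scan(input)
  local stack = { ltk.base }         -- stack of active states
  for i = 1, #input do
    local c = input:sub(i, i)
    local state = stack[#stack]
    local action = state[c] or state['']
    if action and action.push then
      stack[#stack + 1] = action.push
    elseif action and action.pop then
      stack[#stack] = nil
    end
    -- no action: stay in the current state and consume the char
  end
  return #stack == 1                 -- true if every rule got closed
end

The stack is what replaces the self-referential nesting: which rule you're
inside lives on the stack, not in the table.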

    Yeah, but then I wouldn't have self-referential tables. Besides, I wasn't
    planning on having a stack. FSMs (same class as regex) don't have a stack.


FSMs can't handle parentheses. String escapes are about the limit of what they can manage.

FSMs can't handle arbitrarily deep nesting. But you can very much create an FSM with a fixed nesting depth limit. It's just gonna be... a *little* big...
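
e.g. you just stamp out one copy of the paren state per depth level, so the
depth lives in which state you're in rather than on a stack (the limit and
the names here are made up):

-- one paren state per depth level; still a plain FSM, no runtime stack
local MAX_DEPTH = 8                  -- made-up limit
local paren = {}
for d = 0, MAX_DEPTH do
  paren[d] = {}
end
for d = 0, MAX_DEPTH do
  if d < MAX_DEPTH then
    paren[d]['('] = paren[d + 1]     -- go one level deeper
  end                                -- at MAX_DEPTH, '(' is just an error
  if d > 0 then
    paren[d][')'] = paren[d - 1]     -- come back up one level
  end
end
-- anything else stays in paren[d]; only paren[0] is a valid place to
-- stop, and the number of states grows with MAX_DEPTH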

    >
    > The place where this is suddenly going to get complicated is if your
    > grammar contains any prefix ambiguities (e.g. "fork" vs "forward"),
    > because then you have to implement backtracking. For most simpler
    > grammars it's best to just break it into two passes -- one that makes
    > simple lexical tokens out of a string of characters, and one that
    > evaluates the syntax out of a list of lexical tokens.
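
In code, that two-pass split is roughly this (sketch only; the token shapes
are arbitrary, and it leans on string.match character classes just to keep
it short):

-- pass 1: characters -> flat list of tokens
local function lex(src)
  local tokens, i = {}, 1
  while i <= #src do
    local c = src:sub(i, i)
    if c:match('%s') then
      i = i + 1                                -- skip whitespace
    elseif c:match('%a') then
      local j = i
      while j <= #src and src:sub(j, j):match('%w') do j = j + 1 end
      tokens[#tokens + 1] = { type = 'name', value = src:sub(i, j - 1) }
      i = j
    else
      tokens[#tokens + 1] = { type = c }       -- single-char token
      i = i + 1
    end
  end
  return tokens
end

-- pass 2 then walks the token list instead of raw characters, so the
-- syntax rules never have to think about where a word starts or ends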

    "fork" vs "forward" is unambiguous and doesn't require backtracking.


Derp. That's my mistake. You're right, that one's good.

/s/ Adam