Meta Lua Abstract Syntax Tree

lua-users home
wiki

Abstract syntax trees are a convenient high-level way to represent programs. MetaLua, which deals a lot with analyzing generating and manipulating Lua programs, does so mostly by representing them as AST. This page describes the AST grammar currently supported by MetaLua. It is the result of a fairly long maturation and feedback process, and this format is used by several tools, such as [ZeroBrane] and [SciTE] through LuaInspect, [Eclipse's Lua Development Tools], or TypedLua?.

Notation

Trees are written below with some Metalua syntax sugar, which increases their readability. the backquote symbol introduces a `tag`, i.e. a string stored in the `"tag"` field of a table:

When using a Metalua interpreter or compiler, the backtick syntax is supported and can be used directly. Metalua's pretty-printing helpers also try to use backtick syntax whenever applicable.

Tree elements

Tree elements are mainly categorized into statements stat, expressions expr and lists of statements block. Auxiliary definitions include function applications/method invocation apply, are both valid statements and expressions, expressions admissible on the left-hand-side of an assignment statement lhs.

    block: { stat* }

    stat:
      `Do{ stat* }
    | `Set{ {lhs+} {expr+} }                    -- lhs1, lhs2... = e1, e2...
    | `While{ expr block }                      -- while e do b end
    | `Repeat{ block expr }                     -- repeat b until e
    | `If{ (expr block)+ block? }               -- if e1 then b1 [elseif e2 then b2] ... [else bn] end
    | `Fornum{ ident expr expr expr? block }    -- for ident = e, e[, e] do b end
    | `Forin{ {ident+} {expr+} block }          -- for i1, i2... in e1, e2... do b end
    | `Local{ {ident+} {expr+}? }               -- local i1, i2... = e1, e2...
    | `Localrec{ ident expr }                   -- only used for 'local function'
    | `Goto{ <string> }                         -- goto str
    | `Label{ <string> }                        -- ::str::
    | `Return{ <expr*> }                        -- return e1, e2...
    | `Break                                    -- break
    | apply

    expr:
      `Nil  |  `Dots  |  `True  |  `False
    | `Number{ <number> }
    | `String{ <string> }
    | `Function{ { ident* `Dots? } block }
    | `Table{ ( `Pair{ expr expr } | expr )* }
    | `Op{ opid expr expr? }
    | `Paren{ expr }       -- significant to cut multiple values returns
    | apply
    | lhs

    apply:
      `Call{ expr expr* }
    | `Invoke{ expr `String{ <string> } expr* }

    ident: `Id{ <string> }

    lhs: ident | `Index{ expr expr }

    opid: 'add'   | 'sub'   | 'mul'   | 'div'
        | 'mod'   | 'pow'   | 'concat'| 'eq'
        | 'lt'    | 'le'    | 'and'   | 'or'
        | 'not'   | 'len'

Meta-data (lineinfo)

ASTs also embed some metadata, allowing to map them to their source representation. Those informations are stored in a "lineinfo" field in each tree node, which points to the range of characters in the source string which represents it, and to the content of any comment that would appear immediately before or after that node.

Lineinfo objects have two fields, "first" and "last", describing respectively the beginning and the end of the subtree in the sources. For instance, the sub-node `Number{123} produced by parsing [[return 123]] will have lineinfo.first describing offset 8, and lineinfo.last describing offset 10:

    > mlc = require 'metalua.compiler'.new()
    > ast = mlc :src_to_ast "return 123 -- comment"
    > print(ast[1][1].lineinfo)
    <?|L1|C8-10|K8-10|C>
    >

A lineinfo keeps track of character offsets relative to the beginning of the source string/file ("K8-10" above), line numbers (L1 above; a lineinfo spanning on several lines would read something like "L1-10"), columns i.e. offset within the line ("C8-10" above), and a filename if available (the "?" mark above indicating that we have no file name, as the AST comes from a string). The final "|C>" indicates that there's a comment immediately after the node; an initial "<C|" would have meant that there was a comment immediately before the node.

Positions represent either the end of a token and the beginning of an inter-token space ("last" fields) or the beginning of a token, and the end of an inter-token space ("first" fields). Inter-token spaces might be empty. They can also contain comments, which might be useful to link with surrounding tokens and AST subtrees.

Positions are chained with their "dual" one: a position at the beginning of and inter-token space keeps a reference to the position at the end of that inter-token space in its "facing" field, and conversely, end-of-inter-token positions keep track of the inter-token space beginning, also in "facing". An inter-token space can be empty, e.g. in "2+2", in which case lineinfo==lineinfo.facing.

Comments are also kept in the "comments" field. If present, this field contains a list of comments, with a "lineinfo" field describing the span between the first and last comment. Each comment is represented by a list of one string, with a "lineinfo" describing the span of this comment only. Consecutive lines of -- comments are considered as one comment: "-- foo\n-- bar\n" parses as one comment whose text is "foo\nbar", whereas "-- foo\n\n-- bar\n" parses as two comments "foo" and "bar".

So for instance, if f is the AST of a function and I want to retrieve the comment before the function, I'd do:

    f_comment = f.lineinfo.first.comments[1][1]

The informations in lineinfo positions, i.e. in each "first" and "last" field, are held in the following fields:


RecentChanges · preferences
edit · history
Last edited January 31, 2014 5:12 pm GMT (diff)