- Subject: Python AST parser
- From: Don Hopkins <dhopkins@...>
- Date: Thu, 21 Sep 2006 02:20:22 -0700
ScriptX was a Dylan-like language that had a simple, more traditional
syntax (kind of like C, but with a few quirks and differences).
It had a parser that created parse trees (abstract syntax trees), a
macro system that operated on parse trees, and a compiler that compiled
the parse trees into byte code.
Dan Borenstein wrote an alternative parser for ScriptX: a Scheme syntax
front-end for ScriptX (which was very similar to Scheme internally).
Dylan also has both a Lispy, s-expression-style syntax and an alternative
infix syntax.
Python takes a similar approach, but the parse trees its parser produced
were very low level and did not make a lot of sense for macros to
process: they included many extremely uninteresting intermediate nodes
from the BNF grammar, which the parser had to travel through before it
got to the meaningful part of each expression; they had many special
cases; and dealing with them required a lot of interpretation and
simplification.
But the latest version of Python has a totally new, rewritten AST parser
that I hope is more friendly to macro programmers! I haven't had a chance
to look at the code or play around with it yet, but it sounds like quite a
[...]
Access to the AST is useful not only for compiling an alternative text
syntax, but also for a visual programming language! It makes it easier
to parse a text expression and edit it visually, and compile visual code
directly without going through the text syntax.
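As a rough illustration of that last point, here is a sketch in present-day Python 3 (3.8 or later, using the public ast module rather than the 2.5-era _ast module) of building a tree directly from node objects, as a hypothetical visual editor might, and compiling it without ever producing source text. The expression and the "<visual-editor>" filename are made up for the example.

```python
import ast

# Hypothetical visual-editor output: result = 2 * (3 + 4),
# built directly as AST node objects rather than parsed from text.
expr = ast.BinOp(
    left=ast.Constant(value=2),
    op=ast.Mult(),
    right=ast.BinOp(
        left=ast.Constant(value=3),
        op=ast.Add(),
        right=ast.Constant(value=4),
    ),
)
assign = ast.Assign(
    targets=[ast.Name(id="result", ctx=ast.Store())],
    value=expr,
)
module = ast.Module(body=[assign], type_ignores=[])

# compile() needs line/column info on every node; a visual editor has none,
# so fill in default locations.
ast.fix_missing_locations(module)

code = compile(module, filename="<visual-editor>", mode="exec")
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 14
```

The tree never passes through text, so the editor is free to present it however it likes.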
- A new AST parser implementation was completed. The abstract
syntax tree is available for read-only (non-compile) access
to Python code; an _ast module was added.
The design of the bytecode compiler has changed a great deal, to no
longer generate bytecode by traversing the parse tree. Instead the parse
tree is converted to an abstract syntax tree (or AST), and it is the
abstract syntax tree that's traversed to produce the bytecode.
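A minimal sketch of that two-stage pipeline, assuming Python 3 and its public ast module (ast.parse() wraps compile() with the AST-only flag): the source is first turned into a tree, and compile() then accepts the tree itself and generates bytecode from it.

```python
import ast
import dis

source = "a = 0\nfor i in range(10):\n    a += i\n"

# Stage 1: source text -> abstract syntax tree.
tree = ast.parse(source, filename="<string>", mode="exec")

# Stage 2: the compiler walks the AST (not the text) to emit bytecode.
code = compile(tree, filename="<string>", mode="exec")

dis.dis(code)  # print the bytecode generated from the tree

namespace = {}
exec(code, namespace)
print(namespace["a"])  # 45
```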
It's possible for Python code to obtain AST objects by using the
compile() built-in and specifying _ast.PyCF_ONLY_AST as the value of
the flags parameter:
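from _ast import PyCF_ONLY_AST

# With PyCF_ONLY_AST, compile() returns the AST instead of a code object.
ast = compile("""a=0
for i in range(10):
    a += i
""", "<string>", 'exec', PyCF_ONLY_AST)

assignment = ast.body[0]   # the "a=0" statement
for_loop = ast.body[1]     # the "for" statement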
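For comparison, a sketch of the same walk-through with the ast module that later Python versions provide (assumed here to be Python 3.8+, where ast.parse() is documented as equivalent to compile() with PyCF_ONLY_AST). It shows how the statements above can be inspected once you have them:

```python
import ast

source = """a = 0
for i in range(10):
    a += i
"""

tree = ast.parse(source, "<string>", "exec")
assignment, for_loop = tree.body

print(type(assignment).__name__)        # Assign
print(assignment.targets[0].id)         # a
print(type(for_loop).__name__)          # For
print(for_loop.target.id)               # i
print(type(for_loop.body[0]).__name__)  # AugAssign

# ast.walk() visits every node (in no particular order), which is handy
# for quick queries such as "which names does this code mention?"
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})
print(names)                            # ['a', 'i', 'range']
```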
No documentation has been written for the AST code yet. To start
learning about it, read the definition of the various AST nodes in
Parser/Python.asdl. A Python script reads this file and generates a set
of C structure definitions in Include/Python-ast.h. The
PyParser_ASTFromString() and PyParser_ASTFromFile() functions, defined in
Include/pythonrun.h, take Python source as input and return the root of
an AST representing the contents. This AST can then be turned into a
code object by PyAST_Compile(). For more information, read the source
code, and then ask questions on python-dev.
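Short of reading the C sources, one way to see how Python.asdl maps onto Python objects is to look at the generated node classes. This is a sketch assuming a Python 3 interpreter with the public ast module; each class records its ASDL child fields, in definition order, in _fields:

```python
import ast

# Node classes are generated from Parser/Python.asdl; _fields lists the
# child fields each one declares.
for cls in (ast.Assign, ast.For, ast.AugAssign):
    print(cls.__name__, cls._fields)

# ast.iter_fields() yields (name, value) pairs for a concrete node.
node = ast.parse("a += 1").body[0]
for name, value in ast.iter_fields(node):
    print(name, "=", ast.dump(value))
```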
The AST code was developed under Jeremy Hylton's management, and
implemented by (in alphabetical order) Brett Cannon, Nick Coghlan, Grant
Edwards, John Ehresman, Kurt Kaiser, Neal Norwitz, Tim Peters, Armin
Rigo, and Neil Schemenauer, plus the participants in a number of AST
sprints at conferences such as PyCon.
The AST branch lands
Last week, a historic event happened in the Python source tree: the AST
branch was merged into the trunk. I've only followed this work from
python-dev e-mails, and with only half my brain, so take the following
notes with a grain of salt; I've probably made some silly errors.
To start, what's the AST branch? In all previous versions of CPython,
Python source code was parsed and turned into a parse tree, and the
compiler then looped over this parse tree to generate Python bytecode.
The problem with this design is that it's very difficult to modify or
analyze the parse tree, because it records details that aren't relevant
from the point of view of the code's semantics. For example, the parse
trees for these two statements are different, even though they
implement the same computation: [...]
AST stands for Abstract Syntax Tree. The AST more closely matches the
semantics of Python and cleans up special cases. [...]
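The original example is elided above. As a stand-in (not the author's example), the following Python 3 sketch shows the kind of difference the AST erases: redundant parentheses change the concrete parse tree but not the abstract one.

```python
import ast

# Two spellings of the same computation. The AST drops the redundant
# parentheses, so both trees dump to the same text.
plain = ast.dump(ast.parse("x = 1 + 2"))
parenthesized = ast.dump(ast.parse("x = (1 + 2)"))

print(plain == parenthesized)  # True
print(plain)
```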
The primary benefit is that it should now be easier to write
optimization passes and tools such as PyChecker or refactoring browsers,
particularly once there's a Python interface to the AST and once
everything is documented. There's also some hope that the AST interface
can be shared across Python implementations such as Jython and
IronPython. We'll see where things go from here.
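To give a feel for what such a pass looks like, here is a toy constant-folding pass written against Python 3's ast.NodeTransformer, an interface that arrived after this post. Recent CPython versions do similar folding internally, so this is purely illustrative; ConstantFolder, FOLDABLE and the sample source are hypothetical, and ast.unparse() needs Python 3.9+.

```python
import ast
import operator

# Hypothetical subset of arithmetic operators this toy pass will fold.
FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace a BinOp whose operands are both numeric constants with its value."""

    def visit_BinOp(self, node):
        # Fold the children first so nested expressions collapse bottom-up.
        self.generic_visit(node)
        fold = FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (
            fold is not None
            and isinstance(left, ast.Constant)
            and isinstance(right, ast.Constant)
            and isinstance(left.value, (int, float))
            and isinstance(right.value, (int, float))
        ):
            folded = ast.Constant(value=fold(left.value, right.value))
            # Keep the original position so error messages stay sensible.
            return ast.copy_location(folded, node)
        return node


tree = ast.parse("seconds = 60 * 60 * 24\ntotal = x + 2 * 3\n")
tree = ast.fix_missing_locations(ConstantFolder().visit(tree))

print(ast.unparse(tree))  # needs Python 3.9+
```

Because the pass only sees semantic nodes, it never has to step through grammar-level intermediate nodes, which is exactly the convenience the AST is meant to provide.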