Sven Olsen

I started playing around with Lua in 2010; about a year before the release of 5.2. I quickly fell in love with the language, though a few of the idioms struck me as needlessly verbose. And so it wasn’t long before I was poking around the power patch page, looking for a += implementation.

What I found there was something far better -- Peter Shook’s beautifully clever table unpack patch. Since that discovery, tweaking the syntax rules of my Lua parser has become a bit of a guilty pleasure.

Rather than clutter up the power patch page with a long list of small, debatably useful patches, I’ve decided to try documenting most of them here. Again, I’m following Peter Shook's lead in this, as he also seems to have moved the docs for some of his more debatably useful language tweaks to his personal bio page.

Newline Handling

  a=b
  (f or g)()

In Lua 5.1, the parser will throw an error given the above code, complaining about “ambiguous syntax”. Lua 5.2 will accept the code, interpreting the two lines as a single statement, one that executes:

  a=b( f or g)()

In general, I prefer 5.1’s behavior. After upgrading to Lua 5.2, I occasionally found myself writing bugs that the old “ambiguous syntax” check would have caught.
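
The difference is easy to observe from a stock interpreter. The sketch below compiles the two-line snippet from a string and records which functions actually run. The names `src`, `trace`, and `env`, along with the stub functions, are made up for the illustration, and the `setfenv` branch is only there so the same script also runs under Lua 5.1, where compilation is expected to fail with the “ambiguous syntax” error.

```lua
-- The two-line snippet from above, compiled from a string so that a
-- parse error can be caught instead of aborting this script.
local src = "a = b\n(f or g)()"

local trace = {}
local env = {
  -- b records that it was called and returns another function
  b = function(arg)
    trace[#trace + 1] = "b"
    return function() trace[#trace + 1] = "value returned by b" end
  end,
  -- f would be the function called if line two were its own statement
  f = function() trace[#trace + 1] = "f" end,
}

local chunk, err
if setfenv then
  -- Lua 5.1 (and LuaJIT): loadstring plus setfenv
  chunk, err = loadstring(src)
  if chunk then setfenv(chunk, env) end
else
  -- Lua 5.2 and later: load accepts the environment directly
  chunk, err = load(src, "snippet", "t", env)
end

if chunk then
  chunk()
  print("ran: " .. table.concat(trace, ", "))
else
  print("rejected: " .. err)
end
```

When the snippet is accepted, `f` never runs on its own; it is only passed to `b` as an argument.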

However, as Roberto has pointed out, there are problems with 5.1’s “ambiguous syntax” check. For one, it doesn’t actually check for ambiguous syntax -- a task that would be effectively impossible inside a single-pass parser. Instead, the check is implemented by simply throwing an error anytime a function argument list starts on a new line. Thus, under 5.1

  print 
  (
   "long string one",
   "long string two"
  )

results in an error for “ambiguous syntax”; though there’s clearly nothing ambiguous about the code.

I’ve tweaked my own Lua parser to have behavior somewhere between Lua 5.1’s and 5.2’s. My check works by adding a second condition to the one used in 5.1, restricting errors to the case of expressions that include 2+ function calls. I’ve also changed the text of the error message, in hopes of making it more obvious that the error should be interpreted as a warning about dangerous formatting.

My modified check isn’t perfect -- like Lua 5.1, it will still sometimes throw errors in response to code that only has one possible interpretation. For example, the following triggers an error, even though there’s only one valid way of parsing the code:

  new_object
  (f or g)(state)

The beautiful thing about Lua’s syntax is that these sorts of troublesome syntax ambiguities very rarely come up in practice. Even under the very aggressive 5.1-style newline handling, programmers would only rarely see “ambiguous syntax” errors. Under my own more cautious check, such situations are even less common.

But while it may be a rare edge case, I think it’s a mistake to ignore the issue completely.

Of all the patches I’ve written, this is the only one which I’d seriously recommend as an addition to the official Lua branch. It does prevent bugs, and its costs are tiny.

[Download for 5.2.2]

Optional "do" and "then" tokens

This is perhaps the simplest powerpatch you’ll ever find. It’s a one-liner that Brian Palmer included in his concise anonymous functions patch -- one that removes the need to follow ‘if’ statements with a ‘then’ token. I’ve extended it to make the ‘do’ token after a ‘for’ statement optional as well. Using the patch will add some potential parsing quirks to the language. For example, if you have a statement like this:

  if a then
    (f or g)()
  end

and you remove the “then”, you’ll end up generating code that simply executes the line:

  a( f or g )()

But as discussed above -- potentially ambiguous syntax is something that Lua programmers will always need to be mindful of -- it’s a necessary consequence of making semicolons optional. In the version posted here, I’ve packaged it with my newline handling patch; as using it along with 5.2’s relaxed parse rules seems unnecessarily dangerous.

[Download for 5.2.2]

Pipes

This is a simple piece of syntax sugar inspired by the unix command line. It converts:

	print | a+b ==>  print(a+b)

The operator '|' has the lowest possible priority.

A related transform allows a '|' to be used in place of the last argument to a function. For example:

	f(x,y,|) { <complex table definition> } ==> f(x,y, { <complex table definition> })

For-Loops with ';'

There has been much debate on lua-l about how to best write a shorthand that can simplify the idiomatic 'for ... in pairs(...) do' statement.

The piece of syntax sugar I prefer is to simply transform:

	for k,v ; t ==> for k,v in pairs*(t)

The 'pairs' call here has an asterisk, as it's evaluated using the version of pairs in the global table, rather than _ENV.pairs, as would otherwise be the case. Thus, the shorthand will behave more or less as expected even if you replace _ENV with something odd.
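
As a rough illustration of what the asterisk guards against, here is a sketch in stock Lua of a loop that keeps working after _ENV has been replaced. The names `global_pairs` and `count_keys` are made up for the example, and capturing `pairs` in a local is only a stand-in: the patch resolves the global inside the parser instead.

```lua
-- Keep a reference to the global-table version of pairs.
local global_pairs = _G.pairs

local function count_keys(t)
  -- In Lua 5.2+, this hides every global, so a plain `pairs(t)`
  -- below would fail with "attempt to call a nil value".
  local _ENV = {}
  local n = 0
  -- Roughly what `for k,v ; t` stands for: iterate with the global pairs.
  for k, v in global_pairs(t) do
    n = n + 1
  end
  return n
end

print(count_keys({ a = 1, b = 2 }))
```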

Bigger Patches

If you're curious to try out any of the rest of my patches, you can download them all as one [super-patch], based on 5.2.2. While most of my mods are fairly self-contained, there's just enough overlap between them to make maintaining independent patch files troublesome. Peter's Table Unpack patch overlaps with both Compound Assignment and the Required Fields semantic. Required Fields, meanwhile, shares a VM change with the Safe Navigation patch. The Stringification patches are both small and self-contained, though if you want a clean version of those two, you can find my walk-through of the diffs in the lua-l archives [1].

Table unpack

As I’ve mentioned above, this is my favorite powerpatch. And if you’re going to try any of my own syntax mods, you should certainly try Peter’s as well. The syntax converts:

	a,b,c in t  ==>  a,b,c=t.a,t.b,t.c

It’s a wonderful transformation -- and one that's become even more useful as a result of the new _ENV rules in 5.2. For example, if you’re planning to change _ENV to something unusual, but want to keep certain standard global functions in scope, you simply write:

  local pairs, ipairs, tostring, print in _ENV

However, there are more subtle idioms you can try as well, most of which come from combining the syntax with metamethods. Consider:

 local x,y,z,vx,vy,vz in INIT(0)

If INIT(a) returns a table with ‘__index = function() return a end’, then the above will initialize all the given variables to 0.
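
A minimal sketch of such an INIT helper in stock Lua, with the `in` statement written out by hand. The body of `INIT` and the temporary `t` are assumptions made for the illustration; only the `__index` behavior comes from the description above.

```lua
-- Hypothetical INIT helper: every field lookup yields the same value.
local function INIT(value)
  return setmetatable({}, { __index = function() return value end })
end

-- Plain-Lua equivalent of `local x,y,z,vx,vy,vz in INIT(0)`
local t = INIT(0)
local x, y, z, vx, vy, vz = t.x, t.y, t.z, t.vx, t.vy, t.vz

print(x, y, z, vx, vy, vz)
```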

A similar __index trick will let you significantly simplify most require statement boilerplate. For example, you can define a REQUIRE proxy object that lets you replace,

  local socket = require 'socket'
  local lxp = require 'lxp'
  local ml = require 'ml'

with,

 local socket, lxp, ml in REQUIRE
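
One way such a proxy might be written, again with the `in` statement expanded by hand. This is a sketch rather than the definition used with the patch, and it pulls in standard libraries instead of socket, lxp, and ml so that it runs without any extra modules installed.

```lua
-- Hypothetical REQUIRE proxy: indexing it with a module name loads
-- that module through the standard require function.
local REQUIRE = setmetatable({}, {
  __index = function(_, name) return require(name) end,
})

-- Plain-Lua equivalent of `local string, table, math in REQUIRE`
local string, table, math = REQUIRE.string, REQUIRE.table, REQUIRE.math

print(string.upper("ok"), math.max(1, 2))
```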

Stringification

Peter’s syntax is as powerful as it is because it gives programmers a tool for converting variable names into strings. However, it only does so inside the context of variable assignment. A more general tool for transforming variables into their matching string representations would be helpful; and while the patch I’ve come up with isn’t as elegant or as clear as Peter’s, I do often use it.

The patch applies two transformations. First, in the context of a table constructor, writing

	t = {..star, ..planet, ..galaxy} ==>  t = {star=star, planet=planet, galaxy=galaxy}

Similarly, in the context of function argument lists, writing

	f(..star,..planet,..galaxy) ==> f('star',star,'planet',planet,'galaxy',galaxy)

Due to a quirk of the implementation, using ‘..’ on a complex expression will return the last string, name, or numeric constant encountered while parsing that expression. Thus

	{..planet.star, ..planet, ..moon 'luna' } ==> 
		{ star=planet.star, planet=planet, luna = moon 'luna' }

Safe Navigation

The name for this handy semantic is borrowed from Groovy, though CoffeeScript also has a similar feature. The idea is to make it possible to check for values without triggering an 'attempt to index nil' error.

For example, the following expression will evaluate to the glow color of an object’s icon, if a glow color is defined, or white, otherwise:

  color = object?.icon?.glow?.color or white

Failing to define object, or object.icon, or object.icon.glow, will result in the first part of the expression evaluating to nil.

When I proposed this patch on lua-l [2], there was quite a bit of enthusiasm for it. However, there was also some disagreement over the details of how, exactly, the semantics ought to work.

My own preference is to define ‘?’ as a relatively simple piece of syntax sugar, though one that relies on adding a new variable to the global namespace. Thus, when a new Lua state is initialized, I add a userdata called _SAFE to the global table, where _SAFE has __index, __newindex, and __call all set to nullops, __len set to always return 0, and __pairs and __ipairs defined as functions that themselves return a nullop.

Once we have such a userdata, it's fairly simple to add some syntax sugar that converts

	(<expr>?) ==> (<expr> or _SAFE*)

Given the default definition of _SAFE , this results in indexing operations that work as desired. But the semantic also opens up some neat new features. For example, if you’re also using Peter’s table unpack patch, you can write:

  local update, display in object?

With the result that update and display will be nil if object isn’t defined.

Similarly, you can call

  update?()

with the result that update will only execute if it’s defined. Or you can write the following, which iterates through all of object’s icons, provided they’re defined.

  for k,v in pairs(object.icons?)
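
The examples above can be approximated in stock Lua by writing the `or` out by hand. In this sketch `SAFE` is a local table standing in for the _SAFE userdata (plain Lua cannot create a userdata), and `nullop`, `white`, `result`, and `visited` are illustrative names.

```lua
local function nullop() end

-- Stand-in for _SAFE. The patch uses a userdata; a table with the same
-- metamethods is used here because plain Lua cannot create userdata.
local SAFE = setmetatable({}, {
  __index = nullop,                        -- every field reads as nil
  __newindex = nullop,                     -- writes are discarded
  __call = nullop,                         -- calling it does nothing
  __len = function() return 0 end,
  __pairs = function() return nullop end,  -- consulted by pairs in Lua 5.2+
  __ipairs = function() return nullop end, -- consulted by ipairs in Lua 5.2
})

local white = "white"
local object = { icon = {} }  -- icon exists, but has no glow

-- object?.icon?.glow?.color or white
local color = (((object or SAFE).icon or SAFE).glow or SAFE).color or white
print(color)

-- update?()
local update = nil
local result = (update or SAFE)()

-- for k,v in pairs(object.icons?)
local visited = 0
for k, v in pairs(object.icons or SAFE) do
  visited = visited + 1
end
print(visited)
```

Because the missing pieces collapse to the stand-in rather than nil, none of the three forms raises an error.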

But there’s one caveat to this otherwise fairly elegant definition. For the patch to work as intended, the version of ‘_SAFE’ referenced from the syntax sugar needs to be evaluated as if the upvalue _ENV was equal to _G -- otherwise, a seemingly harmless line like _ENV = {} will change the meaning of the shorthand. (This is why I’ve included an asterisk in the transformation definition.)

So while implementing this patch requires only a small parser change, it also requires reintroducing the opcode OP_GETGLOBAL into the VM.

It’s probably also worth pointing out that the semantics do have a bit of a quirk, one related to the way ‘or’ operations are interpreted. Specifically:

  v = (nil)?.v ==> v=nil
  v = (false)?.v ==> v=nil
  v = (true)?.v ==> runtime error: attempt to index a boolean value

Several lua-l users considered that behavior a bit unintuitive; though personally, I prefer it to throwing an error on (false)?.v.
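
The three cases follow directly from how `or` treats its left operand, which can be checked in stock Lua with the transform written out by hand. As before, `SAFE` is a local table standing in for the _SAFE userdata, and the other names are made up for the sketch.

```lua
local function nullop() end

-- Minimal stand-in for _SAFE: a table whose fields all read as nil.
local SAFE = setmetatable({}, { __index = nullop })

-- (nil)?.v and (false)?.v both fall through to SAFE, which yields nil.
local from_nil = (nil or SAFE).v
local from_false = (false or SAFE).v

-- (true)?.v keeps `true` as the indexed value, so the lookup raises.
local ok, err = pcall(function() return (true or SAFE).v end)

print(from_nil, from_false, ok, err)
```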

Required Fields

A few months after writing up the safe navigation patch, there was a long discussion on lua-l about the various ways in which having undefined table references return nil can lead to bugs [3]. For example, if you’re writing code to build a list of all object names, iterating along and setting

  name[i] = object.name

can lead to strange behaviors on down the line if you happen to come across an object that lacks a name.

In a way, this is the exact opposite of the situation that motivated the safe navigation patch. The purpose of ‘?’ is to make Lua return nil when it would otherwise throw an error. However, there are certainly cases where the opposite behavior is desirable; where we’d like Lua to throw an error, rather than returning nil. Thus, I’ve written a patch for a "required field" operator. On the surface, it's quite similar to my safe navigation patch. In its simplest form, it converts:

	(object!) ==> (object or _MISSING*( "object" ) )

Here _MISSING is a global variable that's fetched via OP_GETGLOBAL as per the Safe Navigation patch. Thus, a line like

	name = get_name(object!,field!)

will result in

	error: missing required value "object",  if object==nil or
	error: missing required value "field",  if field==nil
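
The simple form can be imitated in stock Lua by expanding the `!` by hand. In this sketch `MISSING` is a local function standing in for the _MISSING global, and `get_name` and `lookup` are hypothetical helpers. The message text follows the errors listed above, though the exact output of the patch may differ.

```lua
-- Hypothetical stand-in for _MISSING: raise an error naming the value.
local function MISSING(name)
  error('missing required value "' .. name .. '"', 2)
end

local function get_name(object, field)
  return object[field]
end

-- Plain-Lua equivalent of `name = get_name(object!, field!)`
local function lookup(object, field)
  return get_name(object or MISSING("object"), field or MISSING("field"))
end

print(lookup({ title = "Luna" }, "title"))
print(pcall(lookup, nil, "title"))
print(pcall(lookup, {}, nil))
```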

I've taken things a bit further, however, augmenting the syntax so that table lookups that include a '!' will throw errors if they return nil or false. Thus, for a line like:

	age,height in record![name]!

The errors that may be generated include:

	error: missing required value "record", if record==nil
	error: <expr> is missing required field ["age"],  if record[name].age==nil
	error: <expr> is missing required field [name -> "foo"], if name=='foo', and record.foo==nil

This is not an elegant patch -- dispatching between the various error cases gets messy, and the generated bytecode is not terribly efficient. Even so, I've been finding it useful. It removes the need for much of the boilerplate sanity checking code I’d otherwise write.

Compound Assignment

This was the syntax sugar I originally went looking for. All the goodies I'd been missing from C: +=, -=, *=, etc.

The implementation included in the bundle allows vector increments, for example:

  vx,vy,vz += ax, ay, az

You can also use an open function call to provide data for an arbitrary list of additional values. For example:

  px,py,pz += 1, calc_yz_vel()

However, if there's a clear mismatch between the number of right hand and left hand values, the parser will throw an error.

Unlike standard assignments, the compound assignments are evaluated left-to-right. So,

  local a,b=2,4
  a,b+=b,a
    ==> a==6, b==10
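
Written out in stock Lua, the left-to-right rule amounts to a sequence of single assignments. The sketch below contrasts that with a standard multiple assignment on the same starting values; `c` and `d` are extra names introduced for the comparison.

```lua
-- Compound form: `a,b += b,a` updates one target at a time, left to right.
local a, b = 2, 4
a = a + b   -- a is now 6
b = b + a   -- uses the updated a, so b is now 10

-- A standard multiple assignment evaluates the whole right-hand side first.
local c, d = 2, 4
c, d = c + d, d + c   -- both become 6

print(a, b, c, d)
```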

Also -- because I was having altogether too much fun with this hack -- I added a '++' sugar as well. In the case of multiple left hand values, all will be incremented by 1.

	i,j,k++    ==>    i,j,k=i+1,j+1,k+1

Last edited March 23, 2015 3:15 pm GMT