lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Hello,

I've read many of the discussions about 'switch' statements, the 'continue' statement and 'computed goto' in the mailing list as well as in the wiki.
I have programmed my own 'continue' statement for Lua 5.2-alpha just to
get a feeling for the parser though there is a patch for an earlier version of Lua available in the WIKI. I won't propose my implementation here (though its fully functional), because I think its a matter of taste if this construct is really necessary in the language or just some avoidable blow up. I think I would appreciate general gotos and labels in the language as well as line number annotations just for the purpose to be able to build something like Metalua which outputs direct Lua code instead of using its own bytecode-emitter with all the maintainance issues and compatibility problems (Metalua doesn't work with Luajit), but I understand, that Luajit has effiency rationales against such a general goto-construct. I agree that new language constructs must be carefully selected.

But still there is one kind of construct, which I find quite usefull and therefore I just want to ask the community and especially the core Lua team, whether this construct might be valuable or not for a later Lua release. I call it jumptable and it has something in common with a computed goto (without labels) as well as of a switch statement (reduced basically to integers). My point of view is that a switch statement would be pointless (just syntactic sugar) if it is internally mapped to an if-elseif chain. But often there is a need for state machines or code selection based on integers or string characters (-> string.format) which leads to very compact data coding. I know that jumps on e.g. integers can be realized by function tables in O(1) time, but for inner loops and especially when it seems to be natural to deal with common local variables in the different
"jump targets", functions have some overhead and are also a bit clumpsy.
I attached a readme file about my rationale as .txt file and also a zipped patch for Lua 5.2-alpha, which demonstrates this new language construct with some (admittedly) silly examples just to give some ideas. Note that this patch seems to work without restrictions but is only roughly testet by me. It serves just to explain the idea.

Discussion welcome ...

Regards,

Michael Rose


Here is some code snipped, which demonstrates the basics:
=========================================================

-- File: tst_format.lua
-- One example for jump tables in format issues.

local format = "We have  %d%% off for %s in %s!\n"
local args   = {10,"butter","our nice grocery"}

function my_format(p,fmt,...)
   local args = {...}
   local start,i,n,argi = 1,1,#fmt,0
   while i < n do
      if fmt:sub(i,i) == "%" then
         if start < i then p(fmt:sub(start,i-1)) end
         start = i + 2
         i     = i + 1
         -- NEW JUMP TABLE CONSTRUCT:
         jump val = fmt:byte(i) in "sSdDcC" do
            "dD" do args[argi+1] = tostring(args[argi+1]) end
            "sS" do argi = argi+1; p(args[argi]) break end
            "cC" do argi = argi+1; p(args[argi]:sub(1,1)) break end
            -- add other values...
            do p(string.char(val)) end  -- default target
         end
         -- END OF NEW JUMP TABLE CONSTRUCT.
      end
      i = i + 1
   end
   if start <= i then p(fmt:sub(start,i)) end
end

my_format(io.write, format, table.unpack(args))

-- End of File: tst_format.lua
This is a proposal for a jump table construct, which could be
included into Lua.

A roughly tested patch for 'lua-5.2.0-alpha.tar.gz'
is given as attachment 'lua-5.2.0-alpha-jumptable.patch'.
No warranty is given for correctness! Use at your own risk.
In addition this patch inherits the license of Lua.

The current jump table syntax is given by

jump       ::= 'jump' name '=' init 'in' rng (',' rng)... 'do' entry... 'end'
entry      ::= entrylist 'do' block 'end'
             | default
default    ::= 'do' block 'end'
entrylist  ::= ( entryrange ',' )... entryrange
entryrange ::= ( item '..' )... item
init       ::= int_expression
rng,item   ::= int_literal | string_literal

Example (with "i" as index into string "str"):

jump chr = str:byte(i) in 0,255 do
  "a".."z" do print("lower") break end
  "A".."Z" do print("higher") break end
  32,"_-:" do print("delimiter") break end
  0 .. 31, 127 .. 255 do print("unprintable") break end
  do print("other") end
end

Use the 'luac -l <name>.lua' method to get insight in the implementation
with the corresponding opcode listing.

* Each "int_literal" represents the corresponding number.
* Each "string_literal" represants the ordinal ascii numbers of its
  characters.
* Ranges of items are represented by the ".." operator.
* Items and ranges of items can be handled by a common block when they
  are combined by "," or ";" keyword.
* A bare "do" block "end" represents a unique default action.
* The whole jump table responds to the break statement.
* A break statement at the end of each entry block must be used to
  avoid fall through (like the C switch statement).
* The jump table is just as large that the minimum and maximum numbers
  specified by 'rng' values are allowable entry items.

Rationale:
==========

* This construct is different than a switch statement, which has been
  heavily discussed in the mailing list in the past.
* It is strictly and fast compiled as a one pass construct fitting nicely
  into the general compiler construction of Lua.
* The ranges "low" and "high" are needed to specify the ranges and
  the size of the table in the opcode list. The table consists of
  1 + (high - low + 1) jump opcodes (the first jump corresponds to
  the default action. The jump table is constructed first and is
  patched correctly while the entry items are parsed.
* A new opcode OP_JUMPTAB has been introduced to perform the indirect
  jump in the virtual machine.
* "init" must be a Lua number. All numerical values are excepted,
  but fractional values, values out of range and values without entry items
  are mapped to the default action.
* If the default action is not present, it is equivalent to a
  break statement, leaving the whole construct.
* The jump table need not be fully populated, the default action is
  patched to every entry, which is not defined. This allows sparse
  jump tables.
* Each entry item must be in the range of the table, otherwise
  an error is thrown at compile time.

* Jump tables are faster than if..elseif..elseif.. constructs. They are
  most useful in combinatin with integers. Ascii codes are very convenient
  in certain applications, therefore they are included in the proposal.
* All other uses of switch like statements can be represented by
  the usual if..elseif.. constructs, because this essentially corresponds
  to the compiled code if jump tables cannot be used.
* Jump tables have advantages in comparison to function tables, because
  the jump table construct avoids closure creation and can use
  all local variables of the lexical environment. This was the main
  rational to investigate it. If the functions from a function table
  needs some common upvalues from the local environment, their closures
  have to be constructed in the inner loop of such constructs.
  Jump tables are also very compact in the final code size.

Some things which should be discussed:
======================================

* For me the fall through similar to the C switch statement can be very
  useful in certain applications.
  But to avoid pitfalls, the syntax could be changed, so that the normal
  case is no fall through (like an implicit break statement at the end
  of each entry block). An explicit fall through could then be
  realized by an additional keyword like
    entry ... do ... fallthrough end
    entry ...
* The keywords are arbitrary and can be changed to anything useful
  if this patch is eventually accepted by the community.
  I tried to use as much of already available keywords as made sense to me.
* Strings are not accepted as "init" value. This could be changed
  eventually (e.g. the first ascii ordinarial would be used there as
  integer value). I have not done it, because I need to subtract
  an offset from the "init" value before the new opcode takes control.
  This was done to avoid additional opcodes etc.
  A more general implementation might as well map all other values
  to the default clause.

Michael Rose

Attachment: lua-5.2.0-alpha-jumptable.patch.gz
Description: GNU Zip compressed data