lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike Pall wrote:
[...]
> Well, yes, it's disgusting. :-) You are putting a loop around
> something which should be mostly linear control flow. And all of
> the branches inside are now much less predictable, too.
> 
> Congratulations, you've managed to brew up the perfect nightmare
> for a trace compiler and a super-scalar CPU at the same time. ;-)

Yes, well, that's what you get when you try to translate code written
for a language that supports arbitrary bb graphs to a language that does
*not* support arbitrary bb graphs... I remember that way back when you
pointed me at some algorithms for converting a bb graph into a set of
if...else...end, while...end etc constructs, but the problem appears not
to be solvable in the general case.

At this point the illustrious reader may wish to insert my standard rant
on the evils of languages with no 'goto' (which I'm sure you've all
heard before).

[...]
>> (I have had thoughts about using tail calls to do the transfer of
>> execution from one basic block to another, but I haven't come up with a
>> decent way of doing it without doing memory allocations on entry to each
>> function.)
> 
> Dunno about these allocations. But tail calls are very fast with
> LJ2. The compiler turns them into an inlined goto, i.e. a no-op.

In which case it's probably worth having another look at. Having
actually put a couple of minutes thought into it I realise that I don't
think I do need memory allocations at all. Each bb state transition
would involve passing a huge pile of arguments:

local function test_function(fp, stack, H0, H1, H2)
  local H3, H4, H5
  -- do stuff here
  if H0 == 1 then
    return test_function_bb_3(fp, stack, H0, H1, H4)
  end
  if H0 == 2 then
    return test_function_bb_42(fp, stack, H4, H5, H1, H2)
  end
  return 94
end

I'm assuming that all the argument passing would usually turn into no
code in LuaJIT due to the way the trace compiler works.

Alas, while sparse does support managing bbs this way (with distinct
import, export and private registers in each bb), it's entirely
undocumented, has a very clunky API, and is hugely bug-ridden. I don't
fancy the archaeology required in figuring out how it all works. Plus,
as Lua is oddly one of the few modern languages to support proper tail
calls, I wouldn't be able to use the same code generator architecture
for, say, Java.

Anyone know a good C compiler framework I could use instead of sparse?

[...]
> Yikes. Can we come up with some nice isolated test case? E.g.
> something that can be scaled to run for 0.5-10 seconds and only
> does _one_ thing. Maybe translate one of the C shootout benchmarks?

Actually, Clue uses some of the shootout benchmarks as unit tests.
Results follow:

                       time    comparison factor
clbg-binary-trees:
gcc:                   0.054           1.000
luajit:                3.919          72.543
luajit2:               2.110          39.063
lua:                   4.363          80.763
clbg-mandelbrot:
gcc:                   0.249           1.000
luajit:                4.452          17.900
luajit2:              13.530          54.396
lua:                  26.929         108.268
clbg-partialsums:
gcc:                   0.267           1.000
luajit:                0.722           2.705
luajit2:               1.673           6.267
lua:                   3.270          12.246
clbg-fannkuch:
gcc:                   0.015           1.000
luajit:                0.347          22.670
luajit2:               0.704          45.906
lua:                   2.062         134.563
clbg-recursive:
gcc:                   0.195           1.000
luajit:                1.335           6.848
luajit2:               5.679          29.133
lua:                   8.008          41.076
clbg-n-body:
gcc:                   0.045           1.000
luajit:                0.705          15.682
luajit2:               1.706          37.944
lua:                   3.315          73.712
clbg-nsieve:
gcc:                   0.133           1.000
luajit:                1.894          14.285
luajit2:               3.855          29.074
lua:                   6.645          50.113

Huge great slab of code for the main nsieve function follows (note that
pointers are stored as (offset, table) register pairs in Clue --- see
the return from _malloc early on in the function):

static_1743_0_nsieve = function(fp, stack, H0)
local sp
local H1
local H2
local H3
local H4
local H5
local H6
local H7
local H8
local H9
local H10
local state = 0;
while true do
repeat
if state == 0 then
sp = 0
sp = fp + sp
H7 = _malloc
H1, H2 = H7(sp, stack, H0)
H3 = H1
H4 = H2
H7 = 1
H8 = _memset
H9, H10 = H8(sp, stack, H3, H4, H7, H0)
H6 = 0
H5 = 2
state = 1 break end
if state == 1 then
H7 = H5 < H0 and 1 or 0
if H7 ~= 0 then state = 3 else state = 7 end break end
if state == 2 then
H7 = 1
H8 = H5 + H7
H5 = H8
state = 1 break end
if state == 3 then
H7 = H1 + H5
H8 = H2[H7 + 0]
if H8 ~= 0 then state = 4 else state = 2 end break end
if state == 4 then
H8 = 1
H9 = H6 + H8
H8 = 1
H10 = shl(H5, H8)
H6 = H9
H7 = H10
state = 5 break end
if state == 5 then
H8 = H7 < H0 and 1 or 0
if H8 ~= 0 then state = 6 else state = 2 end break end
if state == 6 then
H8 = H1 + H7
H9 = 0
H2[H8 + 0] = H9
H8 = H7 + H5
H7 = H8
state = 5 break end
if state == 7 then
H1 = _free
H1(sp, stack, H3, H4)
H2 = static_1743_1_anon
H1 = 1
H3 = _printf
H4 = H3(sp, stack, H1, H2, H0, H6)
do return end
end
until true
end
end

- --
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│ "There is nothing in the world so dangerous --- and I mean *nothing*
│ --- as a children's story that happens to be true." --- Master Li Kao,
│ _The Bridge of Birds_
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFK7doIf9E0noFvlzgRAmp1AJ9BPlFSrvT6retLKjuffAaD5EzrlQCgpkfe
08Ma4A9drpwO01kyTOQIvHY=
=Z+RJ
-----END PGP SIGNATURE-----