Heh, looks like I was off with my assumptions, and Mike Pall already replied. Ignore my message.

As for benchmarking, I found that putting collectgarbage('collect') in between tests gives different timings.
AFAIK the garbage collector seems to get slower as more things are allocated (a side effect of mark & sweep?).

Not that this is very important; after all, code runs in real environments, not in isolated garbage-collection-free ones.
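
A minimal sketch of what "collectgarbage('collect') in between tests" could look like. The bench() helper and the make_table() workload are hypothetical stand-ins for illustration, not part of luamarca:

```lua
-- Hypothetical helper: time `iters` calls of `fn`, forcing a full GC cycle
-- first so garbage left over from a previous test is not charged to this one.
local function bench(fn, iters)
  collectgarbage('collect')
  local t0 = os.clock()
  for _ = 1, iters do
    fn()
  end
  return os.clock() - t0
end

-- Stand-in workload that allocates a small nested table per call.
local function make_table()
  return { 1, 2, 3, { a = 1 } }
end

for _, name in ipairs({ 'first', 'second' }) do
  local elapsed = bench(make_table, 1e5)
  print(string.format('%-8s %.6f s', name, elapsed))
end
```

Removing the collectgarbage('collect') line and comparing the two runs is one way to see how much leftover garbage from one test shifts the timing of the next.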


On 8/26/11 10:22 AM, Dimiter "malkia" Stanev wrote:
Yup. I wasn't much helpful here.

Is it the case that recursive functions are somehow harder for luajit
to optimize? (I'm purely guessing here, so don't trust it.)

I've looked at the latest luajit (I'm always syncing up to latest), and the
"NYI: FastFunc pairs" message is the NYIFF code, which I think is only
produced by recff_nyi. Grepping for who uses that reveals this:

malkia ~/p/luajit $ git grep recff_nyi | grep -vi recff_nyiu
src/buildvm_lib.c: "recff_nyi,\n"
src/lj_crecord.h:#define recff_cdata_index recff_nyi
src/lj_crecord.h:#define recff_cdata_call recff_nyi
src/lj_crecord.h:#define recff_cdata_arith recff_nyi
src/lj_crecord.h:#define recff_clib_index recff_nyi
src/lj_crecord.h:#define recff_ffi_new recff_nyi
src/lj_crecord.h:#define recff_ffi_string recff_nyi
src/lj_crecord.h:#define recff_ffi_copy recff_nyi
src/lj_crecord.h:#define recff_ffi_fill recff_nyi
src/lj_crecord.h:#define recff_ffi_istype recff_nyi
src/lj_crecord.h:#define recff_ffi_abi recff_nyi

These all seem to be related only to FFI stuff.

Now, the blacklisting seems to be caused by the BC_IFUNCF bytecode:

void lj_record_ins(jit_State *J) {
  .....
  case BC_IFORL:
  case BC_IITERL:
  case BC_ILOOP:
  case BC_IFUNCF:
  case BC_IFUNCV:
    lj_trace_err(J, LJ_TRERR_BLACKL);
    break;
  ....
}

Seems like IFUNCF stands for Interpreted FUNCtion, Fixed args.

I've looked more, but I'm no Mike Pall... I just have a feeling that
recursion makes it go back to interpreter mode.

And it's not hard to turn any recursive code into non-recursive code by
simply maintaining an alternative stack for the arguments. I might try it
out just for fun!
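
For illustration, here is a rough sketch of that idea applied to a deep table clone. This is not the tclone5 code from the benchmark: the name tclone_iter, the explicit stack, and the `seen` map (which also copes with shared or cyclic subtables) are assumptions made for the example. It still iterates with pairs, so it only removes the recursion and would not by itself avoid the "NYI: FastFunc pairs" abort.

```lua
-- Hypothetical non-recursive deep clone: pending source tables live on an
-- explicit stack instead of the call stack.
local function tclone_iter(src)
  if type(src) ~= 'table' then
    return src
  end
  local root = {}
  local seen = { [src] = root } -- source table -> its copy
  local stack = { src }
  local top = 1
  while top > 0 do
    -- Pop the next source table and look up its (still empty) copy.
    local s = stack[top]
    stack[top] = nil
    top = top - 1
    local d = seen[s]
    for k, v in pairs(s) do
      local val = v
      if type(v) == 'table' then
        val = seen[v]
        if not val then
          -- First visit: create the copy now and fill it in later.
          val = {}
          seen[v] = val
          top = top + 1
          stack[top] = v
        end
      end
      d[k] = val -- note: table keys are kept by reference, not cloned
    end
  end
  return root
end

local copy = tclone_iter({ 1, { 2, 3 }, x = { y = 'z' } })
print(copy[2][2], copy.x.y)
```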

Cheers!


On 8/26/11 9:31 AM, Alexander Gladysh wrote:
On Fri, Aug 26, 2011 at 20:12, Alexander Gladysh <agladysh@gmail.com> wrote:
On Sun, Aug 21, 2011 at 05:03, Dimiter "malkia" Stanev <malkia@gmail.com> wrote:

Here is a little bit more optimized version.

<...>

For the record, here are the reproducible luamarca benchmark results:
$ KBENCH_INTERPRETERS=luajit2 ./run_benchmark.sh bench/tclone.lua 1e6
Results:
luajit2
-------------------------------------------------------------------
name       | rel    | abs s / iter = us (1e-6 s) / iter
-------------------------------------------------------------------
tclone5    | 1.0000 | 20.19 / 1000000 = 20.190000 us
tclone2    | 1.1620 | 23.46 / 1000000 = 23.460000 us
lua_nucleo | 1.3150 | 26.55 / 1000000 = 26.550000 us

The code is here:
https://github.com/agladysh/luamarca/blob/master/bench/tclone.lua

...But LJ2 still blacklists the tclone5() trace, so both my original
questions still stand.

The reason for that is probably this:

---- TRACE 1 abort tclone.lua:108 -- NYI: FastFunc pairs

Mike, please, any input?

Thanks,
Alexander.

luamarca$ luajit2 -jv -jdump bench.lua bench/tclone.lua tclone5 1e5
...

---- TRACE 1 start tclone.lua:96
0001 TGETV 3 1 0
0002 ISF 3
0003 JMP 4 => 0007
0007 ISNEN 2 0 ; 128
0008 JMP 3 => 0012
0012 TNEW 3 0
0013 UGET 4 1 ; pairs
0014 MOV 5 0
0015 CALL 4 4 2
0000 . FUNCC ; pairs
---- TRACE 1 abort tclone.lua:108 -- NYI: FastFunc pairs

---- TRACE 1 start tclone.lua:238
0001 UGET 0 0 ; tclone5
0002 UGET 1 1 ; DATA
0003 CALL 0 2 2
0000 . FUNCF 5 ; tclone.lua:129
0001 . UGET 1 0 ; type
0002 . MOV 2 0
0003 . CALL 1 2 2
0000 . . FUNCC ; type
0004 . ISNES 1 0 ; "table"
0005 . JMP 1 => 0011
0006 . UGET 1 1 ; impl
0007 . MOV 2 0
0008 . TNEW 3 0
0009 . KSHORT 4 1
0010 . CALLT 1 4
0000 . IFUNCF 14 ; tclone.lua:96
---- TRACE 1 abort tclone.lua:96 -- blacklisted
...