Florian Weimer wrote:
> I guess most of the speed-up comes from improved locality due to
> smaller object size anyway.  I can't use the 32/32 split because I
> need 48 bits for pointers, and as a side-effect, I bypass the store
> forwarding issues (I think, at the very least I always store and load
> 64 bit quantities).

I use 32-bit pointers, even on x64. This preserves the cheap type
check (a single cmp [mem], imm8 -- not spilling a register and in
a separate dependency chain) and gives enough freedom for the type
tags themselves. So there's no need to canonicalize NaNs after
every operation, which is an expensive step in your patch. The FPU
on its own only generates 0xfff80000_00000000. Canonicalization is
only needed for the two ingress points (lua_pushnumber and
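
To make the tag check and the ingress canonicalization concrete, here is a minimal C sketch. It is not LJ2's actual layout: the tag values, the helper names and the rule "high word <= 0xfff80000 means number" are assumptions for illustration, and a plain uint32_t stands in for a 32-bit pointer.

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte tagged value: either a double or a (tag, 32-bit ref) pair. */
typedef union TValue {
  uint64_t u64;
  double n;
} TValue;

/* High word of the NaN the FPU generates on its own (0xfff80000_00000000). */
#define CANON_NAN_HI 0xfff80000u

/* Hypothetical tags: small negative 32-bit values, so a compare against
   them fits a sign-extended 8-bit immediate. */
#define TAG_NIL 0xffffffffu /* -1 */
#define TAG_STR 0xfffffffbu /* -5 */

static uint32_t tv_hi(TValue v) { return (uint32_t)(v.u64 >> 32); }
static uint32_t tv_lo(TValue v) { return (uint32_t)v.u64; }

/* Assumed rule: anything at or below the canonical NaN is a number. */
static int tv_isnum(TValue v) { return tv_hi(v) <= CANON_NAN_HI; }

/* The cheap type check: one 32-bit compare of the high word. */
static int tv_hastag(TValue v, uint32_t tag) { return tv_hi(v) == tag; }

/* Build a tagged reference; ref stands in for a 32-bit pointer. */
static TValue tv_ref(uint32_t tag, uint32_t ref) {
  TValue v;
  assert(tag > CANON_NAN_HI); /* tags must live in the unused NaN space */
  v.u64 = ((uint64_t)tag << 32) | ref;
  return v;
}

/* Ingress point for doubles coming from outside: a NaN with an arbitrary
   payload could alias a tag, so it is replaced by the canonical NaN. */
static TValue tv_num_ingress(double d) {
  TValue v;
  v.n = d;
  if (d != d) /* true only for NaN */
    v.u64 = (uint64_t)CANON_NAN_HI << 32;
  return v;
}
```

Without the check in tv_num_ingress, a NaN with the bit pattern 0xffffffff_00000005 would be indistinguishable from a TAG_NIL value in this sketch. Results of ordinary arithmetic need no such check, given that the FPU only produces the canonical pattern.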

> Do you see a larger speed-up than 10% (with heap-intense loads)?

That's hard to say, because I can't easily compare different
TValue sizes anymore. LJ2 in interpreted mode is roughly 3x-4x
faster on binarytrees than plain Lua. But apart from the speedup
due to the faster interpreter, binarytrees is mostly an allocator
benchmark. LJ2 uses its own memory allocator, which is between
1.2x and 5x faster (depending on the speed of the system allocator).
LJ2 also does array colocation, which further reduces the memory
footprint and allocation overhead.

I think the binarytrees benchmark is unsuitable for measuring the
effects of the lower memory footprint, due to excessive allocation
and GC activity. Try comparing Lua compiled as a 32-bit binary
(with a 12-byte TValue on POSIX/x86) and as a 64-bit binary (with a
16-byte TValue) on the same machine. I don't see much of a
difference for binarytrees.
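
For reference, the 12 vs. 16 byte figures can be checked with a small C sketch. The struct below is only modelled on the plain Lua value layout (a union plus an int type tag). The names StockValue, StockTValue and the two helpers are made up for this example.

```c
#include <stddef.h>

/* Payload union, roughly as in plain Lua: pointer, number or boolean. */
typedef union StockValue {
  void *gc;
  void *p;
  double n;
  int b;
} StockValue;

/* Value plus a separate type tag. */
typedef struct StockTValue {
  StockValue value;
  int tt;
} StockTValue;

/* Size of one slot. With the POSIX/x86 ABI a double inside a struct is
   only 4-byte aligned, so this gives 8 + 4 = 12. On x64 (and wherever
   doubles are 8-byte aligned) padding rounds it up to 16. */
static size_t stock_tvalue_size(void) { return sizeof(StockTValue); }

/* Bytes needed for an array part holding n slots. */
static size_t stock_array_bytes(size_t n) { return n * sizeof(StockTValue); }
```

Building this once with -m32 and once with -m64 (assuming a compiler that accepts both flags) and printing stock_tvalue_size() should show the two sizes on the same machine.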