lua-l archive



Gary Bringhurst wrote:
> I've been timing table accesses under LuaJIT using various key types in a 
> very simple tight loop.  Positive integers are the fastest (of course), 
> while negative integers, tables, and functions used as indices are 
> approximately 33% slower.  That's not bad, given that we're using the 
> hashed storage for the table in these cases.
>
> But when I test using strings as table keys they time out as over 100%  
> slower, meaning they take twice as long to execute.  I've got the code  
> structured to avoid any string operations, so those aren't a part of the 
> timing.

Your benchmark is flawed in several ways:

- Quite a bit of the runtime is used by calling random().
  ==> You're not only measuring what you intended to measure.

- The size of the created tables is too large, making this an
  out-of-cache problem. And to top this off, you are using
  randomized access for an out-of-cache problem.
  ==> This is not representative of indexing as used by regular
      Lua programs.

> This was very surprising to me, because my assumption was that any key  
> would be used in essentially the same way for a lookup.

Strings are hashed once, when they are created. Their hash value is
stored in the string object. Other GC objects use pointer hashing,
i.e. their address is hashed before every hash lookup.

The performance of a hash lookup is dominated mainly by the lookup
in the hash table itself. Here we have an out-of-cache problem, so
this accounts for several cache line fills.

Pointer hashing never dereferences the pointer and never accesses
the GC object itself. But string hashes are fetched from the
string object, which may add another cache line fill.

Cache misses are costly (10-100 cycles). And in this case they are
in the dependency chain, so their latency cannot be hidden. The
total performance is bound by the performance of the cache misses.
And string keys happen to add some more cache misses in this
benchmark.

You can verify this by running LuaJIT under Valgrind's cachegrind
tool. I've subtracted the time for the initial setup and divided
the D1 and L2d misses by "count". Here's the number of cache
misses per iteration and their estimated cost (10/100 cycles):

               | D1     L2d    | Est. cost of
               | misses misses | cache misses
---------------+---------------+--------------
string key     | 6.26   1.26   | 188.6 cycles
non-string key | 3.43   0.97   | 131.2 cycles

That's a lot, compared to the few cycles for the hash lookups
themselves. And it explains the performance differences you're
seeing *in this particular benchmark*.

> Does anyone know why string keys are so much slower?

Well, they are not. In fact they are slightly faster in practice,
because their hash values are precomputed.

Method tables are usually tiny compared to other data structures
in an application. And there is only a small number of them (one
for every class or module you define). This is almost certainly an
in-cache problem and interned strings *do* pay off here.

> And does this mean that for  
> maximum performance we'd be better off always using "tab[index]"  
> accesses instead of "tab.name" accesses?

Umm, of course t[1] is faster than t["foo"] because an array
access is faster than a hash access. But when it comes to method
lookups, the performance difference becomes mostly irrelevant.

Also, I'm not sure you'd want to trade this:
  math.abs(math.sin(x)) - math.abs(math.cos(y)) * math.sqrt(z)
for that:
  _G[1][11](_G[1][3](x)) - _G[1][11](_G[1][4](y)) * _G[1][16](z)

Well, probably not ... :-)

--Mike