The following patch to Lua provides an experimental implementation of
a type of Single Instruction Multiple Data (SIMD) capability in the
Lua VM for increased performance on specialized computations:

Importantly, the initial implementation here is ANSI C, and it does
not use any SSE instructions or multithreading. How can this be? The
opcode dispatch in the Lua VM imposes a non-negligible overhead. If,
however, we interpret each opcode (instruction) once and execute that
opcode on multiple data elements, we could expect to reduce the
relative overhead of the opcode dispatch, even if the data is
processed serially in each opcode.
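
To make the dispatch argument concrete, here is a toy sketch in plain Lua. It is not the patch and does not reflect the real VM internals. The names ops, run_scalar and run_packed are made up for illustration, and a table lookup stands in for opcode dispatch. It only counts how often an opcode is looked up when the same two-instruction program runs over four data elements.

```lua
-- Toy model only: a table lookup stands in for the VM's opcode dispatch.
-- ops, run_scalar and run_packed are hypothetical names, not part of the patch.
local ops = {
  ADD = function(a, b) return a + b end,
  MUL = function(a, b) return a * b end,
}

-- Scalar style: every opcode is dispatched once per data element.
local function run_scalar(program, data)
  local dispatches = 0
  local out = {}
  for k = 1, #data do
    local acc = data[k]
    for _, ins in ipairs(program) do
      local f = ops[ins.op]            -- "dispatch"
      dispatches = dispatches + 1
      acc = f(acc, ins.arg)
    end
    out[k] = acc
  end
  return out, dispatches
end

-- Packed style: every opcode is dispatched once, then applied
-- serially to each lane of the packed value.
local function run_packed(program, data)
  local dispatches = 0
  local out = {}
  for k = 1, #data do out[k] = data[k] end
  for _, ins in ipairs(program) do
    local f = ops[ins.op]              -- "dispatch"
    dispatches = dispatches + 1
    for k = 1, #out do out[k] = f(out[k], ins.arg) end
  end
  return out, dispatches
end

local program = { {op = "ADD", arg = 1}, {op = "MUL", arg = 2} }
local data = {1, 2, 3, 4}
local out_s, ds = run_scalar(program, data)
local out_p, dp = run_packed(program, data)
print(ds, dp)   -- 8 dispatches versus 2
```

Both versions do the same arithmetic on the same elements. Only the number of dispatches changes, from (instructions x elements) down to (instructions).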

A simple test of summing integers reduced runtimes by 70%.  Adding
native SSE2 support might improve runtimes further.

  -- test1-standard.lua  (Standard version)
  local sum = 0
  for i=1,2^28 do sum = sum + i end

  -- test1-simd.lua (SIMD version)
  local N=_SIMD_LEN

  local j;    for k=1,N do packed(j,k)   = k end
  local psum; for k=1,N do packed(psum,k)= 0 end
  local fi;   for k=1,N do packed(fi,k)  = k end
  local fs;   for k=1,N do packed(fs,k)  = N end

  for i=fi,2^28,fs do
    psum = psum + i
  end

  local sum = 0
  for i=1,N do
    -- print('partial sum', i, packed(psum,i))
    sum = sum + packed(psum,i)
  end
  print('sum:', sum)
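
For readers without the patch, the same lane-strided summation can be emulated in stock Lua, with an ordinary table standing in for a packed value. This is only a sketch under assumptions: packed_sum and standard_sum are hypothetical helper names, the lane count of 4 is an arbitrary stand-in for _SIMD_LEN, and the handling of a limit that is not a multiple of the lane count is my own choice, not necessarily what the patch does. It shows that the per-lane partial sums add up to the same total as the standard loop; it gives none of the speedup.

```lua
-- Emulation in stock Lua: a table plays the role of a packed value.
-- packed_sum / standard_sum are hypothetical helpers, not part of the patch.

local function standard_sum(limit)
  local sum = 0
  for i = 1, limit do sum = sum + i end
  return sum
end

local function packed_sum(n, limit)
  -- psum[k] is the partial sum held in lane k
  local psum = {}
  for k = 1, n do psum[k] = 0 end

  -- Lane k visits k, k+n, k+2n, ... (start = k, step = n),
  -- mirroring fi and fs in the SIMD version.
  for base = 0, limit - 1, n do
    for k = 1, n do
      local i = base + k
      -- Assumption: lanes that run past the limit simply stop.
      if i <= limit then psum[k] = psum[k] + i end
    end
  end

  -- Reduce the lanes to one scalar, as the final loop above does.
  local sum = 0
  for k = 1, n do sum = sum + psum[k] end
  return sum
end

local N = 4          -- arbitrary stand-in for _SIMD_LEN
local limit = 65536  -- much smaller than 2^28 so the check runs quickly
print('sum:', packed_sum(N, limit), standard_sum(limit))
```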

Additional details are on the wiki page.  I consider this a
proof-of-concept, with the hope it will inspire other implementations.