Re: Avoiding FFI- allocations + using SSE-vectors

lua-l archive

Subject: Re: Avoiding FFI- allocations + using SSE-vectors
From: Mike Pall <mikelu-1202@...>
Date: Mon, 6 Feb 2012 13:10:47 +0100

Wolfgang Pupp wrote:
> I also tried to make it use SSE, and that seems to work just fine
> (MinGW on Win7 32). It needs a tiny wrapper-dll, because LuaJIT can't
> directly call ffi-functions with vector arguments (yet!)- so I pass
> them via pointers.
> I only implemented single-precision-4-float addition for now anyway-
> it's just a proof of concept, I think ffi- vector operations are
> somewhere on Mike's TODO-list (maybe someone will even sponsor that
> and we'll have it in a blink ;).
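
For readers who want to picture the wrapper approach described in the quote, here is a minimal sketch in C. It is not Wolfgang's actual code: the function name vec4f_add and the v4sf typedef are made up for illustration, and it assumes GCC or Clang, since it relies on their vector extension. All vectors cross the FFI boundary as plain float pointers, never by value.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: four packed single-precision floats.
   The typedef name is an assumption for this sketch. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical wrapper entry point. Export decoration for a DLL build
   (e.g. __declspec(dllexport)) is omitted here. */
void vec4f_add(float *out, const float *a, const float *b)
{
    v4sf va, vb, vr;
    assert(out && a && b);
    memcpy(&va, a, sizeof va);    /* copy in; makes no alignment assumption */
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;                 /* element-wise add; typically a single
                                     ADDPS when SSE code generation is on */
    memcpy(out, &vr, sizeof vr);  /* copy the four results back out */
}
```

On the Lua side this would be declared through ffi.cdef as taking three float pointers and called with float[4] arrays, which is why no vector ever has to be passed by value.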

My plan for implementing SIMD operations is this:

- Add generic vector type(s) to the IR of the JIT compiler.

- Record the minimum required vector ops needed for the basic
  initialization and assignment semantics. We can get by with
  init/splat, select/project and load/store.

- Decompose vector ops if the machine-specific backend doesn't
  support a particular type.

- Add support for the basic hardware vector ops to the backends.
  First would be SSE for x86/x64.

- Add support for user-definable intrinsics (builtins) with machine
  code templates, e.g.:

  __v2df __builtin_ia32_addpd(__v2df, __v2df) __mcode("660F58rM");

  Needs to be done for each backend, x86/x64 first. The intrinsics
  will be inlined into the JIT-compiled machine code, of course.

- Add a ffi.vec module that defines standard vector types and
  attaches the machine-specific intrinsics as methods/metamethods:

  local vec = require("ffi.vec")
  local v2df = vec.v2df
  local v1 = v2df(1.5, 2.5)
  local v2 = v2df(10.0, -4.25)
  print(v1 + v2)  -->  cdata<__v2df>: (11.5, -1.75)

- Miscellaneous stuff, e.g. FFI calling conventions for passing
  vectors to C functions or alignment restrictions for vectors.

- Add allocation sinking and store sinking to avoid (most) vector
  allocations. SIMD vectors are value types, so this is a bit
  easier: vectors are immutable, passed by value and you cannot
  get a reference to the boxed contents. I.e. the boxing can be
  eliminated in almost all cases.

  But one really needs generic support for allocation/store
  sinking. That would allow efficient support for complex numbers
  and for short arrays of vectors (SIMD matrix types), too.

- Add auto-vectorization. Umm, err ... that's really tough. Let's
  leave that out of the plan for now. :-)
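
The basic ops named in the plan (init/splat, select/project) and the idea of decomposing an unsupported vector op can be sketched in plain C. This is only an analogy and says nothing about how the JIT compiler's IR would look: it assumes GCC or Clang vector extensions, and every function name below is made up for illustration. The values are the ones from the ffi.vec example.

```c
#include <assert.h>

/* Two packed doubles, the same layout as GCC's __v2df.
   The typedef name is an assumption for this sketch. */
typedef double v2df __attribute__((vector_size(16)));

/* init: build a vector from two scalars. */
static v2df v2df_init(double x, double y)
{
    v2df v = { x, y };
    return v;
}

/* splat: broadcast one scalar to both lanes. */
static v2df v2df_splat(double x)
{
    return v2df_init(x, x);
}

/* select/project: read one lane back as a scalar. */
static double v2df_lane(v2df v, int i)
{
    assert(i == 0 || i == 1);
    return v[i];
}

/* Native path: one packed add. On x86 with SSE2 this is the ADDPD
   instruction, whose opcode bytes are 66 0F 58. */
static v2df v2df_add(v2df a, v2df b)
{
    return a + b;
}

/* Decomposed path: the same result built from scalar adds, roughly
   what a backend without packed doubles would have to fall back to. */
static v2df v2df_add_decomposed(v2df a, v2df b)
{
    return v2df_init(v2df_lane(a, 0) + v2df_lane(b, 0),
                     v2df_lane(a, 1) + v2df_lane(b, 1));
}
```

With v1 = (1.5, 2.5) and v2 = (10.0, -4.25), both add variants give (11.5, -1.75). Note that the C version has no boxing at all, so it does not illustrate the allocation sinking part of the plan.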

Phew ... ok, so that's a lot of work (a couple months). If any
potential sponsor is interested, I'm willing to go for this.
Please contact me via the address given on the sponsorship page:

  http://luajit.org/sponsors.html

Another thing to consider is that I'd rather postpone all of this
to LuaJIT 2.1. Part of it is a complete redesign of the GC and the
memory allocator. That makes some things a lot easier (e.g. aligned
allocations).

I really need to freeze LuaJIT 2.0 and finally make a non-beta
release once the MIPS port and the console ports are done. Maybe
ARM VFP support (hardware floating-point ops) ought to be part of
LuaJIT 2.0, too (sponsor needed). Ok, so there are plenty of items
left on the TODO list, but one has to draw a line somewhere.

--Mike

Follow-Ups:
- Re: Avoiding FFI- allocations + using SSE-vectors, Adam Strzelecki

References:
- Avoiding FFI- allocations + using SSE-vectors, Wolfgang Pupp
