CrazyButcher wrote:
> http://crazybutcher.luxinia.de/wip/luajit_ffi_ogl.gif (yeah I mistyped
> some stuff in there hehe)

Thanks for testing the LuaJIT FFI! I guess I have to clear up some
misconceptions (my fault, since I haven't written the docs yet):

ffi.cdef doesn't return a value, but it takes multiple declarations
('extern' by default). So it's usually used only once to declare
everything at the start of a module. Something like this:

  local ffi = require("ffi")

  ffi.cdef[[
  typedef struct foo { int a, b; } foo_t;
  int foo(foo_t *x);
  int MessageBoxA(void *w, const char *txt, const char *cap, int type);
  ]]

The ffi.C default namespace and the namespaces returned by ffi.load
can be indexed like any Lua object, so t["a"] is the same as t.a.
That means you can call functions directly off a namespace:

  ffi.C.MessageBoxA(nil, "Hello world!", "Test", 0)

[Note that MessageBoxA is a __stdcall. The LuaJIT FFI auto-detects
this, so you don't have to deal with this mess. :-) ]
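
For instance, you can keep the namespace itself in a local (more on
this below) and use either index form. A minimal sketch, reusing the
MessageBoxA declaration from above:

  local C = ffi.C
  C.MessageBoxA(nil, "Hello world!", "Test", 0)
  C["MessageBoxA"](nil, "Hello world!", "Test", 0) -- same call as above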

Or for your example:

  ffi.cdef[[
  int glfwInit(void);
  ]]

  local glfw = ffi.load("glfw")
  glfw.glfwInit()

> Now some questions:
> 
> blubb_data = ffi.new("blubb")
> blah_data = ffi.cast("blah*",blubb_data)
> works fine, does the object returned also reference (in GC sense) the
> ffi object it originated from (blubb_data)? Or is the returned object
> just a pointer, and only objects created by ffi.new have GC to cleanup
> memory.

Casts, constructors or any implicit references do NOT create any
GC references. The same holds for any objects you get back from C
calls. You need to take care to keep references to GC objects
(e.g. the results of ffi.new() or ffi.cast()) yourself, for as long
as any pointer derived from them is in use.
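
Applied to your example (a sketch; this assumes 'blubb' and 'blah'
were declared via ffi.cdef):

  local blubb_data = ffi.new("blubb")              -- owns its memory, GC'd
  local blah_data = ffi.cast("blah *", blubb_data) -- plain pointer, no anchor
  -- blah_data does NOT keep blubb_data alive. Keep a reference to
  -- blubb_data yourself for as long as blah_data is in use.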

Some more hints:

It's faster/easier to use constructors instead of ffi.new():

  local foo_t = ffi.typeof("foo_t") -- Do this once.

  -- Some often called part:
  local x = foo_t()
  local y = foo_t(1, 2) -- Takes initializers, just like ffi.new().

There's little need to cast arrays/structs to pointers. All
conversions that result in a pointer to an aggregate accept either
a pointer to an aggregate _or_ an aggregate itself. So you can
just write this:

  C.foo(x)

Even though 'x' is a struct and foo() wants a pointer to a struct.
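
Arrays convert the same way. A minimal sketch, using the standard C
library's memset() from the default namespace:

  ffi.cdef[[
  void *memset(void *s, int c, size_t n);
  ]]

  local arr = ffi.new("uint8_t[16]")
  ffi.C.memset(arr, 0xff, 16) -- the array decays to a pointer implicitly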

In general you'll only need ffi.cast() to cast between pointer types
to satisfy external API constraints or to force a specific type for
vararg parameters (though you could use a scalar constructor, too):

  ffi.cdef[[
  int printf(const char *fmt, ...);
  ]]

  local x = 12
  ffi.C.printf("double=%g int=%d\n", x, ffi.cast("int", x))

[Lua numbers are doubles. It doesn't matter whether they have a
fractional value or how you write the number literal. '12.0' is
absolutely identical to '12' from the view of the parser. The
conversion to integers is automatic for fixed C function
parameters -- only varargs need special handling.]
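
The scalar constructor variant mentioned above looks like this (a
minimal sketch, reusing the printf declaration and 'x' from above):

  local int = ffi.typeof("int") -- Do this once.
  ffi.C.printf("int=%d\n", int(x))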

If any C call wants a pointer to a scalar (to return something),
just pass it a one-element array:

  ffi.cdef[[
  int sscanf(const char *str, const char *fmt, ...);
  ]]

  local pn = ffi.new("int[1]")
  ffi.C.sscanf("foo 123", "foo %d", pn)
  print(pn[0]) --> 123
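
The same pattern covers out-parameters in general. A hedged sketch
against the GLFW 2.x API from your test (assuming the 'glfw'
namespace from above and an open window):

  ffi.cdef[[
  void glfwGetWindowSize(int *width, int *height);
  ]]

  local pw = ffi.new("int[1]")
  local ph = ffi.new("int[1]")
  glfw.glfwGetWindowSize(pw, ph)
  print(pw[0], ph[0])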

A slightly more involved example, showing how to use mutable buffers:

  ffi.cdef[[
  int uncompress(uint8_t *dest, unsigned long *destLen,
                 const uint8_t *source, unsigned long sourceLen);
  ]]

  local zlib = ffi.load("z")

  local function uncompress_string(comp, origsize)
    local buf = ffi.new("uint8_t[?]", origsize)
    local buflen = ffi.new("unsigned long[1]", origsize)
    assert(zlib.uncompress(buf, buflen, comp, #comp) == 0)
    return ffi.string(buf, tonumber(buflen[0]))
  end
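
For completeness, a hedged sketch of the matching compression side,
using zlib's compressBound() and compress2() with the same 'zlib'
handle as above:

  ffi.cdef[[
  unsigned long compressBound(unsigned long sourceLen);
  int compress2(uint8_t *dest, unsigned long *destLen,
                const uint8_t *source, unsigned long sourceLen, int level);
  ]]

  local function compress_string(txt)
    local n = tonumber(zlib.compressBound(#txt))  -- worst-case output size
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    assert(zlib.compress2(buf, buflen, txt, #txt, 9) == 0)
    return ffi.string(buf, tonumber(buflen[0]))
  end

  local s = ("hello "):rep(100)
  assert(uncompress_string(compress_string(s), #s) == s)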

> The jit would optimize away the safety checks and table lookups if I
> call some dll function always in the same manner
> 
> "ogl32.glClear(...)" or would it still be favorable to store functions locally?
>
> local clr = ogl32.glClear
> function oftenCalled()
>   ...
>   clr(...)
>   ...
> end

No, please don't do this. The namespace lookup will be shortcut,
so there's no need (and in fact it's counter-productive) to keep
local references to functions. Also, please don't keep local
references to intermediate parts of nested arrays/structs: always
use 'y = foo[10].x' and not 'local s = foo[10]; ...; y = s.x'.

OTOH you should keep the namespace itself ('ogl32') in a local or
upvalue. So the recommendation is to do it like this:

  local ogl = ffi.load("OpenGL32")

  local function oftenCalled()
    ...
    ogl.glClear(...)
    ...
  end
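
A concrete, hedged variant of the same pattern (assuming a GL
context is current when the function is called; the constant value
is from gl.h):

  ffi.cdef[[
  void glClear(unsigned int mask);
  ]]

  local ogl = ffi.load("OpenGL32")
  local GL_COLOR_BUFFER_BIT = 0x4000 -- from gl.h

  local function clear_frame()
    ogl.glClear(GL_COLOR_BUFFER_BIT) -- 'ogl' is an upvalue, lookup is shortcut
  end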

[
Currently the JIT compiler doesn't compile C function calls (needs
some redesign first), so they are still interpreted. But this is
going to be the behavior once I implement it.

And I should note that converting Lua functions to C callbacks
doesn't work yet. This is quite tricky and will likely be one of
the last things I'll implement.
]

> Given the 2008 roadmap description of LuaJIT 2[1]
> - constant folding could not apply, as ogl32 is just a table, hence
> the "upvalue" case would be favorable
> -  On the other hand guard-moving would result into upvalue like code...

Namespaces are not tables; they are tagged userdata objects. The
JIT compiler detects this and specializes to the key ("glClear").
Thus the value (the cdata function object) becomes a constant,
which in turn makes its type and address a constant. This
eliminates all lookups at runtime.

> You also wrote that there is a trace-tree, so "hot switches/branches"
> would result into dedicated trace paths... basically all that means
> there is not really much left the user would have to manually do, to
> get optimal results?
> 
> Is there a way to "pre-define" a hotpath. Say I want no delays due to
> runtime compilation at a later time, but as I have the knowledge of
> the hotpath I want to trigger the compilation for it manually?
> 
> say that I know the first time the function is called, I want to force
> it "over the threshold" to do the trace record and so on. That way the
> first "frame" would take a bit longer, but the others would be faster.

The JIT compiler is _very_ fast. It's operating in the microsecond
range and the load is spread out over the bytecode execution (the
compiler pipeline is fed incrementally). I'd be very surprised if
you'd be able to see a frame glitch or even notice that the JIT
compiler is running at all.

Also, it's not a good idea to second-guess the region selection
heuristics. I tried this many times ('This can't be the best
path!') and failed miserably. The selected traces are sometimes
very strange, but so far they turned out to be near optimal. It's
quite embarrassing if your own creation turns out to make smarter
decisions than yourself. ;-)

--Mike