I regularly run Lua on an embedded system (ARM9). Turning the compiler into a cross-compiler is easy, even in our case, where we had to modify the bytecode format to allow execution from flash.
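Why a cross-compiler is needed at all: a stock Lua 5.1 bytecode file starts with a 12-byte header recording the endianness and the sizes of int, size_t, Instruction and lua_Number on the machine that produced it, and the loader rejects chunks whose header doesn't match its own. The Python sketch below assumes the unmodified Lua 5.1 layout (other Lua versions differ, and so would a custom format like the flash-friendly one mentioned above). It decodes that header and compares it with a target description; the function names and the ARM9 values are illustrative assumptions, not taken from any real build.

```python
# Sketch only: assumes the stock Lua 5.1 header layout (luaU_header in lundump.c).
LUA51_SIGNATURE = b"\x1bLua"


def parse_lua51_header(chunk: bytes) -> dict:
    """Decode the target-dependent fields of a Lua 5.1 bytecode header."""
    if len(chunk) < 12 or chunk[:4] != LUA51_SIGNATURE:
        raise ValueError("not a Lua precompiled chunk")
    if chunk[4] != 0x51:
        raise ValueError(f"unsupported version byte 0x{chunk[4]:02x}")
    return {
        "format": chunk[5],
        "little_endian": chunk[6] == 1,
        "sizeof_int": chunk[7],
        "sizeof_size_t": chunk[8],
        "sizeof_instruction": chunk[9],
        "sizeof_number": chunk[10],
        "integral_numbers": chunk[11] == 1,
    }


def compatible(chunk: bytes, target: dict) -> bool:
    """True if every field listed in the (hypothetical) target matches the header."""
    header = parse_lua51_header(chunk)
    return all(header[key] == value for key, value in target.items())


# Header as a 64-bit little-endian host would write it: size_t is 8 bytes.
host_header = bytes([0x1B, 0x4C, 0x75, 0x61, 0x51, 0, 1, 4, 8, 4, 8, 0])

# Hypothetical 32-bit little-endian ARM9 target using doubles for lua_Number.
arm9_target = {
    "little_endian": True,
    "sizeof_int": 4,
    "sizeof_size_t": 4,
    "sizeof_instruction": 4,
    "sizeof_number": 8,
    "integral_numbers": False,
}

print(compatible(host_header, arm9_target))  # False: size_t differs (8 vs 4)
```

In practice the cross-compiler is simply luac built so that it writes the target's header and data sizes instead of the host's.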
As for the gains from precompiling:
- With a bit of tuning, you can run the bytecode straight from flash, thus saving some RAM.
- Lua itself uses very little stack space (coroutine stacks are allocated on the heap). The compiler, however, can consume quite a lot of stack while generating bytecode, typically much more than the interpreter needs to run that bytecode. Precompiling therefore lets us reserve less RAM for the stack.
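One common way to get precompiled bytecode into flash is to embed it in the firmware as a const C array, which most embedded linker scripts place in read-only memory. The Python sketch below is a generic "bin2c" helper, not our actual tooling; the array and length names are hypothetical. Note that with stock Lua, loading such an array (e.g. with luaL_loadbuffer) still copies the functions into the heap, so truly executing from flash needs a modified loader and format like the one described above.

```python
def bytes_to_c_array(data: bytes, name: str, per_line: int = 12) -> str:
    """Render data as a C const array plus a length constant."""
    if not data:
        # C does not allow an empty initializer list for an unsized array.
        raise ValueError("cannot emit an empty C array")
    lines = [f"const unsigned char {name}[] = {{"]
    for start in range(0, len(data), per_line):
        row = ", ".join(f"0x{b:02x}" for b in data[start:start + per_line])
        lines.append(f"    {row},")  # trailing comma is legal in a C initializer
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(data)}u;")
    return "\n".join(lines)


# Demo on the first four bytes of a Lua chunk (the signature ESC 'L' 'u' 'a').
print(bytes_to_c_array(b"\x1bLua", "script_luac"))
```

In a real build you would feed it the contents of the cross-compiled .luac file and compile the generated source into the firmware image.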
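In terms of speed and storage space, there is little to gain. Precompiling mainly speeds up loading, since the chunk no longer has to be parsed; execution speed is unchanged. And bytecode is not necessarily smaller than the source, especially if you don't strip the debug information (luac -s).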