Which system are you developing for?
Unless it's embedded, the move to float probably won't make much difference at all. And if it is embedded, using the LNUM patch (I know you said no C; does that rule out patching as well?) will make more of a difference, and will make it easier to try out the effects of different number modes.
Have you gone through the easy and obvious: using locals wherever you can instead of globals?
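
To illustrate the locals point, here is a minimal sketch. The function names and the sine workload are made up for the example and have nothing to do with your module; the only idea is to cache a global lookup in a local outside the hot loop.

```lua
-- Hypothetical hot loop, written two ways (names are illustrative only).

-- Version 1: looks up the global 'math' and its field 'sin' on every iteration.
local function sum_with_globals(n)
  local total = 0
  for i = 1, n do
    total = total + math.sin(i)
  end
  return total
end

-- Version 2: cache the function in a local once; inside the loop it is
-- reached as an upvalue, with no table lookups.
local sin = math.sin
local function sum_with_locals(n)
  local total = 0
  for i = 1, n do
    total = total + sin(i)
  end
  return total
end

-- Both versions compute the same value; only the lookup cost differs.
print(sum_with_globals(1000) == sum_with_locals(1000))
```

How much this gains depends entirely on how hot the loop is, so it is worth measuring before and after.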
If you're on an x86 system, why not try LuaJIT? It would probably give you the biggest speedup.
For details on any of these, check the archives: http://lua-users.org/lists/lua-l/
In the end it comes down to understanding your module's behaviour, most likely by measuring it somehow, and fine-tuning the most important parts. You may not be spared from going to C; why is that such a no-no?
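
For the measuring part, a rough sketch using only the standard os.clock (CPU time) could look like this. The helper time_it and the sample workload are hypothetical names I made up; swap in the calls from your own module.

```lua
-- Hypothetical timing helper: runs fn(...) once and reports CPU seconds used.
local function time_it(label, fn, ...)
  local start = os.clock()
  local result = fn(...)            -- keeps only the first return value
  local elapsed = os.clock() - start
  print(string.format("%-20s %.3f s CPU", label, elapsed))
  return elapsed, result
end

-- Made-up workload standing in for the real module code.
local function workload(n)
  local total = 0
  for i = 1, n do
    total = total + i % 7
  end
  return total
end

time_it("workload", workload, 1000000)
```

Run each candidate several times and compare; single runs can be noisy, and os.clock resolution varies between platforms.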
Andrew Yount wrote on 11.8.2008 at 22:03: