lua-users home
lua-l archive



I think the L1 argument is valid when Lua is used with a JIT compiler, since the generated native code can then be streamlined without extra overhead.
However, full support of Lua semantics on a CPU still requires type checking and bounds checking at almost every instruction (something a CPU instruction set does not usually perform, except possibly bounds checking in a limited way, and type checking not at all).

Type checking for Lua does not require much space in the L1 code cache (even though it adds a significant constant cost), but it has a cost in terms of branch prediction, and then indirectly in the L1 data cache. Removing this cost would require a much smarter optimizing JIT compiler, capable of doing itself what the CPU has to guess (on CISC CPUs) or just decide arbitrarily (on RISC machines).

If security is a concern, branch prediction has to be taken into account: instead of "guessing" or trying to be accurate (~20% vs. ~80% on RISC), or relying on assumptions (=0% vs. =100% on CISC), it would be safer to use purely randomized branch prediction (~50% vs. ~50% between taken and not taken; a strong random generator would be needed in the processor, because a simple "binary toggle" is very easy to exploit).

And I don't see how a native CPU instruction set could easily implement such a strategy without additional software and hardware support: a solid random generator would be very costly to implement in software alone for every conditional branch in the compiled code; it would most likely fill up the L1 code cache, the L1 data cache, or both, and performance would become extremely poor, as if there were no L1 cache at all. This could change if the CPU had a native instruction returning a strong random number from an internal source, possibly combining several things such as high-performance counters, temperature sensors, or measurements of chaotic physical responses in clock generators and amplifiers (e.g. stabilization currents when a logic gate switches state). A military-grade machine could use an external atomic source, an electron-beam accelerator with an electron counter, a tunnel-effect transistor, or a Zener diode. In fact, modern CPUs are so highly integrated that they constantly have to "fight" chaotic effects, so there are good random sources everywhere; instead of being filtered out and discarded, they could be detected and used to feed a register accessible directly through a dedicated instruction...


On Wed, Nov 28, 2018 at 08:30, Muh Muhten <muh.muhten@gmail.com> wrote:
On 11/27/18, Tim Hill <drtimhill@gmail.com> wrote:
> In fact we were able to realize this very thing in our last project with
> Lua, in which pretty much the whole of the Lua VM, libraries, CPU stack and
> current set of Lua registers (stack frame) were all held in L1 cache on the
> CPU. In effect, the CPU at that point *IS* micro-coded for the Lua VM, and
> we saw spectacular performance from the VM as a result.

Wow! How much tuning did it take to get all that to fit in the L1? I'm
guessing you could strip down much of the libraries and wouldn't need
e.g. the parser at runtime, but just the VM alone looks like it'd
almost fill the cache without some extra work.