lua-users home
lua-l archive


It's interesting to know how Google's V8 engine handles JavaScript's dynamic types: for each object defined, and each time a property is added, modified or removed, it checks the object's datatype and creates or updates an internal "interface" structure (V8 calls these hidden classes). All new objects created the same way immediately reuse the same interface structure. Through the interface, code can access the actual object's properties, or a compiled version specialized to the interface's signature. This is only a superficial description (more details are documented), but the principle is that V8 does not need to compile code for each object, or each time one of its properties is set, modified or removed.
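
As a toy model (this is a deliberate simplification, not V8's real implementation), the interface/hidden-class sharing can be sketched as a transition tree: objects that add the same properties in the same order converge on the identical shape object, so property access becomes a fixed-offset load rather than a dictionary lookup:

```javascript
// Toy hidden-class tree: identical construction orders share one shape.
const rootShape = { props: {}, transitions: {} };

function addProperty(shape, name) {
  // Reuse an existing transition so objects built alike get the same shape.
  if (!shape.transitions[name]) {
    shape.transitions[name] = {
      props: { ...shape.props, [name]: Object.keys(shape.props).length },
      transitions: {},
    };
  }
  return shape.transitions[name];
}

function makePoint(x, y) {
  let shape = rootShape;
  shape = addProperty(shape, "x");
  shape = addProperty(shape, "y");
  return { shape, slots: [x, y] }; // slots indexed by the shape's offsets
}

const a = makePoint(1, 2);
const b = makePoint(3, 4);
console.log(a.shape === b.shape);      // true: one shared shape structure
console.log(a.slots[a.shape.props.y]); // 2: property read is an array index
```

Adding properties in a different order would take a different path through the transition tree and therefore produce a distinct shape, which is why real engines reward consistent object construction.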

Many objects share the same interface, and it is actually rare that new interfaces need to be created, so JIT compilation of new interfaces is rare (except at the start of an entirely new script which defines new objects). JIT compilation then only occurs the first time a method of the object is accessed, after which the compiled code remains cached in the interface object. In some cases these cached entries can still be freed when needed, because the cache stores the precompiled binary fragments using weak pointers. So the cache is not necessarily permanent: when an object is created once, a method is compiled and called once but (almost) never reused later, and the VM is short of memory, those compiled fragments become garbage-collectable under some cache eviction strategy.
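
A minimal sketch of that compile-on-first-call caching, keyed by the shared interface (all names here are illustrative; a `WeakMap` plays the role of the weak-pointer cache, so entries stay collectable once the interface itself is unreachable):

```javascript
// Per-interface cache of "compiled" methods, filled lazily on first call.
const compiledCache = new WeakMap(); // interface -> { methodName: compiledFn }
let compileCount = 0;

function compile(fn) {
  // Stand-in for real JIT compilation; just counts and wraps.
  compileCount++;
  return (...args) => fn(...args);
}

function callMethod(iface, name, fn, ...args) {
  let methods = compiledCache.get(iface);
  if (!methods) compiledCache.set(iface, (methods = {}));
  if (!methods[name]) methods[name] = compile(fn); // compile on first access
  return methods[name](...args);                   // reuse thereafter
}

const iface = {}; // one interface shared by many objects
const norm = (p) => Math.hypot(p.x, p.y);
const r1 = callMethod(iface, "norm", norm, { x: 3, y: 4 }); // compiles once
const r2 = callMethod(iface, "norm", norm, { x: 6, y: 8 }); // cache hit
console.log(r1, r2, compileCount); // 5 10 1
```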

(I've not studied which strategy is used, but Google is probably aware that a simple global LRU eviction strategy is now a severe security risk, and may isolate these caches with one per thread. Other isolation mechanisms are possible to avoid timing-based attacks between threads not working within the same security context; and there are certainly security contexts which may include multiple threads with the same privileges, where such separation of caches is unnecessary, as it would cost a lot in global memory usage and open the door to DoS attacks.) Since the original V8, the engine has had several major new versions refining how it works.

But the principle is there: the dynamic type of any object is converted to a set of static types determined by the last state of the object. A polymorphic object may be in one type at one time and another type later, but generally each object only has a small finite set of possible actual static types it can "adopt" during its lifetime. The JIT is able to automatically determine the signature of each type, cache as many compiled versions of its methods as needed, and compile only what is needed, method by method (and not necessarily all properties of the object at once).
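
That "small finite set of static types per call site" is what a polymorphic inline cache exploits. A sketch (thresholds and names are illustrative, not V8's actual policy): the call site remembers a few (type -> specialized code) pairs and gives up into a generic path once too many distinct types are seen:

```javascript
// Polymorphic inline cache with a megamorphic fallback.
const MAX_ENTRIES = 4;

function makeCallSite(compileFor, generic) {
  const entries = new Map(); // type/shape -> specialized function
  let megamorphic = false;
  return (shape, ...args) => {
    if (megamorphic) return generic(...args); // gave up specializing
    if (!entries.has(shape)) {
      if (entries.size >= MAX_ENTRIES) {
        megamorphic = true;                   // too many types seen here
        return generic(...args);
      }
      entries.set(shape, compileFor(shape));  // specialize for new type
    }
    return entries.get(shape)(...args);
  };
}

let compiles = 0;
const site = makeCallSite(
  () => { compiles++; return (x) => x * 2; }, // "compiled" fast path
  (x) => x * 2                                // generic slow path
);
const shapes = ["s1", "s2", "s3", "s4", "s5"].map((name) => ({ name }));
for (const s of shapes) site(s, 1);
for (const s of shapes) site(s, 1); // second pass: all cache hits or generic
console.log(compiles); // 4: the fifth distinct type tipped it megamorphic
```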

The compiler is also smart enough not to precompile a method if it is never reused: the first invocation of the method can simply mark it to be compiled on the next invocation, while the current call is interpreted. (I think it is more granular than a single method: it may precompile smaller fragments, such as separate conditional branches or loops, so that only the first use of a loop or branch is interpreted, and the second use runs compiled. The compiler may also work in a background thread instead of blocking, using a cache of candidate fragments: it can do its work in the background while the running threads continue immediately in interpreted mode, after "suggesting", rather than "instructing", the compiler to transform the code. The running code may then be a mix of interpreted virtual opcodes and native instructions that are inserted to progressively replace some interpreted fragments.)
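
The mark-then-compile tiering can be sketched in a few lines (a toy policy, not the engine's real heuristics): the first call runs the interpreter and marks the function hot; the next call compiles and caches the fast version:

```javascript
// Toy lazy tier-up: interpret first, mark hot, compile on second call.
function makeTiered(interpret, compile) {
  let hot = false, compiled = null;
  return (...args) => {
    if (compiled) return compiled(...args); // already tiered up
    if (hot) {
      compiled = compile();                 // compile on second invocation
      return compiled(...args);
    }
    hot = true;                             // mark for later compilation
    return interpret(...args);              // first call stays interpreted
  };
}

let interpretedCalls = 0, compiles = 0;
const square = makeTiered(
  (x) => { interpretedCalls++; return x * x; }, // slow interpreted path
  () => { compiles++; return (x) => x * x; }    // "JIT-compiled" path
);
square(2); square(3); square(4);
console.log(interpretedCalls, compiles); // 1 1
```

A function called exactly once thus never pays for compilation at all; a real engine would additionally hand the compile step to a background thread.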

The V8 engine is open source. You can see a summary description of how it works on
However, the cache itself is not described and may need further inspection, because it can become the target of timing-based attacks. In reality there are at least two compilers. One is non-optimizing but produces code that is automatically profiled; a second, more complex compiler can then run to perform optimizations which cannot be made immediately: branch prediction, inlining called methods and then detecting tests that are always true/false to eliminate dead code and expose new constant subexpressions, moving common subexpressions out of loops, better scheduling of native register allocation, compacting the set of upvalues/local variables on the stack with better placement to improve data locality and maximize cache efficiency, or finding ways to parallelize instructions across more pipelines with a minimum of "idle" states and less contention between them when their execution requires access to limited shared resources (internal ALUs/FPUs, or external bus ports for I/O and L2/L3/memory accesses).
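Two of those optimizations are easy to show as a source-level before/after (the optimizer does this on hot machine code automatically; this hand-written version just illustrates the equivalent transformation, with a type guard dropped on the strength of profiling):

```javascript
// Naive version: a loop-invariant product recomputed per iteration,
// plus a type check on every element.
function sumScaledNaive(xs, a, b) {
  let total = 0;
  for (let i = 0; i < xs.length; i++) {
    const scale = a * b; // invariant: does not depend on i
    if (typeof xs[i] !== "number") throw new TypeError("non-number");
    total += xs[i] * scale;
  }
  return total;
}

// "Optimized" version: invariant hoisted out of the loop; the check is
// removed because profiling showed only numbers ever reach this loop
// (a real JIT would keep a guard that deoptimizes if that stops holding).
function sumScaledOptimized(xs, a, b) {
  const scale = a * b;
  let total = 0;
  for (let i = 0; i < xs.length; i++) total += xs[i] * scale;
  return total;
}

console.log(sumScaledNaive([1, 2, 3], 2, 5));     // 60
console.log(sumScaledOptimized([1, 2, 3], 2, 5)); // 60
```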

An "optimizing" compiler is a real challenge, as it exposes many risks that are much harder to check and secure (especially with very complex instruction sets like x86 and x64, where the actual implementation in silicon varies a lot across CPU versions and manufacturers).

On Tue, Nov 27, 2018 at 00:39, Coda Highland <> wrote:
On Mon, Nov 26, 2018 at 4:41 PM Dibyendu Majumdar
<> wrote:
> On Mon, 26 Nov 2018 at 22:22, Coda Highland <> wrote:
> >
> > On Mon, Nov 26, 2018 at 1:33 PM Dibyendu Majumdar
> > <> wrote:
> > > I think it would be nice to have two VMs built-in - a full featured
> > > one and a cut-down one, with user being able to choose the one they
> > > want to use. But it is harder to switch between dual number type to a
> > > single number type.
> >
> > Have you considered doing the work at a different level? One common
> > bytecode format, one common VM, two parsers? Or possibly even a
> > source-to-source transpiler that compiles full-Lua down to mini-Lua?
> >
> Hi, that would not solve the problem as the problem I am trying to
> solve is to have simpler code to execute with less unpredictable
> branches. A common VM would have to handle the worst case. However
> what I could do is fall back to the full featured VM when a type check
> fails. This would cause a small performance hit for Lua code that
> relies upon the bigger feature set, but in theory this cost can be
> minimized by blacklisting the function so that next time it goes
> immediately to the fallback VM.

That's consistent with the polymorphic variation I was describing.

LuaJIT tries to be a little bit more forgiving about it. If it fails
the type check, it goes ahead and JITs it again with the new types. If
it fails AGAIN (I don't know exactly how many possible variations it
allows), it gives up and assumes it's a megamorphic function and just
always sends it through the slow-and-steady route.

Assuming your two VMs are still able to operate on the same in-memory
data structures so execution can freely switch between them, this
isn't an unreasonable idea at all.

/s/ Adam