Re: What would you remove from Lua - a case of regression?

Note: math.floor() is not required toi return a native binary integer. It may return a variant type using integers for some ranges, and doubles outside the range. So even in this case the the inference would not help eliminate the necessary tests on the effective type of the variant... The compiler has to know also the value range.

But your existing function fromjulianday(jd) does not check the value range of d, so it's not possible to determine the value range for math.floor() and then reduce it to code using only native integers.

What your code can do however is to favorize the integer case by making branch prediction and avoiding most branches for that case. The CPU will iteself make its own branch prediction anyway and possibly adapt itself using a branch prediction cache.

What is much more useful is to know how instructions are scheduled and pipelined: augment the number of distinct instructions that can handle data in parallel, according to the number of native registers you have and avoid scheduling two costly execution units (like FPUs) in parallel.

Instruction scheduling (and correct understanding of how CPU pipelines are working (and how the central pipeline stage, i.e. execution in a ALU/FPU/VPU port or memory access, can avoid using the same cycles: this also mitigates the time attacks based on dynamic branch predictions and speculative execution).

A good optimizer also can know equivalent patterns of instructions that avoid branch predictions (e.g. it's perfectly possible to implement the function abx(x) without using any conditional branch, by computing the two parallel branches and combining them with a binary or: this uses two parallel pipelines and a final "rendez-vous" where some other independant instructions are scheduled just before computing the binary OR): it is difficult to find other instructions in an isolated abs(x) function, but not if the function is inlined within more complex expressions. Detecting such patterns requires a good database of equivalent binary expressions (e.g. optimizing multipliations by shifts, and optimizing shifts by additions, and avoiding patterns like "store Register[n] to [x]; read Register[m] from [x]" and replacing it by "move Register[n] to [x]; move Register[n] to Register[m]" where the two instructions can run in parallel...). Such rewriting required analyzing the datapaths and which virtual intermediate result is reused later or not, so that you can reduce the number of registers needed. This will then largely reduce the footprint on the datacaches (from the stack or actual native registers).

Note that the ompiler must still be correct semantically (notably for handling exceptions/errors and pcall() correctly).

As well a compiler can autodetect constant expressions like "(a+b)*(c-(a+b))". But this is not easy at all if some elements are actually functions like in "Rand()+Rand()" where not only the two calls to function Rand() is supposed to return different numbers, but also the function must be called twice (the number of calls may be significant is the function uses variables kept across calls in its "closure")

Le dim. 25 nov. 2018 à 23:04, Sean Conner <sean@conman.org> a écrit :

It was thus said that the Great Dibyendu Majumdar once stated:
> Hi,
>
> One of the difficulties of generating high performance code for Lua in
> a _naive_ way is that many op codes have a lot of branching. A simple
> add instruction may do:
>
> if (isint(a) && isint(b))
> integer add a, b
> else if (isfloat(a) && isfloat(b))
> float add a, b
> else if (hasaddmeta(a) || hasaddmeta(b))
> invoke meta add a, b
>
> This is really inefficient when JIT compiled.

There are languages that do type inference, where from the context of the
code it can determine the type of a variable. So for instance, if all calls
to a function always pass in integers, you can specialize the code for that.
For example, this function:

-- written for Lua 5.1
function fromjulianday(jd)
local a = jd + 32044
local b = math.floor((4 * a + 3) / 146097)
local c = a - math.floor((b * 146097) / 4)
local d = math.floor((4 * c + 3) / 1461)
local e = c - math.floor((1461 * d) / 4)
local m = math.floor((5 * e + 2) / 153)

return {
day = e - math.floor((153 * m + 2) / 5) + 1,
month = m + 3 - 12 * math.floor((m / 10)),
year = b * 100 + d - 4800 + math.floor(m / 10)
}
end

Given the presence of math.floor(), this indicates the code wants integer
results. Also, jd is used as if an integer, so this function can be flagged
as taking an integer as its parameter.

Now, having only heard of type inference (and very rarely used a language
that has it) I don't know how difficult this would be.

> * Eliminate metamethods except for __gc.

I looked over my own code for uses of metatables. I use them extensively
for userdata, very rarely for tables, and not at all for the other types.
Furthermore, there are a few methods I've repeatedly used (for both, unless
otherwise noted):

__mode (table)
__index
__newindex
__tostring (userdata)
__gc
__len

I did find one or two uses of the following in all my code:

__call (usedata)
__add (userdata [1])
__sub (userdata [1])
__unm (userdata [1])
__eq (userdata)
__lt (userdata)
__le (userdata)
__concat (userdata)

But this is just my code. Take it with a grain of salt.

At the very least, consider keeping __mode (which is used in conjunction
with the garbage collector), and possibly making a cheap check for __index,
__newindex and __len (if possible).

> * No more coroutines

This is a deal-breaker for me (more on this below).

> So back to only one number type. As you can imagine this would mean
> add can simply be:
> a + b
>
> Now, I do not know how often people implement operator overloading -
> but given some languages do not have this feature it may not be a big
> loss.

Personally, almost never. But I do use LPeg (a lot) and that does use
operator overloading. And again, for me, this would be a big loss.

> I personally can do without coroutines too, I am not sure how widely
> they are used; but the trade off is that we eliminate a bunch of
> complexity from mini Lua.

I use them. Maybe not as often as LPeg, but in the context I use them,
they are invaluable. I use coroutines to implement servers. A network
request comes on, I create a coroutine to handle that request. A simple
echo service:

local signal = require "org.conman.signal"
local nfl = require "org.conman.nfl"
local tcp = require "org.conman.nfl.tcp"
local addr,port = ...

signal.catch('int')
signal.catch('term')

tcp.listen(addr or "0.0.0.0",port or 8000,function(ios)
repeat
local line = ios:read("*L") -- potential yield spot
ios:write(line) -- potential yield spot
until line == ""
ios:close()
end)

nfl.eventloop() -- this manages scheduling of coroutines

It makes the actual code to handle the connection straightforward instead
of chopping it up into an incomprehensible nest of callback functions.

There are a few other instances were coroutines are nice (such as the
Same Fringe Problem [3]) but for me, it's handling network services. I
think this would be the common use for coroutines today.

-spc

[1] One module and that one related to signals [2], used to add and
remove signals from a signal set (a POSIX thing).

[2] https://github.com/spc476/lua-conmanorg/blob/master/src/signal.c

[3] http://wiki.c2.com/?SameFringeProblem