lua-users home
lua-l archive



On 10/11/22, bil til <biltil52@gmail.com> wrote:
> This sounds really interesting.
>
> I usually hate such benchmarking concerning an interpreter language
> like Lua... . As I see it, the main focus for an interpreter language
> should always be flexibility, and the C "base code" must be designed
> such that the "workhorse functions", e.g. long numeric for loops,
> are handled by this C "base code".
>
> The benchmarks typically just test such "long numeric for loops" -
> which I think is a strange way of testing a Lua interpreter.
>
> Therefore your test seems to show MUCH more, what REALLY is going on.
>
> But I could not find your detail explanation post in githup - can you
> give the exact link maybe?

I didn't give much detail about the program beyond what I repeated
here. I wrote it as a tangent in a thread about a Pallene bug I'm
struggling with:
https://github.com/pallene-lang/pallene/discussions/547. Basically, if
you scan those discussion forums for my posts over the last year
(there are not many users/posts there), including the various bug
reports with reproducible test cases, you can infer a lot more about
my program if you are deeply interested.

Overall, the program is a stock backtester, kind of like the
Python-based BackTesting.py. But I wanted something that could be a
lot faster: BackTesting.py/Python is really slow. Early on I
reimplemented a basic version in Lua to compare, and even plain Lua is
something like 50 times faster than BackTesting.py's best idealized
usage case, which I expect to devolve horribly in typical real-world
usage. I also needed something that supported options trading, which
meant implementing something new from scratch.

Pallene seemed like a good fit for this project/experiment because a
very large part of backtesting is just doing math with arrays of
floats. At the same time, developing backtesting strategies means
trying a lot of crazy ideas, most of which you throw away, or modify
heavily to try something a little different when a result inspires a
new thought. So Pallene seemed to offer performance, because the data
types and usage patterns are well known and easy for a compiler to
optimize, while still (mostly) keeping the simplicity and flexibility
of writing in Lua (instead of C or some other lower-level language).

Also, it is known that creating C arrays as userdata and then
accessing individual elements from Lua has overhead, and is no faster
than accessing individual elements of a native Lua array in Lua. So
writing a native C module for this case is not going to be faster
unless the C module also provides functions to do all the higher-level
computations on the C side, letting you avoid individual element
access on the Lua side. Pallene avoids this problem, so for this usage
case it is clearly superior to writing a C module.
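To illustrate the two access patterns (this is a hypothetical sketch,
not code from my program), consider a simple moving average computed
with individual element access:

```lua
-- Illustrative sketch: a moving average with per-element access.
-- With a plain Lua table, vec[i] is a cheap VM operation. If vec were
-- instead a userdata array from a C module, every vec[i] would go
-- through an __index metamethod and cross the Lua/C boundary, which is
-- why element-wise access to userdata is typically no faster than a
-- native Lua array.
local function sma(vec, n, window)
  local out = {}
  for i = window, n do
    local s = 0.0
    for j = i - window + 1, i do
      s = s + vec[j]
    end
    out[i] = s / window
  end
  return out
end
```

The C-module alternative would have to export the whole computation
(e.g. a single C function that computes the entire moving average) so
that no per-element boundary crossing happens; Pallene compiles a loop
like the one above directly, giving the same effect without that
restructuring.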

(And this is one of the major problems with BackTesting.py/Python.
They use Pandas everywhere, which basically has the same FFI
performance problem, except worse. All over real-world code, people
create Pandas arrays and then implement all their strategies in
regular Python, needing individual element access on those arrays in
Python, which negates any benefit they might have gotten from Pandas.
The simple benchmark I mentioned earlier is exactly this case: the
idealized case I tested was able to use higher-level Pandas operations
and mostly avoid individual element access in Python. For a
non-trivial real-world usage case, that is going to degrade.)


>
> I would be especially interested in this application you call "sys" -
> do you have an explanation why LuaJIT is so TERRIBLY bad here?
>


To clarify, "real", "user", and "sys" are categories reported by the
Unix 'time' utility when timing a run. All three categories refer to
the same single program run.
https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1

I presumed LuaJIT makes a lot more system calls, given how things like
mmap and mprotect are probably used to make JIT compilation possible.
It's not necessarily a bad thing; it's just how LuaJIT works.

There might also be a second contributor: my port to LuaJIT ended up
making calls to os.date() and os.time(), which might also count
toward sys. In my original Pallene/Lua 5.4 implementation, I used
64-bit integers to represent date+time, with some very simple bit-mask
and shift tricks to make converting a date/time to an integer easy and
fast, since 64 bits gave me plenty to spare. Because LuaJIT doesn't
have 64-bit integers, I reimplemented the date/time handling to use
seconds since the epoch, so in the end I just call os.date/os.time.
For this particular benchmark, I don't think I need to convert dates
often, except during the initial startup/loading of stock data, so I
don't think it has a significant impact on the overall run.
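As a sketch of the kind of bit-mask-and-shift scheme I mean (the field
widths here are invented for illustration, not the ones from my
program; Lua 5.4 integers are 64-bit, so there is room to spare):

```lua
-- Illustrative sketch: pack a date/time into one Lua 5.4 integer.
-- Layout (invented widths): year | month:8 | day:8 | hour:8 | min:8 | sec:8
local function pack_datetime(y, mo, d, h, mi, s)
  return (y << 40) | (mo << 32) | (d << 24) | (h << 16) | (mi << 8) | s
end

local function unpack_datetime(t)
  return t >> 40,
         (t >> 32) & 0xFF,
         (t >> 24) & 0xFF,
         (t >> 16) & 0xFF,
         (t >> 8) & 0xFF,
         t & 0xFF
end
```

Because the fields are laid out most-significant-first, packed values
compare chronologically with ordinary integer comparison, which is the
appeal of the scheme. None of this works in LuaJIT, whose numbers are
doubles, hence the switch to seconds since the epoch.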