Hello,

I've conducted an experiment to speed up Lua state creation with static
hash tables generated by GNU gperf and got some interesting results.

Introduction
============

Lua states offer a very lightweight way to execute independent scripts,
a highly desirable feature for programs that execute a large number of
them (e.g. web servers).

Unfortunately, the standard library is too small for many tasks, and
adding modules by hand is quite a hassle. Furthermore, dynamic
libraries are a platform-dependent mess.

The goal of this experiment was to find a way to add many more functions
to Lua a) without using dynamic loading and b) without slowing down the
creation of new states.

luaL_openlibs loads all functions, tables and values such as print,
string, _VERSION, math.pi etc. that make up the standard library into
the Lua state so that the script can access them via table lookups.

But a script rarely uses ALL of them, and the more functions get
added, the more unnecessary work luaL_openlibs has to do.
So the fewer unused Lua values get loaded into RAM, the better.

OK, so what if we don't actually load them and just set a metatable
with a __index metamethod that fetches the values as the script needs
them? The script won't notice absent values it doesn't use -- Great!

...

But wait, where does the __index metamethod get the values from?

A static hash table that just sits idle in the code section!

How does it work?
=================

To make a static hash table, all keywords that map to Lua values
need to be known at compile time. Therefore every module needs
a description file (module.lua) that contains all keys of the module.
The modules are grouped together in a directory (lx/[library name])
and make up a library.

A script (lx/makelib.lua) reads each module's description (module.lua)
and uses gperf to generate a static hash table that maps all keys to
Lua values (or other modules). Then the C code of the module (module.c),
the static hash table and an exposed entry point (struct lx_module)
are concatenated together into a single file in the output directory
(generatedlib).
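
To illustrate what such a static table looks like, here is a minimal
hand-written sketch in the style of gperf output. It is not the code that
lx/makelib.lua generates: the entry type, the four keys and the hash
function are assumptions chosen by hand so that each key lands in its own
slot, and the integer id stands in for the Lua value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type: the real tables map keys to Lua values. */
struct entry {
    const char *name;   /* key as seen by the script */
    int id;             /* placeholder for the value, e.g. a C function */
};

/* Slots are ordered by hash value. The table is const, so it lives in
 * read-only data and needs no initialization at run time. */
static const struct entry wordlist[4] = {
    { "print", 0 },
    { "type",  1 },
    { "next",  2 },
    { "pairs", 3 },
};

/* Hand-picked perfect hash for exactly these four keys:
 * (first byte + last byte) mod 4 gives each key a distinct slot. */
static unsigned int hash(const char *str, size_t len)
{
    return ((unsigned char)str[0] + (unsigned char)str[len - 1]) % 4u;
}

/* Returns the entry for key, or NULL if key is not in the set. */
const struct entry *lookup(const char *key)
{
    assert(key != NULL);
    size_t len = strlen(key);
    if (len == 0)
        return NULL;
    const struct entry *e = &wordlist[hash(key, len)];
    /* Unknown keys also hash to some slot, so one comparison is needed. */
    return strcmp(key, e->name) == 0 ? e : NULL;
}
```

A lookup costs one hash computation and one string comparison regardless
of how many states exist, and nothing is copied into a state until a key
is actually requested.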

A C program can use a version of the Lua core without libraries
(lua-5.4.0-beta-nolibs) to create Lua states and then add the libraries
by setting a metatable for the global table with:
lx_set_lookup_metatable(L, lx_[base module name]);
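
The lazy lookup behind that metatable can be sketched as a plain C
simulation without the Lua API. All names here (struct state, get_global,
index_fallback) are hypothetical, the linear scan stands in for the
gperf-generated table, and caching a fetched value in the state is an
assumption of this sketch, not something confirmed for LX.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LOADED 8

/* Static, read-only library table shared by all states. */
struct lib_entry { const char *name; int value; };
static const struct lib_entry library[] = {
    { "print", 1 }, { "type", 2 }, { "pairs", 3 },
};

/* Hypothetical per-state table: it starts empty, so setting up a state
 * costs the same no matter how large the library is. */
struct state {
    struct lib_entry loaded[MAX_LOADED];
    size_t n_loaded;
    unsigned fallback_calls;   /* how often the slow path ran */
};

/* Plays the role of the __index metamethod: only called on a miss. */
static const struct lib_entry *index_fallback(struct state *s, const char *key)
{
    s->fallback_calls++;
    for (size_t i = 0; i < sizeof library / sizeof library[0]; i++)
        if (strcmp(library[i].name, key) == 0)
            return &library[i];
    return NULL;
}

/* Global lookup: per-state table first, static library on a miss. */
const int *get_global(struct state *s, const char *key)
{
    for (size_t i = 0; i < s->n_loaded; i++)
        if (strcmp(s->loaded[i].name, key) == 0)
            return &s->loaded[i].value;
    const struct lib_entry *e = index_fallback(s, key);
    if (e == NULL)
        return NULL;
    if (s->n_loaded < MAX_LOADED) {
        /* Assumption: cache the value so later accesses skip the fallback. */
        s->loaded[s->n_loaded] = *e;
        return &s->loaded[s->n_loaded++].value;
    }
    return &e->value;
}
```

A state that never touches a key never pays for it, while each first
access takes the slower fallback path once.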

There are three different LX libraries that can be selected in the
Makefile:
 * base contains only the Lua 5.4 standard libraries
 * basex additionally contains LuaFileSystem and some LHF libraries
   (complex, base64, ascii85, mathx)
 * heavy gets auto-generated by generate_heavy.lua and contains 200
   modules with 500 functions each. This is to test how it behaves
   with a large amount of ballast. (Warning: GCC chews on it for a while.)

What are the results?
=====================

All files can be found here: https://github.com/evelance/lxlib

(I tested it on a laptop with Linux in a virtual machine, so the results
aren't terribly accurate, given how hard it is to get reproducible
measurements on modern CPUs with caching, thermal throttling etc. So
just run them yourself; they only need a Linux system with gperf
installed.)

Typical time needed to perform the following operations on 100K Lua states:

Operation                 ~time in seconds
luaL_newstate              1
luaL_openlibs             10
Load bytecode              0.5  (small script, loop and coroutine)
Execute said script        7
Full GC cycle              0.8
Require single module     40
lx_set_lookup_metatable    0.1
lua_close                  1.5

Observations
============

 * luaL_openlibs takes a significant part of the total time
   for small scripts
 * require is really, really slow
 * The execution time of the script with lx_set_lookup_metatable
   is slightly longer since missing values need to be fetched
   using a CFunction
 * Lua states with LX need less RAM than with luaL_openlibs (8KB vs
   22KB); 100K states with bytecode executed + coroutine need
   ca. 1GB vs 2.5GB
 * With LX the state setup time remains constant, even with 100,000
   Lua values
 * The many additional hash functions slightly increase the code size
   (298KB vs 285KB)
 * The startup time of Lua as a CLI remains basically 0 even with 40MB
   of bloat caused by the 100,000 functions of lx/heavy
 * LX modules are only initialized after they have been accessed. This
   means that the string metamethods (e.g. method calls such as
   s:upper()) only work after string has been accessed
 * Is it a good idea to set the global table's metatable?
 * There are some other quirks...

Conclusion
==========

With static hashing it is possible to add hundreds of
thousands of functions to Lua that are accessible to all
scripts/states without slowing down state creation at all.
It decreases the memory usage of individual scripts/states but
slightly increases the executable size.
It slows down scripts that really use most of the available functions
but decreases the strain on the garbage collector a bit for those that
don't.
The generated code is fully portable and does not depend on libdl (-ldl).

The code generation, which is necessary anyway, could be upgraded to
a script that easily builds a custom Lua with modules "à la carte".

All feedback (or more test results) welcome :)