It was thus said that the Great Viacheslav Usov once stated:
> On Wed, Dec 27, 2017 at 10:20 AM, Javier Guerra Giraldez <
> > wrote:
> >  if that was a reference to defining those macros that essentially add
> lock operations to every mutation, it does feel about right since in
> essence it adds a Global Interpreter Lock, the dreaded "GIL problem" that
> made Python folks essentially abandon threading in favor of communicating
> processes.
> I mean a reference to a test, with a full disclosure of the methods, whose
> results would indicate that something "slowed Lua down to Python speeds".
> Note that even the formulation of the statement is dubious, because
> apparently a non-MT-safe version of Lua is compared with an MT-safe version
> of Lua, which can only reasonably be done with a single-threaded test, so
> what is being compared to what and why is that relevant? Secondly, the
> formulation suggests that "stock" Lua is significantly "faster" than
> Python, while there are some results, such as [1], that show that the
> situation is more complicated than that.

  I seriously question the methods used for the Python programs.  In each
case where Python won (with the exception of mandelbrot), it included a
multiprocessing module, whereas Lua was just the stock Lua, so it's not a
fair "apples-to-apples" comparison (in this case, Python's "batteries
included" gives it a boost with respect to Lua's "batteries not included").

  As for what is being measured, it's the overhead of the "global
interpreter lock" (a concept in Python, not in Lua) to make Lua thread safe,
and such locks do take time.  
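
  To put a rough number on "locks take time", here is a minimal sketch,
assuming a POSIX system with pthreads and clock_gettime().  It times a loop
of uncontended lock/unlock pairs against the same loop with no lock.  The
function names are hypothetical; this is not part of Lua and not how the
interpreter itself was measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_secs(const struct timespec *t0, const struct timespec *t1)
{
  return (double)(t1->tv_sec - t0->tv_sec)
       + (double)(t1->tv_nsec - t0->tv_nsec) / 1e9;
}

/* Hypothetical helper: n uncontended lock/unlock pairs around a trivial
** increment.  Returns the wall-clock time taken, in seconds. */
static double time_locked_increments(long n, volatile long *counter)
{
  pthread_mutex_t lock;
  struct timespec t0, t1;
  long i;

  pthread_mutex_init(&lock, NULL);
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (i = 0; i < n; i++) {
    pthread_mutex_lock(&lock);
    (*counter)++;
    pthread_mutex_unlock(&lock);
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);
  pthread_mutex_destroy(&lock);
  return elapsed_secs(&t0, &t1);
}

/* Baseline: the same loop with no locking at all. */
static double time_plain_increments(long n, volatile long *counter)
{
  struct timespec t0, t1;
  long i;

  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (i = 0; i < n; i++)
    (*counter)++;
  clock_gettime(CLOCK_MONOTONIC, &t1);
  return elapsed_secs(&t0, &t1);
}
```

  Called from a small main() with a large n, the difference between the two
timings divided by n approximates the cost of one lock/unlock pair.  The
figure depends on the machine and the C library, and it says nothing about
contention between threads.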

  Be that as it may, I was only able to run one test before the machine I
was testing on shut down due to thermal issues [2] but it is illustrative of
the issue.  Anyway, the machine I tested on was an Ubuntu Linux 64-bit with
quad core.  The Lua version is 5.3 with all bugs patched [3].  The first
test is with the stock definition of lua_lock() and lua_unlock() (basically,
no implementation):

[spc]saltmine-2:/tmp/bm>time lua-53 fasta.lua-2.lua 25000000 >output-fasta 

real    0m32.409s
user    0m31.970s
sys     0m0.304s

  This is faster than the results on the benchmark site:

source  secs    mem     gz      cpu     cpu load
Lua     50.08   2,920   1061    50.06   0% 7% 94% 0%

but in reading about the website tests [6], they sample the program for
memory usage every 0.2 seconds, which I'm not doing, so that *may* explain
the difference.  I don't know.  The version of Ubuntu wasn't specified, nor
the actual x86-64 quadcore chip, so that may also have an effect on the
results.

  I digress.  For the next test, I modified the Lua-5.3 source code to
include locking:

diff --git a/src/llimits.h b/src/llimits.h
index f21377f..0044e12 100644
--- a/src/llimits.h
+++ b/src/llimits.h
@@ -211,8 +211,8 @@ typedef unsigned long Instruction;
 ** ('lua_lock') and leaves the core ('lua_unlock')
 #if !defined(lua_lock)
-#define lua_lock(L)    ((void) 0)
-#define lua_unlock(L)  ((void) 0)
+#define lua_lock(L)    pthread_mutex_lock(&(L)->l_G->lock)
+#define lua_unlock(L)  pthread_mutex_unlock(&(L)->l_G->lock)
diff --git a/src/lstate.c b/src/lstate.c
index 9194ac3..dfbe88f 100644
--- a/src/lstate.c
+++ b/src/lstate.c
@@ -12,7 +12,7 @@
 #include <stddef.h>
 #include <string.h>
+#include <pthread.h>
 #include "lua.h"
 #include "lapi.h"
@@ -328,6 +328,7 @@ LUA_API lua_State *lua_newstate (lua_Alloc f, void *ud) {
   g->gcfinnum = 0;
   g->gcpause = LUAI_GCPAUSE;
   g->gcstepmul = LUAI_GCMUL;
+  pthread_mutex_init(&g->lock,NULL);
   for (i=0; i < LUA_NUMTAGS; i++) g->mt[i] = NULL;
   if (luaD_rawrunprotected(L, f_luaopen, NULL) != LUA_OK) {
     /* memory allocation error: free partial state */
diff --git a/src/lstate.h b/src/lstate.h
index a469466..5677e6e 100644
--- a/src/lstate.h
+++ b/src/lstate.h
@@ -112,6 +112,7 @@ typedef struct CallInfo {
 #define setoah(st,v)   ((st) = ((st) & ~CIST_OAH) | (v))
 #define getoah(st)     ((st) & CIST_OAH)
+#include <pthread.h>
 ** 'global state', shared by all threads of this state
@@ -151,6 +152,7 @@ typedef struct global_State {
   TString *tmname[TM_N];  /* array with tag-method names */
   struct Table *mt[LUA_NUMTAGS];  /* metatables for basic types */
   TString *strcache[STRCACHE_N][STRCACHE_M];  /* cache for strings in API */
+  pthread_mutex_t lock;
 } global_State;

  I figure this was the minimum required for the job.  With that done, I
compiled the custom version of Lua and reran the test:

[spc]saltmine-2:/tmp/bm>time ./lua-5.3/src/lua fasta.lua-2.lua 25000000 >output-fasta

real    0m36.767s
user    0m36.010s
sys     0m0.320s

  Definitely slower.  And the ratio between these results (32.4 / 36.8 or
.88) is close enough to the benchmark results (50.0 / 59.5 or .84) that it
seems to indicate that yes, the modified version may be as slow as Python [4].
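
  For anyone who wants to redo the arithmetic with their own timings, a
minimal sketch of the two figures involved.  The function names are
hypothetical; the inputs in the comments are the "real" times and the
benchmark-site numbers quoted in this message.

```c
/* Ratio of the faster run to the slower run (1.0 means no difference). */
static double speed_ratio(double fast_secs, double slow_secs)
{
  return fast_secs / slow_secs;
}

/* Extra wall-clock time of the slower run, as a percentage of the faster. */
static double percent_slower(double fast_secs, double slow_secs)
{
  return (slow_secs - fast_secs) / fast_secs * 100.0;
}

/* speed_ratio(32.409, 36.767)    is about 0.88  (stock vs. locking Lua)
** speed_ratio(50.0, 59.5)        is about 0.84  (benchmark-site figures)
** percent_slower(32.409, 36.767) is about 13.4  (percent, locking Lua) */
```

  So on this one test the locking build took roughly 13% longer than the
stock build.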

  I did run the n-body test [5] for a baseline and got a result of 283.5
seconds.  I was running the modified Lua version when the CPU overheated and
shut down (and I lost connection to the box).  I do not know how long it ran
before shutting down.

  -spc (Cheers!)

> [1]

[2]	A Linux laptop at work---it matches the specs given on the website
	[1] so that's why I was using it.  I won't be able to finish this
	until I get back to the office after the New Year.

	Yes, that laptop is a bit temperamental.  Sigh.


[4]	I did not install Python3 as I was trying to get results for Lua.
	Also, I'm on vacation.