I've just started seeing a similar weird bug. My
program is pure Lua, via lua.exe, no added third-party
libraries or C code. If I run it repeatedly with the
same data, on rare occasions I get errors that seem to
be related to my input data getting scrambled or cut
off somehow.

This is very disturbing because Lua is the scripting
language used in the core of our new product that is
currently under development. Up until a few days ago,
it seemed solid as a rock.

I'll see if I can get permission to send a complete
system to one of the main developers, assuming one of
them is interested in trying to figure it out.

--- Luiz Henrique de Figueiredo
<lhf@tecgraf.puc-rio.br> wrote:

> Zack Weinberg, of the Monotone team, has asked me to post this here.
> Please reply with Cc to him at <zackw@panix.com>.
> --lhf
> 
>  From: "Zack Weinberg" <zackw@panix.com>
> 
>  Hi, I'm one of the developers of the Monotone version control system
>  (http://monotone.ca/). We use Lua both in the application itself and in
>  its test harness.  Recently I changed the test harness to make it
>  parallelizable; this works great except for bizarre intermittent
>  problems with I/O on some, but not all, Unix-family operating systems.
>  The symptoms are suspiciously similar to the ones discussed in the
>  thread starting at
>  http://lua-users.org/lists/lua-l/2007-04/msg00386.html ("loadfile gets
>  stdin confused") but I don't think it's exactly the same bug.
> 
>  The test driver program is written in a mixture of C++ and Lua.  The
>  C++ main() creates a Lua interpreter structure and loads a bunch of
>  C++ extensions and Lua definitions -- the latter have been embedded
>  into the C++ executable, and are read in with luaL_loadbuffer().  It
>  then uses luaL_loadfile() to evaluate a "testsuite definition file",
>  specified on the command line.  This file can define more Lua
>  functions for the test suite's use; it also tells the driver where to
>  find a directory containing test cases.  Test cases are subdirectories
>  of that directory containing a Lua script with a particular name.
> 
>  The driver creates a directory to run the test cases in, and creates
>  (with io.open()) a "master logfile" in that directory.  For each test
>  case, it creates a subdirectory, and fork()s a child process.  The
>  child process chdir()s into the subdirectory and opens a "per-test
>  logfile", again with io.open().  It then runs the testcase script,
>  with loadfile() and xpcall() at the Lua level.  When the test case
>  script completes, the calling function calls f:close() on the per-test
>  logfile, then io.open()s a "status file" into which it writes one of
>  several short strings that describe the overall result of the test.
>  [We can't use the process exit code for this, unfortunately; it
>  doesn't give us enough bits.]  The child process then terminates.  The
>  parent process reads the overall result out of the status file and
>  writes it to the master log file and to the original stdout.
> 
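For readers following along, the per-test flow described above can be
sketched in plain C with stdio standing in for Lua's io library. This is
only an illustration under assumptions, not Monotone's code: the
function name run_one_test, the file names "test.log" and "STATUS", the
"ok" result string, and the use of _exit() in the child are all
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run one "test" in its own subdirectory `dir`
 * and copy the one-line result from the status file into `result`.
 * Returns 0 on success, -1 on any failure. */
static int run_one_test(const char *dir, char *result, size_t n)
{
    if (mkdir(dir, 0700) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: work inside the per-test directory. */
        if (chdir(dir) != 0)
            _exit(1);

        /* Per-test logfile: opened, written, explicitly closed. */
        FILE *log = fopen("test.log", "w");
        if (!log)
            _exit(1);
        fputs("output of the test case\n", log);
        fclose(log);

        /* Status file: one short string describing the result. */
        FILE *st = fopen("STATUS", "w");
        if (!st)
            _exit(1);
        fputs("ok\n", st);
        fclose(st);

        /* Assumption: leave without running the parent's exit-time
         * cleanup; the original post does not say how the child exits. */
        _exit(0);
    }

    /* Parent: wait for the child, then read the status file. */
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;

    char path[4096];
    snprintf(path, sizeof path, "%s/STATUS", dir);
    FILE *st = fopen(path, "r");
    if (!st)
        return -1;
    if (!fgets(result, (int)n, st)) {
        fclose(st);
        return -1;
    }
    fclose(st);
    result[strcspn(result, "\n")] = '\0';   /* strip trailing newline */
    return 0;
}
```

A caller would pass a directory that does not exist yet and then write
the returned string to its own master log.
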
>  The above is how it's *supposed* to work.  The bug is that
>  intermittently (and not on all supported platforms, and of course
>  *never* under the debugger) chunks of text which were supposed to go
>  to the per-test logfile either fail to show up anywhere, or show up in
>  the status file instead.
> 
>  The child processes never touch stdin/out/err; in fact, I deny any
>  access to stdin/out/err to all code written in Lua (by removing almost
>  everything from the io table).  The child processes never write to the
>  master logfile, either.  I do not replace stdin/out/err at any point
>  in the code, nor do I mess with file descriptors 0, 1, or 2.
>  (Previous incarnations of the code did mess with the file descriptors,
>  but taking that out did not make the bug go away.)  Iostreams are not
>  used anywhere.  The only remaining "dirty trick", and I confess I
>  don't see how it could be causing the problem here, is that the Lua
>  interpreter is created and initialized once, in the parent process.  I
>  rely on fork() to clone its state into the children, and I do not
>  lua_close() the interpreter in the children.  The files that the
>  children write are explicitly closed instead of relying on final GC to
>  do it.
> 
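One general property of fork() that may be worth ruling out here (it is
not offered as a diagnosis): the child gets a copy of every stdio
buffer the parent had open, including bytes written but not yet
flushed. Lua's io library sits on top of C stdio, so the same applies
to files opened with io.open(). The sketch below uses a hypothetical
helper, copies_after_fork, to count how often a single line ends up in
a file when the parent does or does not flush before forking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write one line to `path` through a buffered
 * stream, fork, flush the stream in both processes, and return how
 * many lines the file contains afterwards (-1 on error). */
static int copies_after_fork(const char *path, int flush_first)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    /* A regular file is fully buffered by default, so this short line
     * stays in the process's memory until a flush. */
    fputs("parent log line\n", f);
    if (flush_first)
        fflush(f);

    pid_t pid = fork();
    if (pid < 0) {
        fclose(f);
        return -1;
    }

    if (pid == 0) {
        /* Child: it inherited a copy of the buffer. Flushing here
         * writes any pending bytes to the shared file; leaving via
         * exit() instead would flush every open stream implicitly. */
        fflush(f);
        _exit(0);
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) {
        fclose(f);
        return -1;
    }
    fclose(f);  /* parent flushes its own copy of the buffer */

    /* Count the lines that actually reached the file. */
    f = fopen(path, "r");
    if (!f)
        return -1;
    int lines = 0;
    char buf[128];
    while (fgets(buf, sizeof buf, f))
        lines++;
    fclose(f);
    return lines;
}
```

Without the flush before fork() the line is written twice, once by each
process; with it, only once. Whether anything like this is involved in
the behaviour reported above is an open question.
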
>  [Monotone does work on Windows, but of necessity the test suite must
>  be parallelized rather differently there, and the problem has not been
>  reported there.]
> 
>  Any help would be greatly appreciated.  If anyone wants to look at the
>  code, the relevant files are tester.cc, testlib.lua, and
>  unix/tester-plaf.cc in the current monotone development repository
>  (alas, I cannot point you at a tarball).  I regret not being able to
>  provide a small self-contained testcase.
> 
>  zw
> 
>