Zack Weinberg, of the Monotone team, has asked me to post this here.
Please reply with Cc to him at <zackw@panix.com>.
--lhf

 From: "Zack Weinberg" <zackw@panix.com>

 Hi, I'm one of the developers of the Monotone version control system
 (http://monotone.ca/).  We use Lua both in the application itself and in
 its test harness.  Recently I changed the test harness to make it
 parallelizable; this works great except for bizarre intermittent
 problems with I/O on some, but not all, Unix-family operating systems.
 The symptoms are suspiciously similar to the ones discussed in the
 thread starting at
 http://lua-users.org/lists/lua-l/2007-04/msg00386.html ("loadfile gets
 stdin confused") but I don't think it's exactly the same bug.

 The test driver program is written in a mixture of C++ and Lua.  The
 C++ main() creates a Lua interpreter structure and loads a bunch of
 C++ extensions and Lua definitions -- the latter have been embedded
 into the C++ executable, and are read in with luaL_loadbuffer().  It
 then uses luaL_loadfile() to evaluate a "testsuite definition file",
 specified on the command line.  This file can define more Lua
 functions for the test suite's use; it also tells the driver where to
 find a directory containing test cases.  Test cases are subdirectories
 of that directory containing a Lua script with a particular name.

 The driver creates a directory to run the test cases in, and creates
 (with io.open()) a "master logfile" in that directory.  For each test
 case, it creates a subdirectory, and fork()s a child process.  The
 child process chdir()s into the subdirectory and opens a "per-test
 logfile", again with io.open().  It then runs the testcase script,
 with loadfile() and xpcall() at the Lua level.  When the test case
 script completes, the calling function calls f:close() on the per-test
 logfile, then io.open()s a "status file" into which it writes one of
 several short strings that describe the overall result of the test.
 [We can't use the process exit code for this, unfortunately; it
 doesn't give us enough bits.]  The child process then terminates.  The
 parent process reads the overall result out of the status file and
 writes it to the master log file and to the original stdout.
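
 The per-test flow described above can be sketched roughly as follows.
 This is a minimal illustration, not the actual Monotone code: the real
 child runs a Lua script and writes through io.open(), which in the
 stock Lua I/O library sits on top of C stdio, so plain fopen() stands
 in for it here.  The names make_run_dir, run_one_test, "test.log" and
 "STATUS" are hypothetical, and the result string is passed in rather
 than computed by a test.

```cpp
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: create a fresh directory to run the test cases in.
// Returns an empty string on failure.
std::string make_run_dir()
{
    char tmpl[] = "/tmp/tester-sketch-XXXXXX";
    const char* dir = mkdtemp(tmpl);
    return dir ? std::string(dir) : std::string();
}

// Hypothetical helper: run one "test" in a forked child and return the
// status string the child left in <run_dir>/<name>/STATUS.
std::string run_one_test(const std::string& run_dir,
                         const std::string& name,
                         const std::string& result)
{
    assert(!run_dir.empty() && !name.empty());

    const std::string test_dir = run_dir + "/" + name;
    if (mkdir(test_dir.c_str(), 0700) != 0) return "mkdir-failed";

    pid_t pid = fork();
    if (pid < 0) return "fork-failed";

    if (pid == 0) {
        // Child: enter the per-test directory and write the per-test log.
        if (chdir(test_dir.c_str()) != 0) _exit(1);
        FILE* log = fopen("test.log", "w");
        if (!log) _exit(1);
        fprintf(log, "running %s\n", name.c_str());
        fclose(log);                    // explicit close, as in the description

        // Then record the overall result in the status file.
        FILE* status = fopen("STATUS", "w");
        if (!status) _exit(1);
        fputs(result.c_str(), status);
        fclose(status);

        // _exit() skips atexit handlers and stdio flushing in the child.
        _exit(0);
    }

    // Parent: wait for the child, then read the result back.
    if (waitpid(pid, nullptr, 0) < 0) return "wait-failed";
    const std::string status_path = test_dir + "/STATUS";
    FILE* status = fopen(status_path.c_str(), "r");
    if (!status) return "no-status";
    char buf[64];
    size_t n = fread(buf, 1, sizeof buf, status);
    fclose(status);
    return std::string(buf, n);
}
```

 The child ends with _exit() rather than exit() so that it does not
 flush any stdio buffers it inherited from the parent; whether the real
 driver does the same is not something this sketch assumes.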

 The above is how it's *supposed* to work.  The bug is that
 intermittently (and not on all supported platforms, and of course
 *never* under the debugger) chunks of text which were supposed to go
 to the per-test logfile either fail to show up anywhere, or show up in
 the status file instead.

 The child processes never touch stdin/out/err; in fact, I deny any
 access to stdin/out/err to all code written in Lua (by removing almost
 everything from the io table).  The child processes never write to the
 master logfile, either.  I do not replace stdin/out/err at any point
 in the code, nor do I mess with file descriptors 0, 1, or 2.
 (Previous incarnations of the code did mess with the file descriptors,
 but taking that out did not make the bug go away.)  Iostreams are not
 used anywhere.  The only remaining "dirty trick", and I confess I
 don't see how it could be causing the problem here, is that the Lua
 interpreter is created and initialized once, in the parent process.  I
 rely on fork() to clone its state into the children, and I do not
 lua_close() the interpreter in the children.  The files that the
 children write are explicitly closed instead of relying on final GC to
 do it.
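
 For readers less familiar with what "cloning state" via fork() covers:
 it includes every open stdio stream together with any data still
 sitting unflushed in its buffer.  The sketch below only illustrates
 that general property and is not a claim about what goes wrong in the
 test driver.  The function name and file contents are hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical demo: the parent buffers some output, forks, and both
// processes later flush their own copy of the same stdio buffer.
// Returns the resulting contents of `path`.
std::string fork_with_unflushed_buffer(const std::string& path)
{
    assert(!path.empty());

    FILE* f = fopen(path.c_str(), "w");
    if (!f) return "";
    fputs("parent;", f);        // stays in the stdio buffer, not yet on disk

    pid_t pid = fork();
    if (pid < 0) { fclose(f); return ""; }

    if (pid == 0) {
        fputs("child;", f);     // appended to the child's copy of the buffer
        fclose(f);              // child writes out "parent;child;"
        _exit(0);
    }

    waitpid(pid, nullptr, 0);
    fclose(f);                  // parent writes out its own "parent;" as well

    // Read the file back to see what actually landed on disk.
    FILE* in = fopen(path.c_str(), "r");
    if (!in) return "";
    char buf[128];
    size_t n = fread(buf, 1, sizeof buf, in);
    fclose(in);
    return std::string(buf, n);
}
```

 Because parent and child share one file offset, the parent's text ends
 up in the file twice: once from the child's flush and once from the
 parent's own.  Flushing before fork() avoids the duplication in this
 toy case.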

 [Monotone does work on Windows, but of necessity the test suite must
 be parallelized rather differently there, and the problem has not been
 reported there.]

 Any help would be greatly appreciated.  If anyone wants to look at the
 code, the relevant files are tester.cc, testlib.lua, and
 unix/tester-plaf.cc in the current monotone development repository
 (alas, I cannot point you at a tarball).  I regret not being able to
 provide a small self-contained testcase.

 zw