It was thus said that the Great Archie Cobbs once stated:
> Hello,

  Hello.

> I have a couple of initial observations, plus an idea I'd like to
> throw out there... apologies in advance for the long post. Hopefully
> this has a logical flow to it.

  Don't worry---my reply is just as long, if not longer.  And that's with
snipping out parts I'm not directly replying to.

> The most profound restriction is the limitation to a single C call
> stack (you can't assume pthreads), and so Lua must longjmp() every
> time there is a coroutine context switch. In other words, the stack
> frames of any C functions in the call stack of a coroutine have to be
> tossed when that coroutine is switched out.

  The Lua call stack and C call stack are two separate things.  The Lua VM
does not use longjmp() to implement coroutines [a][d].

[a]	Here I think you are conflating Lua coroutines with C coroutines. C
	doesn't have the concept of a coroutine, so to implement them, one
	must necessarily dive into unportable code.  Yes, there are
	implementations of C coroutines that use setjmp()/longjmp() but I
	think they rely upon undefined behavior.  setjmp()/longjmp() are
	meant to provide a primitive way of dealing with exceptions
	(returning to a higher point in the call stack), not a way to
	implement coroutines.  See [b] for details.

	I think the reason people think that Lua coroutines are handled via
	setjmp()/longjmp() is due to an inability to yield a Lua coroutine
	over a C call (Lua->C->Lua->yield).  Lua 5.2 fixed some issues with
	that (in that 5.2+ can handle some cases where 5.1 can't) [c].

[b]	https://stackoverflow.com/questions/14685406/practical-usage-of-setjmp-and-longjmp-in-c

[c]	I have some code that can yield properly in Lua 5.2 that will cause
	an error in 5.1.

[d]	I'm switching up how I do foot notes because this is a long message.
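
  To illustrate the "returning to a higher point in the call stack" use of
setjmp()/longjmp() mentioned in [a], here is a minimal sketch.  It is not
taken from Lua or from any library discussed here; the names
(recover_point, try_parse and friends) are made up for the example.

```c
#include <setjmp.h>

/* Hypothetical recovery point, shared by the functions below. */
static jmp_buf recover_point;

/* Deepest level: on bad input, abandon the whole call chain. */
static void parse_level2(int input)
{
    if (input < 0)
        longjmp(recover_point, 1);  /* unwinds back to the setjmp() call */
}

/* Intermediate level: knows nothing about the error handling. */
static void parse_level1(int input)
{
    parse_level2(input);
}

/* Returns 0 on success, -1 if a nested call bailed out via longjmp(). */
int try_parse(int input)
{
    if (setjmp(recover_point) != 0)
        return -1;                  /* we got here through longjmp() */
    parse_level1(input);
    return 0;
}
```

  Note that control only ever moves *up* the stack, to a frame that is
still live.  Jumping back *into* a function that has already returned is
what a coroutine switch would need, and that is not something the C
standard gives you.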

> For "plain" Lua, restriction to ANSI C creates these problems:
> 
>     Problem #1. Lua is single-threaded and can't take advantage of
> multi-core machines

  Yes it can.  Not out of the box certainly, but if you define lua_lock()
and lua_unlock() (and recompile Lua) [a] you can then share a Lua state
across operating system threads.  The downside is that Lua goes from being
one of the fastest scripting languages down to about the execution speed of
Python.

[a]	I wonder ... on some platforms, you could probably get away with
	defining lua_lock() and lua_unlock() as actual functions that do
	nothing, then use LD_PRELOAD to load a different implementation of
	lua_lock()/lua_unlock() at runtime.  Then again, you would need to
	compile Lua as a shared object and do some other linking magic. 
	Just a thought.
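
  As a rough sketch of the lua_lock()/lua_unlock() idea (an assumption on
my part about one simple way to do it, not a tested recipe):  Lua only
supplies its do-nothing defaults when these macros are not already
defined, so definitions like the following would need to be visible to the
Lua core when it is compiled (for example, from luaconf.h).  The mutex
name is made up, and a single process-wide lock is the crudest possible
choice.

```c
#include <pthread.h>

/* Hypothetical process-wide lock guarding the shared Lua state. */
pthread_mutex_t shared_state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Replace Lua's no-op defaults; the state argument is ignored because
   this sketch uses one lock for everything. */
#define lua_lock(L)   ((void)(L), (void)pthread_mutex_lock(&shared_state_lock))
#define lua_unlock(L) ((void)(L), (void)pthread_mutex_unlock(&shared_state_lock))
```

  Every entry into the Lua API then takes and releases the mutex, which is
where the slowdown mentioned above comes from.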

>     Problem #2. If Lua invokes any blocking C function, then the
> entire VM blocks (all coroutines). In other words, coroutine blocking
> is serialized.

  [ snip ]

> Problem #1 speaks for itself and is easy to understand. You can
> address this by creating multiple Lua contexts, but then you have to
> handle the resulting isolation: there are no shared globals or
> upvalues, and so all communication between contexts requires
> serialization of some kind.

  Or implement lua_lock()/lua_unlock() as mentioned.  

  Also, the method you describe is a message based, shared nothing approach
and it works well.  Erlang does nothing but message based, shared nothing
and it's used in the telecom industry.  It is easy to reason about and quite
simple to use.  It does *not* have to be slow [a].

[a]	Back in the mid-90s, I was friends with the owners of a small
	software company in Ft. Lauderdale who wrote commercial X Window
	servers for a variety of operating systems.  Their *fastest* version
	was for QNX, which was a message based, shared nothing operating
	system.  In fact, the x86 kernel was about 8K (yes, 8,192 bytes) in
	size, with most services traditionally provided by a kernel handled
	instead by user processes.
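
  For a feel of what "message based, shared nothing" means at the lowest
level, here is a minimal sketch in C:  each side owns its own data and the
only thing that crosses the boundary is a length-prefixed blob of bytes.
The framing (4-byte big-endian length, then payload) and the function
names are assumptions for the example, not anything Erlang, QNX or a Lua
module actually uses.

```c
#include <stdint.h>
#include <unistd.h>

/* Write exactly len bytes; 0 on success, -1 on error. */
static int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; 0 on success, -1 on error or early EOF. */
static int read_all(int fd, void *data, size_t len)
{
    char *p = data;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: 4-byte big-endian length, then the payload. */
int send_msg(int fd, const void *payload, uint32_t len)
{
    uint8_t hdr[4] = { (uint8_t)(len >> 24), (uint8_t)(len >> 16),
                       (uint8_t)(len >> 8),  (uint8_t)len };
    if (write_all(fd, hdr, sizeof hdr) != 0) return -1;
    return write_all(fd, payload, len);
}

/* Receive one message into buf; returns its length, or -1 on error
   or if the message does not fit in cap bytes. */
long recv_msg(int fd, void *buf, uint32_t cap)
{
    uint8_t hdr[4];
    if (read_all(fd, hdr, sizeof hdr) != 0) return -1;
    uint32_t len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
                 | ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
    if (len > cap) return -1;
    if (read_all(fd, buf, len) != 0) return -1;
    return (long)len;
}
```

  The file descriptor could be a pipe or a socket pair between two
processes or threads, each with its own Lua state.  The sketch ignores
EINTR and gives up if a descriptor is non-blocking, so a real version
would need more care.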

> As for Problem #3, it's more of an inconvenience than a major problem.
> 
> Let's talk about Problem #2!
> 
> In my view, Problem #2 is the most serious problem. It eliminates a
> huge class of applications, namely, any application supporting
> simultaneous blocking operations. For example, a web server!  

  It does not.  At work, I have a program written in Lua that is handling
over 200,000,000 calls per day [a].  I even have a gopher server [b][c]
written in Lua, and with that, I don't see writing a webserver as much of an
issue.  Granted, in the case of the 200,000,000 phone calls, it's not just a
single process on a single box, but multiple (independent) processes on
multiple boxes dealing with the traffic, but it's probably running fewer
instances than you would expect to handle such traffic. [e]

[a]	Literally.  As in phone calls.  The product our company makes is
	part of the call path for a major cell phone company and as such,
	there is Lua code executed when a person makes a cell phone call.

[b]	https://en.wikipedia.org/wiki/Gopher_%28protocol%29

[c]	gopher://gopher.conman.org/ [d]

[d]	Source code:  https://github.com/spc476/port70

[e]	If I think too much about it, I want to throw up but that's beside
	the point.

> I'm
> assuming there that a web server that would serialize every blocking
> I/O call - which is all you can do in plain Lua - does not count as a
> real web server :)

  One *could* in theory use plain Lua to serve up a web page:  you just write
a simple program in plain Lua that is invoked via inetd [a], but that's
pretty slow.  Besides, plain Lua can't even do networking out of the box.

[a]	A program that can execute a program when a request comes in to a
	certain network port.  It was how some services were done in the 80s
	and early 90s.
 
> For many applications (e.g., web servers) this is too restrictive, and
> so people become willing to compromise on Advantage #1 (i.e.,
> sacrifice portability) in order to get a solution to Problem #2.

  From what I understand, Cqueues will run on POSIX (multiple Unix like
operating systems like Linux, Solaris, Mac OS-X, BSDs) and Windows, so from
a modern perspective, it can be "portable".

> I'm still trying to sort through all the ways people have tried to
> address Problem #2.
> 
> But let's take the "luv" module (event library based on non-blocking
> file descriptors) as an example of a reasonable and popular solution.
> 
> OK so far so good...  BUT - there's still a larger problem with "luv"
> or any other solution to Problem #2 I've run across so far: none of
> them are COMPOSABLE with other, unrelated Lua modules.

  True, but you have that issue in any language.  Story time.

  Back around 2011, there were two projects being worked on that needed to
make DNS queries [a][b].  At the time, both projects were attempting to use
the C-Ares library to do the DNS queries.  Problem one:  it didn't support
the DNS record type required, so code had to be added to parse those. 
Problem two:  it wanted to handle *all* the networking related to DNS
queries, which was problematic because both projects were already dealing
with the network, so both were stuck trying to integrate this monstrous DNS
networking library into their existing program [c].

  And here I was, looking into DNS on my own at the time.  I wasn't
interested in the networking side (boring!) but instead on how to parse the
DNS packets, and as a result, wrote code that just encoded and decoded DNS
packets [d].  When the other two teams saw that code, they had it
integrated into their respective projects in a few hours and it's worked
flawlessly since.

  Oh, and both projects were in C++ by the way, not Lua.

  So to sum up, you have the composability issue regardless of language.

[a]	In order to look up name information for phone numbers.  I'm
	serious, the telecom industry uses DNS to convert a phone number
	like 561-555-1212 into a name like "John Doe" or "ACME Inc.".

[b]	I'm liking this new footnote style.  I'm up to what? 20 footnotes so
	far?

[c]	In defense of C-Ares, I think the creators of that project thought
	they should deal with all the complexity of dealing with DNS,
	including multiple servers, TTL management and retries, to relieve
	users of their project of all those details.  The problem was that
	all the *other* DNS libraries out there did the same.

[d]	https://github.com/spc476/SPCDNS
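
  To show what "just encode and decode, no networking" can look like, here
is a minimal sketch of building a one-question DNS query in a caller
supplied buffer.  This is NOT the SPCDNS API; the function name and
signature are invented for illustration, and only the wire layout (the
12-byte header followed by length-prefixed labels, per RFC 1035) is taken
as given.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a single-question query (class IN, recursion desired) into buf.
   Returns the number of bytes written, or 0 if the name is malformed or
   the buffer is too small.  No sockets are touched; sending the bytes is
   left entirely to the caller. */
size_t encode_dns_query(uint8_t *buf, size_t buflen, uint16_t id,
                        const char *name, uint16_t qtype)
{
    size_t pos = 12;                      /* fixed header size */

    if (buflen < pos) return 0;
    memset(buf, 0, pos);
    buf[0] = (uint8_t)(id >> 8);          /* ID, big-endian */
    buf[1] = (uint8_t)(id & 0xFF);
    buf[2] = 0x01;                        /* flags: RD bit set */
    buf[5] = 1;                           /* QDCOUNT = 1 */

    /* QNAME: each dot-separated label becomes <length><bytes>. */
    while (*name != '\0') {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);
        if (len == 0 || len > 63 || pos + 1 + len > buflen) return 0;
        buf[pos++] = (uint8_t)len;
        memcpy(buf + pos, name, len);
        pos  += len;
        name += len;
        if (*name == '.') name++;
    }

    if (pos + 5 > buflen) return 0;
    buf[pos++] = 0;                       /* root label ends the name */
    buf[pos++] = (uint8_t)(qtype >> 8);   /* QTYPE */
    buf[pos++] = (uint8_t)(qtype & 0xFF);
    buf[pos++] = 0;                       /* QCLASS = IN (1) */
    buf[pos++] = 1;
    return pos;
}
```

  Because the function only fills a buffer, it drops into a program that
already has its own event loop, which is the whole point of the story
above.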

> Here's what I mean: take "luv" for example. It only works if all file
> descriptors are set to non-blocking mode. Now suppose you pull some
> random mysql module off the shelf that communicates over a network
> socket to your SQL server. And suppose that mysql library uses
> LuaSocket to create a network connection to the database.
> 
> Unless that mysql module is a priori designed to work with "luv", as
> soon as you start using it, it will make some blocking call, and your
> entire application will block - oops, here comes Problem #2 again.

  As mentioned above, this is an issue in any language.  If you need a
library to do X, and it also deals with Y, which your program is already
dealing with, there's a problem.  And the solutions are limited:

	1) Find another library to do X.

	2) Write code to do X.

	3) Adapt library A to work with your Y.

	4) Change your Y to library A's Y.

	5) Go shopping because this stuff is hard.

> Even if you could somehow access the socket that the mysql module uses
> and set it to non-blocking mode, that still wouldn't work because the
> mysql module wouldn't know what to do with the  new EWOULDBLOCK error
> code it would get back.
> 
> In other words, there exist "solutions" to Problem #2, but if you use
> them, then EVERY other module you use that makes blocking calls needs
> to also be designed to work with that solution, or else you haven't
> really solved the problem.
> 
> Another example: suppose you want your application to fetch an HTTPS
> URL like "https://foobar.com/" without blocking the entire VM, using
> some simple "request" object API. Then you have to find a module that
> contains ALL THREE of (a) non-blocking I/O, (b) HTTP client support,
> and (c) SSL support... or else you have to find three separate modules
> for (a), (b), and (c) that were all written to work together
> (unlikely).

  Well, while I don't have a module for (b), I do have separate modules for
[a] and [c].  You can use the TLS module without the network module [d] but
you can use the network module if you want to.  In fact, I think you can use
my TLS library with LuaSocket if you really tried to [e].  It was a design
goal of mine to make multiple small libraries that work well together [f]
instead of large all-encompassing libraries.  I *like* composability.

[a]	https://github.com/spc476/lua-conmanorg/blob/master/src/net.c

[b]	There is no b, only Zuul.

[c]	https://github.com/spc476/lua-conmanorg/blob/master/src/tls.c

[d]	Because the underlying library I wrapped also does networking, sigh.

[e]	I haven't yet.

[f]	https://github.com/spc476/LPeg-Parsers
	https://github.com/spc476/lua-conmanorg
	https://github.com/spc476/CBOR
	https://github.com/spc476/SPCDNS

> In my opinion this problem is actually the biggest problem with the
> Lua "ecosystem". As evidence look at the discussion going on in the
> "batteries" thread right now.

  That's a good summary I think.

> PROPOSAL
> 
> I tried to think about whether there is a way to address Problem #2
> directly and transparently, i.e., in a way that doesn't require other
> modules to know anything about it. So that, using the above example,
> if you started using some random mysql module that makes blocking
> network calls, it wouldn't lock up the entire VM while it waits.
> 
> How could you do this? Here was my first idea:
> 
>     1. Give each coroutine its own pthread and C stack.
>     2. Disable preemption (i.e., context switch only on blocking call
> or yield())
>     3. Disable parallel execution to preserve Advantage #2 (e.g., lock
> all pthreads to one CPU)
> 
> The key here is that the combination of #2 and #3 means only one
> coroutine runs at a time, and it runs without being preempted until it
> either (a) yields, or (b) blocks. In effect, a blocking call is
> treated like a yield(), with the corresponding resume() occurring when
> the file descriptor becomes readable (or whatever).

  I think that's how most network event drivers for Lua currently work:  they
only yield on blocking calls and otherwise let the currently running
coroutine do its work.  It's how my own network driver [a] works.

[a]	https://github.com/spc476/lua-conmanorg/blob/master/lua/nfl.lua
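
  The C-level half of "yield on blocking calls" can be sketched roughly as
follows.  This is not the code in nfl.lua or any other driver; the names
(io_status, try_read, wait_readable) are invented.  The idea is that a
non-blocking read() which would have blocked reports that fact instead,
the driver yields the coroutine, and the main loop resumes it once poll()
says the descriptor is readable.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

enum io_status { IO_DONE, IO_WOULD_BLOCK, IO_ERROR };

/* Put fd into non-blocking mode; returns -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* One read attempt.  IO_WOULD_BLOCK is the signal to yield the
   current coroutine rather than stall the whole process. */
enum io_status try_read(int fd, void *buf, size_t len, ssize_t *got)
{
    ssize_t n = read(fd, buf, len);
    if (n >= 0) {
        *got = n;
        return IO_DONE;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return IO_WOULD_BLOCK;
    return IO_ERROR;
}

/* What the main loop does on behalf of a yielded coroutine: wait until
   fd is readable.  Returns 1 if ready, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    return poll(&p, 1, timeout_ms);
}
```

  A real driver polls many descriptors at once and keeps a table mapping
each one to the coroutine waiting on it; this sketch only shows the shape
of a single wait.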


> This type of solution preserves Advantage #2 (and provides a simple
> solution to Problem #3) with no additional locking. There is only one
> small downside compared to "plain" Lua: blocking calls now behave as
> if they could yield() internally. For example, this code could behave
> differently:
> 
>     counter = 1;
>     function mycoroutine(client)
>         print(counter)
>         print(counter)     -- counter must be the same here
>         client:receive()   -- this is a blocking call
>         print(counter)     -- counter could have changed here!
>     end
> 
> In plain Lua the counter could not have changed in the last print() statement.

  This issue exists in other languages as well.  This isn't just a Lua
problem.

> Since we can't do it with pthreads, what about GNU pth? It is
> specifically designed for non-preemptive multitasking and gives you
> 3(a), 3(b), and 3(c) by design. Moreover, pth is very portable and
> stable (I used it back in the 90's).

  At least it's LGPL, but how well does it work with pthreads?  Because I
have this module over here that uses pthreads ... 

> (If people barf on the GPL, it would be easy to write the same thing
> from scratch under a looser license.)
> 
> What would be required of existing C modules to support this? => Any
> blocking calls would have to be changed so they invoke pth's version.
> For example, read becomes pth_read(), sleep() becomes pth_sleep(),
> etc.

  On some systems, one can use LD_PRELOAD tricks to swap out blocking
functions with ones that can yield.  

> Does this mean C modules would end up being littered with #ifdef's and
> incompatible with "plain" Lua?
> 
> No. Instead, this addition to the Lua API would define a Lua wrapper
> function for each blocking function.

  What functions would those be?  Because there's this other thread trying
to define a minimal set of "batteries" for Lua ... 

> So the C module would change read() into lua_read(), sleep() into
> lua_sleep(), etc., after including some appropriate Lua header.
> 
> But these wrapper functions could work either way: the same C module
> would compile and work on either "plain" Lua or on this new, optional
> "nonblock" version of Lua: on platforms supporting "nonblock" the
> wrappers would redirect to non-blocking versions; on "plain" Lua
> platforms, the wrappers would just fall back to the original versions
> - so you get exactly what you did before.

  Ah, much like how lua_lock() and lua_unlock() work today.
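
  If I read the proposal right, the fallback half of such a header might
look something like the sketch below.  To be clear, lua_read(),
lua_write() and lua_sleep() are the names from the proposal and do not
exist in Lua, and the LUA_NONBLOCK build flag is something I made up to
stand in for "this platform has the cooperative runtime".

```c
#include <unistd.h>

#if defined(LUA_NONBLOCK)   /* hypothetical flag set by a "nonblock" build */

/* Would be supplied by the cooperative-threading runtime; these may
   switch to another coroutine instead of blocking the process. */
ssize_t      lua_read(int fd, void *buf, size_t len);
ssize_t      lua_write(int fd, const void *buf, size_t len);
unsigned int lua_sleep(unsigned int seconds);

#else                       /* "plain" Lua: fall straight through */

static inline ssize_t lua_read(int fd, void *buf, size_t len)
{
    return read(fd, buf, len);
}

static inline ssize_t lua_write(int fd, const void *buf, size_t len)
{
    return write(fd, buf, len);
}

static inline unsigned int lua_sleep(unsigned int seconds)
{
    return sleep(seconds);
}

#endif
```

  A C module written against these names compiles either way, which is the
same trick as the lua_lock()/lua_unlock() defaults.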

> To summarize this idea:
> 
>   1. Add new, optional "nonblock" support to Lua on platforms that can
> handle it, based on cooperative threading (GNU pth or equivalent)

  Yes, there's another thread about that ... 

>   2. There is a new header file #include "lua_nonblock.h" defining
> wrappers for blocking C calls: lua_read(), lua_write(), lua_sleep(),
> etc.
>   3. C modules written to support "nonblock" continue to work when
> compiled with "plain" Lua, but now can also context switch if compiled
> on "nonblock" versions of Lua
>   4. coroutine.create() gets a new optional parameter, where true
> means "use a separate C stack (if supported)"
> 
> To me this would be a major improvement in the status quo and improve
> the Lua ecosystem dramatically by allowing all of the modules out
> there to be composable.

  Again, the non-composable problem isn't limited to just Lua.

> The only cost to module developers would be match & replace read() ->
> lua_read(), etc.

  None of my modules call read(), not even my network and file system
modules.  There are some that call fread(), but that's a C call, and on most
POSIX systems today, select() [a] is of no use with a file (any file
descriptors pointing to real files on the disk will always return ready for
both reading and writing).  

[a]	or poll(), epoll(), kqueue(), etc.
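
  The point about regular files is easy to check.  Here is a small sketch
(function names invented for the example) that asks select() about a
descriptor with a zero timeout; for an empty temporary file it still
reports ready for both reading and writing, which is why readiness
notification can't tell you whether a disk read will stall.

```c
#include <stdio.h>
#include <sys/select.h>

/* Returns 1 if select() reports fd ready for both reading and writing
   without waiting, 0 if not, -1 on error. */
int ready_for_read_and_write(int fd)
{
    fd_set rfds, wfds;
    struct timeval no_wait = { 0, 0 };

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_SET(fd, &rfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, &rfds, &wfds, NULL, &no_wait) < 0)
        return -1;
    return FD_ISSET(fd, &rfds) && FD_ISSET(fd, &wfds);
}

/* Run the check against an empty temporary file on disk. */
int regular_file_is_always_ready(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL) return -1;
    int result = ready_for_read_and_write(fileno(fp));
    fclose(fp);
    return result;
}
```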

> Thoughts?

  Lots, but this is long enough as it ...

  -spc