lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Hi!

I was playing around with Lua a bit lately and noticed that it supported loading precompiled bytecode at runtime through `load` [1], which caught my attention. Looking at the source code, it seemed like no validation was performed on the input (actually, I could have just read the documentation for `load` it seems :P). I was able to confirm this with a quick run of the afl fuzzer [2], which found hundreds of crashes within a few minutes. After some digging around I found some previous threads ([3], [4]), suggesting that this was a fairly well known issue. Also, for example redis has this specific functionality disabled [5]. On the other hand, quoting from [6]: "We have always considered it unacceptable for a Lua program to be able to crash the host application. Lua should be a safe language.". This seems to be clearly violated here.

The crashes mostly seem to have two distinct causes:

    1. The bytecode parser does not validate its input

    2. The Lua VM itself considers the bytecode trusted

Fixing the issues from the first category should be possible without too much overhead. I've already started doing so, see below.

The easiest way to fix the issues from the second category is probably to validate the bytecode at load time by performing some simple checks on every instruction (i.e. check that JUMPs, LOADKs and register accesses are within the respective bounds, and make sure the code ends in a RETURN). This way there is no overhead at execution time.

Due to its simplicity, the Lua interpreter in general seems well suited to be exposed to potentially untrusted input. I think it would thus be a good idea to provide a short guide on how to configure the interpreter to safely use it in such a scenario. This would obviously include disabling (some functions from) the `io` and `os` modules, and disabling bytecode loading (for now). It could also contain a word or two about sandboxing the interpreter. As an example where this currently seems to go wrong consider https://www.lua.org/cgi-bin/demo, which is clearly exposed to untrusted input, but apparently hasn't disabled bytecode loading through `load` (I've also informed the website owners in a separate email).

It should be noted that it is actually possible to compromise (and not only crash) the host application if malicious bytecode is executed by the interpreter. I've written a small proof-of-concept exploit which manages to execute arbitrary shell commands, even though `os.execute` has been disabled in the interpreter. I can provide some more details when this problem has been fixed on lua.org. Exploitation is simplified by the fact that Lua provides addresses of objects through `tostring`, thus allowing for an easy way to bypass ASLR [7].

Below are some quick patches for all bugs in the parser itself that afl has found so far (i.e. bugs that cause a crash when just doing `load` without executing the resulting chunk afterwards):


    diff --git a/src/lundump.c b/src/lundump.c
    index 4080af9..c963535 100644
    --- a/src/lundump.c
    +++ b/src/lundump.c
    @@ -194,6 +194,8 @@ static void LoadDebug (LoadState *S, Proto *f) {
         f->locvars[i].endpc = LoadInt(S);
       }
       n = LoadInt(S);
    +  if (n > f->sizeupvalues)
    +      error(S, "Number of upvalues mismatch");
       for (i = 0; i < n; i++)
         f->upvalues[i].name = LoadString(S);
     }

This fixes an out-of-bounds write if the number of upvalues in the debug section of the file exceeds the number of previously loaded upvalues:


    diff --git a/src/lundump.c b/src/lundump.c
    index c963535..d0e63e6 100644
    --- a/src/lundump.c
    +++ b/src/lundump.c
    @@ -84,6 +84,10 @@ static lua_Integer LoadInteger (LoadState *S) {
       return x;
     }

    +// Taken from lstrlib.c TODO don't duplicate code here
    +#define MAX_SIZET    ((size_t)(~(size_t)0))
    +#define MAXSIZE  \
    +    (sizeof(size_t) < sizeof(int) ? MAX_SIZET : (size_t)(INT_MAX))

     static TString *LoadString (LoadState *S) {
       size_t size = LoadByte(S);
    @@ -91,6 +95,8 @@ static TString *LoadString (LoadState *S) {
         LoadVar(S, size);
       if (size == 0)
         return NULL;
    +  if (size > MAXSIZE)
    +    error(S, "string too large");
       else if (--size <= LUAI_MAXSHORTLEN) {  /* short string? */
         char buff[LUAI_MAXSHORTLEN];
         LoadVector(S, buff, size);

This limits the size of strings in the same way str_rep does it. This is likely still incomplete since as far as I can tell (string) allocations may still fail and return NULL, in which case a segfault will follow. This seems to be a more general issue though.


    diff --git a/src/lundump.c b/src/lundump.c
    index c963535..ba19563 100644
    --- a/src/lundump.c
    +++ b/src/lundump.c
    @@ -90,7 +90,7 @@ static TString *LoadString (LoadState *S) {
       if (size == 0xFF)
         LoadVar(S, size);
       if (size == 0)
    -    return NULL;
    +    error(S, "Invalid string");
       else if (--size <= LUAI_MAXSHORTLEN) {  /* short string? */
         char buff[LUAI_MAXSHORTLEN];
         LoadVector(S, buff, size);

This prevents a crash when LoadString returns NULL. NULL strings are supported by ldump.c (I'm not quite sure why though), but without this patch they'll cause a crash in LoadConstants.

There are some more issues that are more or less parser related, for example, the interpreter will crash if the number of arguments that a function expects is negative. These issues can likely be found by fuzzing and can probably also be fixed in the parser by limiting certain size values.

Looking forward to your opinions!

Best,
Samuel Groß


[1] https://www.lua.org/manual/5.3/manual.html#pdf-load
[2] http://lcamtuf.coredump.cx/afl/
[3] http://lua-users.org/lists/lua-l/2008-03/msg00430.html
[4] http://lua-users.org/lists/lua-l/2015-07/msg00397.html
[5] https://github.com/antirez/redis/commit/fdf9d455098f54f7666c702ae464e6ea21e25411
[6] https://www.lua.org/doc/hopl.pdf
[7] https://en.wikipedia.org/wiki/Address_space_layout_randomization