Re: tostring userdata

lua-l archive

Subject: Re: tostring userdata
From: Sean Conner <sean@...>
Date: Sun, 7 Jul 2019 18:17:32 -0400

It was thus said that the Great Coda Highland once stated:
> On Sun, Jul 7, 2019 at 9:12 AM Roberto Ierusalimschy <roberto@inf.puc-rio.br>
> wrote:
> 
> > > Is ASLR worth it or just some useless obfuscation we don't need to care
> > > about? I can't judge.
> >
> > I may be completely wrong here, but as far as I know the main motivation
> > for ASLR were attacks like stack overflow in C. As far as I know, in
> > C it is trivial (actually a non-op) to take the address of anything:
> > functions, data structures, the stack, etc. So, if taking addresses were
> > really that big problem, ASLR would be dead before starting.
> 
> This isn't necessarily true.
> 
> I mean, if you're compiling and executing arbitrary untrusted C code,
> you're already doing something wrong to begin with. 

  Technically speaking (the best kind of correct 8-) we do this *all* the
time.  Did you audit the code for Firefox?  Or Chrome?  No, you probably
downloaded the program from the official location and ran it with little
thought, under the assumption that the code won't do anything nefarious to
your computer or data.

  And even *if* you download the code to a program, it's statistically
unlikely anyone here audited the code.  Sure, it *can* be done, but the
Heartbleed exploit in OpenSSL pretty much showed that isn't necessarily the
case for even a critical, security-based piece of software the entire
Internet uses.

> But disregarding that,
> even C code shouldn't be able to arbitrarily snoop on the memory space of
> OTHER processes. In a secure system, you wouldn't want a C program to be
> able to -- for example -- leak data from kernel structures, or probe the
> memory layout of a running service. 

  But programs *can* do that.  Yes, there are typically restrictions (under
Unix for instance, only programs running under the same UID, or a UID of 0,
can do this).  Otherwise, it would be impossible to use a debugger (a
program whose sole purpose is to probe the memory layout of a running
process).

> Before ASLR, it was fairly simple to
> figure out a system's memory map after a clean, deterministic startup. ASLR
> means that even if you were able to get unauthorized privilege escalation
> in C code you still have more work to do before you can go start messing
> around in the kernel.

  I get the feeling that most around here don't understand how this stuff
actually works.  Take the following C function:

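	#include <stdio.h>   /* puts(), gets() */
	#include <string.h>  /* strcmp() */

	int stupid_auth(char const *name)
	{
	  char buffer[40];

	  puts("Name?");
	  /* gets() was removed from the language in C11 */
	  gets(buffer); /* !!!!! NEVER USE THIS FUNCTION!!!! */
	  if (strcmp(name,buffer) == 0)
	    return 1;
	  else
	    return 0;
	}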

  It's the call to gets() that is problematic, since there is no way for it
to check the length of the character array it's given and this is a buffer
overwrite waiting to happen.
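
  For contrast, here is a minimal sketch of a bounded read using fgets(),
which is told how large the array is.  The name bounded_auth() and the extra
FILE * parameter are assumptions made for illustration (the stream would
normally be stdin); this is one possible fix, not the only one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded variant of stupid_auth(); `in` is normally stdin. */
int bounded_auth(char const *name, FILE *in)
{
  char buffer[40];

  puts("Name?");
  /* fgets() stores at most sizeof(buffer) bytes, terminator included. */
  if (fgets(buffer, sizeof(buffer), in) == NULL)
    return 0;
  /* Drop the trailing newline that fgets() keeps, if there is one. */
  buffer[strcspn(buffer, "\n")] = '\0';
  return strcmp(name, buffer) == 0;
}
```

  An over-long line is simply truncated to 39 characters, so it fails the
comparison instead of running past the end of the array.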

  In the old, pre-pre-ASLR days, programs were bound to fixed memory
addresses *and the stack was marked as executable* [1]. The stack layout is
known (still is, post-ASLR) and at a known address.  Assuming a 32-bit
machine, the stack would typically look like:

	+-----------------------------------------------+
	| address of "name" parameter (4 bytes)		|
	+-----------------------------------------------+
	| return address (4 bytes)			|
	+-----------------------------------------------+
	| previous stack frame pointer (4 bytes)	| <- stack frame pointer
	+-----------------------------------------------+
	| buffer (40 bytes)				| <- Stack pointer
	+-----------------------------------------------+

  All an attacker had to do was feed 44 garbage bytes to get to the return
address, write a 4-byte address, overwriting the actual return address with
one pointing into the stack, then as many bytes as required that comprise
the code to run.  Classic buffer overrun.
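
  The arithmetic behind the 44 bytes can be checked with a small model of
the frame.  This is only a sketch: struct frame32 and both function names
are made up for illustration, the fields mirror the diagram with the lowest
address first, and the "overflow" is a copy into the model, not into a real
stack.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the 32-bit frame in the diagram, lowest address first. */
struct frame32 {
  char     buffer[40];
  uint32_t saved_frame_pointer;
  uint32_t return_address;
  uint32_t name_param;
};

/* Number of filler bytes needed before the return address is reached. */
size_t bytes_to_return_address(void)
{
  return offsetof(struct frame32, return_address);
}

/* Copy `len` input bytes over the model, starting at buffer, the way an
   unchecked read would.  The copy is clamped to the size of the model so
   the simulation itself stays in bounds. */
void simulate_unchecked_read(struct frame32 *f, void const *input, size_t len)
{
  if (len > sizeof *f)
    len = sizeof *f;
  memcpy(f, input, len);
}
```

  With 44 filler bytes followed by a 4-byte value, the filler lands in
buffer and the saved frame pointer, and the 4-byte value lands exactly in
return_address.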

  There's an easy fix---mark the stack as non-executable [2].  Of course
attackers got around *this* by using "return oriented programming" (ROP). 
For that, the attacker identifies the addresses of useful sequences of
already compiled code, and instead of sending code to be executed, sends a
sequence of data and return addresses (basically constructing a custom
sequence of call stack frames) to do the work instead.

  The response to THAT was ASLR---randomizing the addresses used in a
program; not only of the stack and heap, but of each function as well.  No
longer are there fixed addresses an attacker can rely upon.  That still
leaves the issue that addresses are fixed *relative* to each other.  You
might not know the address of function foo(), but it is always 1024 bytes in
front of function bar().
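
  Put as arithmetic, one known address plus the fixed offsets inside the
same image gives every other address in that image.  A minimal sketch,
assuming the offsets are distances from the image's load base; the function
name and all the numbers used with it are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: both offsets are distances from the load base of
   the same image (executable or shared library). */
uint64_t address_from_leak(uint64_t leaked_address,
                           uint64_t leaked_offset,
                           uint64_t target_offset)
{
  /* The load base changes on every run; the offsets do not. */
  uint64_t load_base = leaked_address - leaked_offset;
  return load_base + target_offset;
}
```

  The result is only meaningful within the one image the leaked address
belongs to; a different library has its own, unrelated load base.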

  "Aha!  See!  Knowing an address is *still* bad!"

  Well, yes and no.

  If you get the address of a function, like the Lua function file:read(),
that only tells you the locations of other functions *in Lua* and not
necessarily any other function in the program.  Under ASLR, not only is the
main executable loaded at a random location each time, but each shared
library gets a new address as well, so knowing the address of file:read()
will only get you the other functions that come with Lua.  It won't help you
get the address of (luafilesystem) lfs.rmdir(), as it's in a different
library, loaded at some other random address.  And Lua is not a large
library, so the availability of useful code for ROP is limited.  On my
system, that's only 124,336 bytes of code, of which a good portion is NOPs
[3] for function alignment to keep the speed up.  And while that sounds like
a lot, it's not (Apache, the web server I run, is significantly larger),
especially when you consider that the C runtime has over 1,000,000 bytes of
code (again, on my system; the size on yours may vary) in which to find
useful fragments for ROP.

  So, to exploit the fact that you got an address:

	1) if it's an address of data on the heap, it does you no good for
	   an exploit because you can't do ROP
	2) if it's an address on the stack, it does you no good because you
	   still need the address of functions for ROP
	3) if it's an address of a function, that only gives you functions
	   in that library (or main code), which limits how well ROP works,
	   and it *still* needs an address to the stack to even begin
	4) you still need to find a way to overflow the stack

  This is why I consider seeing an address printed a non-issue.

> As I said, it's about defense in depth. Pointers are not themselves evil.
> Pointers to things that don't belong to you, on the other hand, are a foot
> in the door.

  You can do all this and *still* be exploited:

	http://boston.conman.org/2004/09/19.1

  Okay, that was an inside attack, which is *very* hard to defend against. 
And I'm not saying *don't* bother with security at all.  I'm just tired of
the knee-jerk reaction of "let's do this because security!" leaving people
with the illusion of security when they aren't actually secure.  Understand
the threat model.  Just because you have ASLR and don't print addresses
doesn't make you safe from exploits.

  -spc (WON'T YOU THINK OF THE CHILDREN?)

[1]	Why in the holy hell would any operating system do this?  Because
	it's useful, that's why.  Microsoft Windows famously did this for
	the BitBlt routine, compiling code on the stack specialized for
	the type of operation required, and yes, even with the overhead of
	"compiling" it was *still* faster than a generalized routine.  It's
	talked about in the book _Beautiful Code_.

[2]	It might make the system slower as techniques in [1] can no longer
	be deployed as easily.

[3]	No OPeration.
