- Subject: Re: How does string.format handle undefined behavior?
- From: Lorenzo Donati <lorenzodonatibz@...>
- Date: Sat, 4 Sep 2021 22:41:32 +0200
On 04/09/2021 15:53, Egor Skriptunoff wrote:
> On Fri, Aug 27, 2021 at 10:31 PM Lorenzo Donati wrote:
>> UB is just dragons waiting to wreak havoc
>> on your machine.
>
> How dangerous are string.format() dragons?
> Should string.format() be inaccessible to untrusted Lua code?
> Or, maybe Lua protects safely from the main threat (buffer overflow),
> and all other dragons are small and acceptable?
It all depends on the implementation of the runtime of the C library Lua
is linked to. UB means what it means. There is no "small" dragon. The
term "dragon" doesn't indicate a bug, but the event that a bug could be
catastrophic. It's the whole point of the standard washing their hands
And that's why there are so many tools to analyze code to avoid UB in C
Who knows what a printf implementation does when passed something the
standard says is UB? You cannot rule out that a buffer overflow
will happen when passing "%#c" to printf. The standard says it's UB and
*anything can happen*. You have NO guarantee at all, even if Lua checked
every other thing. Once it crosses the Lua/C barrier, Lua has no control
Remember that an implementation is allowed to take it for granted that a
C program doesn't contain code that triggers UB. This means that an
implementation can legitimately lack any code to even check whether a "#"
flag is present for a "c" conversion. So, for example, imagine a big
switch statement where every branch handles one possible flag that's
allowed: when "#" appears it may force execution of the default branch,
if present, or execution may fall through to the end of the switch
without handling the format code properly, because that "#" should not
be there. You could end up in a segment of code never meant to be
executed in those conditions. You cannot assume the state of the program
is necessarily consistent at that point, and resources, such as buffers,
may be leaked.
An implementation is not "wrong" or "buggy" if it doesn't handle that
case, it simply takes advantage of the leeway afforded by the standard.
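
To make the switch scenario above concrete, here is a toy sketch. It is
NOT taken from any real C library: the names toy_spec and
toy_parse_trusting are hypothetical, and the parser only pretends to
handle a "%c" directive. The code itself is well-defined C; it merely
shows how a parser that trusts its caller ends up in a state nobody
planned for.

```c
/* Hypothetical parse state for one "%c" directive in a toy formatter. */
struct toy_spec {
    int left_justify;   /* set by the '-' flag */
    int width;          /* minimum field width */
    char conversion;    /* the character the parser believes is the conversion */
};

/* Trusting parser: it handles the one flag that is valid for "%c" and
   assumes the caller never passes anything else. */
struct toy_spec toy_parse_trusting(const char *p)
{
    struct toy_spec s = {0, 0, '\0'};

    if (*p == '%')
        p++;

    switch (*p) {
    case '-':
        s.left_justify = 1;
        p++;
        break;
    /* No case for '#': a conforming caller never sends it with 'c',
       so the switch silently does nothing and p is not advanced. */
    }

    /* Field width; the cap keeps the arithmetic from overflowing. */
    while (*p >= '0' && *p <= '9') {
        if (s.width < 10000)
            s.width = s.width * 10 + (*p - '0');
        p++;
    }

    /* For "%-5c" this is 'c'. For "%#c" it is '#', which later code
       (not shown) was never written to expect. */
    s.conversion = *p;
    return s;
}
```

With "%-5c" the state comes out as expected (left-justified, width 5,
conversion 'c'). With "%#c" the "conversion" field holds '#', and
whatever code consumes that state was written under the assumption that
this cannot happen.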
If an implementation gave you some *guarantees* in the handling of
printf format strings beyond what the standard states (e.g. no buffer
overflows, regardless of what the format string is, even in UB cases),
that would be an extension. No implementation is required to do so.
And if your program relied on those guarantees, it would be non-portable
*by definition*. That may be fine, if you don't mean to port your code
to other platforms, but that's all.
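
A portable alternative to relying on such guarantees is to reject a
questionable directive before it ever reaches printf. What follows is
only a minimal sketch under stated assumptions: spec_is_defined is a
hypothetical helper (it is not how Lua does it), it looks at a single
directive, and it checks only three rules from the printf description
in ISO C: the "#" flag, the "0" flag and a precision are defined only
for certain conversions. It is deliberately conservative: length
modifiers, "*", "%%" and "%n" are all rejected.

```c
#include <string.h>

/* Returns 1 if `spec` (one directive, e.g. "%#x") uses the '#' flag,
   the '0' flag and a precision only with conversions for which ISO C
   defines them; returns 0 otherwise. Hypothetical helper. */
int spec_is_defined(const char *spec)
{
    int has_hash = 0, has_zero = 0, has_precision = 0;
    const char *p = spec;
    char conv;

    if (*p++ != '%')
        return 0;

    /* Flags. The '\0' test is needed because strchr would otherwise
       match the terminator of the flag list. */
    while (*p != '\0' && strchr("-+ #0", *p) != NULL) {
        if (*p == '#') has_hash = 1;
        if (*p == '0') has_zero = 1;
        p++;
    }

    /* Field width. */
    while (*p >= '0' && *p <= '9')
        p++;

    /* Precision. */
    if (*p == '.') {
        has_precision = 1;
        p++;
        while (*p >= '0' && *p <= '9')
            p++;
    }

    /* Exactly one conversion character must remain. */
    if (*p == '\0' || p[1] != '\0')
        return 0;
    conv = *p;

    if (strchr("diouxXaAeEfFgGcsp", conv) == NULL)
        return 0;
    if (has_hash && strchr("oxXaAeEfFgG", conv) == NULL)
        return 0;   /* e.g. "%#c" */
    if (has_zero && strchr("diouxXaAeEfFgG", conv) == NULL)
        return 0;   /* e.g. "%05s" */
    if (has_precision && strchr("diouxXaAeEfFgGs", conv) == NULL)
        return 0;   /* e.g. "%.3c" */
    return 1;
}
```

A caller would test each directive with spec_is_defined() and raise an
error of its own for anything that fails, so that the failure is early
and loud instead of being left to the C library.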
The whole mess of UB is just that: people think "most implementations
won't do something silly in this case", then you find the "right"
compiler switch, the "right" compiler version, the "right" DLL
linked-in, the "right" C-lib version and some years down the road
something goes horribly wrong.
Of course there are applications where a catastrophic crash can be
tolerated (by some definition of "tolerance"), especially when you run
your application under an OS (although losing days of work because, for
example, an engineering simulation program crashed unexpectedly because
someone produced a badly formatted log message is not a "small" dragon,
But at system level, perhaps on an MCU running Lua embedded in a C
application, maybe with no OS, where "stdio.h" is not even a required
library (and if provided is often dumbed down) a "dragon" may mean a
physical system goes down hard. Even if we don't consider safety
critical applications (Lua in automotive? Probably not, since I don't
think its code was written with MISRA in mind), a printer suddenly
resetting to factory defaults or losing control of a bunch of sensors,
leading to massive paper jams, is not something I would dismiss.
FWIW, IMO the absence of "C-type UB" in a programming language is a very
big plus. If something is wrong a program should fail early and loudly,
possibly with a clear error message. Lua is advertised as a "safe"
language as long as one doesn't use the C-API or the debug library. I
consider it a bug having corner cases where this is not true, especially
because a Lua programmer doesn't expect a "C-type UB" dragon lurking in