- Subject: Re: How does string.format handle undefined behavior?
- From: Lorenzo Donati <lorenzodonatibz@...>
- Date: Thu, 2 Sep 2021 23:29:41 +0200
On 31/08/2021 19:45, Roberto Ierusalimschy wrote:
> To define a behavior as undefined sounds like those "this page
> intentionally left blank" pages :-)
C is a real mess! It took me years to really understand what undefined
behavior meant. I never was a C programmer and my knowledge of C has
progressed with jumps, as the need to improve my C skills arose from
time to time.
It *could* be an enjoyable language, although a hard one, if it weren't
for all its warts that the standard committee never (or painfully
slowly) addressed.
I particularly hate the uselessness of bit fields, the absence of a
namespacing facility and the fact that the standard reserves a whole
bunch of unrelated identifiers (mem*, str*, *_t and what not).
And most of the time you end up with UB in your code.
Anyway, it is as widespread and as standard as a systems language can get,
so that's a big deal.
> In the particular case of 'printf', the format "%#c" is defined as
> undefined, while the format "%+c" is literally undefined. Should we
> consider "%+c" as undefined behavior(™)?
I just checked the C99 draft standard N1256 (section 7.19.6.1, page 274+).
Indeed, paragraph 6 states explicitly that "For other conversions,
the behavior is undefined." with regard to the "#" flag and the "c" conversion.
OTOH, nothing is said about the "+" flag with the "c" conversion, as you
point out.
Anyway, IMO, Lua should avoid leaking "C undefined behavior" (except
possibly through the debug library) and it should also avoid other confusing
unspecified behavior derived from C. So anything that doesn't make sense
or is confusing should raise an error.
So "%+c" doesn't really make sense, since "c" means "print the character
whose code is specified", so an error is in order, IMO.
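
To make the "raise an error" idea concrete, here is a minimal sketch of a strict checker for a single conversion specification. It accepts a flag only where paragraph 6 of the fprintf section explicitly gives it a meaning for that conversion. The names flag_defined_for and spec_is_strict are made up for illustration and are not part of Lua or of any C library. The sketch also assumes a simplified syntax: no length modifiers, no "*" width or precision, and only a subset of the C99 conversions.

```c
#include <string.h>

/* Returns 1 if `flag` has an explicitly defined meaning for the
   conversion `conv` (which must not be '\0'), and 0 otherwise.
   The sets follow the flag descriptions in C99 for fprintf. */
static int flag_defined_for(char flag, char conv) {
    switch (flag) {
    case '-':   /* left-justification is described for any conversion */
        return 1;
    case '+':
    case ' ':   /* only described for signed conversions */
        return strchr("diaAeEfFgG", conv) != NULL;
    case '#':   /* "For other conversions, the behavior is undefined." */
        return strchr("oxXaAeEfFgG", conv) != NULL;
    case '0':   /* likewise undefined for other conversions */
        return strchr("diouxXaAeEfFgG", conv) != NULL;
    default:
        return 0;
    }
}

/* Hypothetical strict check of one spec such as "%+5.2f".
   Returns 1 if the spec is accepted, 0 if it should raise an error. */
int spec_is_strict(const char *spec) {
    const char *p = spec;
    const char *flags;
    size_t nflags, i;
    char conv;

    if (*p++ != '%') return 0;
    flags = p;
    nflags = strspn(p, "-+ #0");            /* run of flag characters */
    p += nflags;
    p += strspn(p, "0123456789");           /* optional width */
    if (*p == '.') {
        p++;
        p += strspn(p, "0123456789");       /* optional precision */
    }
    conv = *p;
    /* strchr would match the terminator, so test for '\0' first */
    if (conv == '\0' || strchr("diouxXaAeEfFgGcs", conv) == NULL) return 0;
    if (p[1] != '\0') return 0;             /* nothing may follow */
    for (i = 0; i < nflags; i++)
        if (!flag_defined_for(flags[i], conv)) return 0;
    return 1;
}
```

With this sketch, spec_is_strict("%+d") and spec_is_strict("%-5c") return 1, while both "%+c" and "%#c" are rejected, regardless of whether C calls them undefined or simply says nothing about them.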
As absurd as it may seem, "%#c" could be given a reasonable meaning in
Lua: since "#" means "use an alternate form", one could "define" what is
"defined as undefined" in C (ugh!). For example, one could force
interpreting the argument as UTF-8 encoded.
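
Purely as an illustration of that hypothetical meaning, and assuming it would mean "treat the numeric argument as a Unicode code point and emit its UTF-8 encoding", the conversion could be sketched as below. The function name utf8_encode is invented for this example; nothing here describes what Lua actually does.

```c
#include <stddef.h>

/* Encodes the code point `cp` as UTF-8 into `out`, which must have room
   for at least 4 bytes. Returns the number of bytes written, or 0 if
   `cp` is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out) {
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the code point 0x20AC (the euro sign) comes out as the three bytes E2 82 AC, whereas a plain "%c" would just truncate the value to a single char.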
Not that I'm actually proposing this. Just pointing out that what makes
sense in a language is not necessarily the same in another. :-)
Anyway, I think also "%#c" should raise an error.
In the long run, IMHO, maybe Lua should really specify what the
string.format format-string syntax is. I understand that it would make
the manual somewhat bigger (and increase the implementation size), but I
don't think making a reference to the C printf is friendly to pure-Lua
(non-C) programmers. Finding the details of printf mini-language syntax
is not trivial at all for someone not knowing C. Even if they find a
(reliable) reference, they have to parse it ignoring all the C-specific
stuff, which is not easy at all if they don't know jack of C.