What would you expect from string.format("%02c",string.byte"-") or string.format("%02c",string.byte"+")?
Zero-padding on the left of a "%c" is undefined in C, because zero padding is only defined when the field contains digits, and the placement of a sign (at the start of the field, or at the end just before the character itself) affects the count of zeroes needed to fill the field; note that a sign may not always be present. The "%c" placeholder does not define the behavior when there is an optional sign or when the character itself is a sign.
But most implementations assume that the value given to %c behaves like a digit.
Note that "%2c" pads with spaces on the *left* by default, exactly like "%2d", "%2o", "%2x" and "%2f" do when formatting numeric values (decimal, octal or hexadecimal integers, or floats); padding on the *right* requires the "-" flag, as in "%-2c".

If "%c" honored the "0" flag the way "%0*d" does, "%0*c" would put the zeroes to the left of the character. But "%c" does not have any semantics related to numeric values: it acts like "%s", for which the standard defines padding with spaces only.

So with "%02c", the zero flag should simply be ignored and the character displayed with a leading padding space (which should be the standard). Padding with zeroes (on the left or the right) is clearly non-standard and gives unpredictable results as long as the "0" flag is not formally standardized for the "%c" and "%s" placeholders.

For "%s" the standard defines a minimum width, padded with spaces, and a maximum number of significant characters specified after the dot: "%5.8s" means at most 8 characters taken from the string, in a field of at least 5 characters. A precision on "%c" is no better defined than the "0" flag, but if it were read the same way, the 8 in "%5.8c" would be ignored, because %c generates only one character, which always fits. "%5.0c" would take 0 characters from the input, so the character would not be displayed at all and only the width before the dot would have any effect: the result would always be 5 spaces, independently of the value.

"%5.0c" and "%5.0s" would then generate the same thing, the first one taking an integer (interpreted as a char) from the varargs (usually on the stack but not necessarily; it could be in a buffer allocated elsewhere, e.g. with "v[s][n]printf(char *format, void *varargs)"), the second taking a (char*) pointer from the varargs which is never accessed, but which may have a different size in the parameter stack, so that advancing the varargs pointer could be different.
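
The parts of this that the C standard does define (field width, the "-" flag, the "0" flag on integer conversions, and a precision on "%s") can be checked with a small sketch. The helper names format_char, format_int_zero and format_str are hypothetical, chosen only for this illustration; the code deliberately never passes "0" or a precision to "%c", since that is the undefined case under discussion.

```c
#include <stdio.h>

/* Hypothetical helper: one character in a field of `width` columns.
   left_align == 0 uses "%*c"  (default: spaces on the left),
   left_align != 0 uses "%-*c" ("-" flag: spaces on the right). */
static int format_char(char *out, size_t size, int width, int ch, int left_align)
{
    return left_align ? snprintf(out, size, "%-*c", width, ch)
                      : snprintf(out, size, "%*c", width, ch);
}

/* Hypothetical helper: the "0" flag where it IS defined, on "%d".
   The zeroes are inserted after the sign, before the digits. */
static int format_int_zero(char *out, size_t size, int width, int value)
{
    return snprintf(out, size, "%0*d", width, value);
}

/* Hypothetical helper: "%s" with a minimum width and a precision,
   i.e. a maximum number of characters taken from `s`. */
static int format_str(char *out, size_t size, int width, int max_chars, const char *s)
{
    return snprintf(out, size, "%*.*s", width, max_chars, s);
}
```

For instance, format_char(buf, sizeof buf, 2, '1', 0) leaves " 1" in buf, format_int_zero(buf, sizeof buf, 3, -1) leaves "-01", and format_str(buf, sizeof buf, 5, 0, "abc") leaves five spaces.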






On Sun, 29 Aug 2021 at 15:24, Lorenzo Donati <lorenzodonatibz@tiscali.it> wrote:
On 29/08/2021 07:19, Flyer31 Test wrote:
> But what is your problem with this result "1" or better "01" for
>
> str= "1"
> string.format("%02c", str)
>
> ?

I wasn't criticizing your statements, but I was replying to @nobody, who
seemed to imply that UB was fine if you could test that the executable
consistently provides a sane result on a given platform.

That is a fallacious approach, since a compiler can produce whatever
code it sees fit when the source code triggers an UB. So observing the
actual apparent behavior of the executable is not enough to state
something like "yes, the code has UB, but in this case the executable is
ok and doesn't do anything nasty".

The *executable* code could hide a segfault (or worse) waiting to happen
if its environment changed, for example.

The only guaranteed safety when you write code with any UB described in
the standard is when an implementation chooses to *define* and
*document* that case (the standard allows that as an extension). However
the source code becomes non-portable.

Relying on observable behavior from an executable generated by an
implementation that doesn't provide that guarantee is wrong.
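
A caller who wants a result like "01" does not have to depend on the undefined "%02c" at all. Here is a minimal sketch in C, assuming the goal is simply one character placed at the right of a field filled with '0' on the left; zero_pad_char is a hypothetical name, not a function from Lua or from the C library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `ch` at the end of a field of `width`
   columns, filling the columns before it with '0'. A width below 1 is
   treated as 1. Returns the number of characters written (not counting
   the terminating NUL), or -1 if `out` is too small. */
static int zero_pad_char(char *out, size_t size, int width, char ch)
{
    size_t w = width > 1 ? (size_t)width : 1;

    if (size < w + 1)
        return -1;              /* no room for the field plus the NUL */

    memset(out, '0', w - 1);    /* explicit fill, no format flags involved */
    out[w - 1] = ch;
    out[w] = '\0';
    return (int)w;
}
```

With this, zero_pad_char(buf, sizeof buf, 2, '1') leaves "01" in buf and zero_pad_char(buf, sizeof buf, 2, '-') leaves "0-", on every conforming implementation, because only memset and plain assignments are used.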

UB doesn't mean "the executable generates an error" or "the executable
(always) behaves in obviously erratic ways". It means "you can say
ABSOLUTELY NOTHING about the behavior of the executable". It could
behave nicely for 1 million executions on the same machine with the
same configuration and then format your hard disk when executed on the
1st of July of a leap year. Yes, an implementation could also produce
sane (and safe) machine code, but the point is: you cannot tell for sure
by simply observing some output.

The only way to be absolutely sure that the compiler generated safe
executable code when compiling a source with UB would be to actually
analyze the machine code produced, which is ridiculous in practice (you
could do that for research or curiosity, but not in a production
environment).


>
> Isn't this exactly what would be expected, or am I standing on the line somehow?
>

>
> On Fri, Aug 27, 2021 at 9:31 PM Lorenzo Donati
> <lorenzodonatibz@tiscali.it> wrote:
>>
>> On 27/08/2021 20:00, nobody wrote:
>>> On 27/08/2021 16.28, Roberto Ierusalimschy wrote:
>>>> Thanks for the report. Do you have any real case where this is causing
>>>> problems? (e.g., a platform with a weird behavior for these uses, a tool
>>>> that complains about Lua source code.)
>>>
>>> For reference, the standard *explicitly* says "behavior is undefined"
>>> and not just unspecified, but I tried both gcc and clang in gnu99 and
>>> C11 modes with -fsanitize=undefined and neither of them produced any
>>> warnings when formatting "%02c" or "%#02.4c" and other nonsense.  (That
>>> said, as far as I understand UBSAN isn't supposed to catch _everything_,
>>> just a subset of all undefined stuff.)
>>>
>>>>> And from testing them on my machine,
>>>>> string.format("%02c",string.byte"1") results in "01", so it isn't
>>>>> ignored.
>>>
>>> On my machine, "%02c" produces " 1" so while it seems to be not truly
>>> undefined "in practice", at least the behavior can't be relied on.
>>>
>> Just for the record, as always in C, undefined behavior is to be avoided
>> at all costs, since the *actual behavior* of the specific platform can
>> change unexpectedly even with the same executable on the same machine.
>>
>> What appears to be consistent and sane behavior (e.g. no segfault) after
>> compilation could even change if the OS runs low on memory (for
>> example). The only way to tell if the behavior is sane is to check the
>> disassembled compiler output and see if the machine code behaves sanely
>> *in every possible case*, which is of course ridiculous.
>>
>> Unless an implementation chooses to define an otherwise UB (and
>> specifies that in the docs), UB is just dragons waiting to wreak havoc
>> on your machine.
>>
>> Such bugs can lie dormant for years, until someone changes a compiler
>> setting or switches to a new compiler or, even worse, changes the
>> runtime environment of the same old executable, then Murphy will grant
>> you hours and hours of fun! <grin>
>>
>>
>>> Cheers!
>>> -- nobody
>>>
>>
>>
>> Cheers!
>>
>> -- Lorenzo
>