

The parser can still initially parse the "-" token separately from the parsed value "9223372036854775808", which cannot be an integer and is therefore given a double value (hoping that the double keeps its precision; even though a double has fewer mantissa bits, the truncated bits in this case are all zeroes).
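
A quick illustration of that in Lua 5.3 (the positive literal overflows the integer range, so the lexer falls back to a float whose value is exactly 2^63):

  print(math.type(9223372036854775808))  --> float
  print(9223372036854775808 == 2.0^63)   --> true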

So the double value 9223372036854775808, which is still a constant, can easily be tested when it appears next to the unary minus operator in the syntax tree, which now has two tokens during the "reduce" step of the parser: the unary minus and the double value, which is still exactly equal to 9223372036854775808.
This just requires an additional reduce rule in the syntactic parser (not in the lexer) to resolve it as if it were a single precomputed negative integer constant for this specific case.

However, this would mean that -922337203685477580.8e1 would end up being parsed as an integer even though the source specified it explicitly as a double.
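
Indeed, both spellings denote exactly the same double, so a test on the value alone cannot tell them apart; only the lexer knows which form was actually written:

  print(922337203685477580.8e1 == 9223372036854775808.0)  --> true
  print(math.type(-922337203685477580.8e1))                --> float (today)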

There are two ways to handle this:
- either the lexer does not parse numeric strings itself and leaves the token as a string; the actual conversion to a datatype is then done in the parser;
- or the lexer parses the numeric strings and returns the token not only with its attribute set to the numeric value, but also with a "hint" indicating which parser (integer or floating point) it used to reduce the numeric string to a resolved constant. This approach complicates only one parser rule:

unaryexpression ::= '-' INTCONSTANT
unaryexpression ::= '-' DOUBLECONSTANT
  { if (tokens[1].doublevalue == 9223372036854775808.0) {
       tokens[1].type = INTCONSTANT;
       tokens[1].intvalue = -9223372036854775808;  /* the minimum 64-bit integer */
    }
  }
unaryexpression ::= ....
unaryexpression ::= '-' (_expression_)

(here the "tokens[]" is some (C-like) array that access to properties of tokens returned by the lexer, and being reduced in the parsiing rules (whose number of tokens in the array is determined by the parsing rule, here there are 2 tokens) and stored and modified by the parser in its abstract syntax tree, assuming that tokens[i].type is one of the defined token type constants returned by the lexer which can also set tokens[i].intvalue or token[i].doublevalue, or token[i].stringvalue for NAME token types or for LITTERALSTRING token types).


On Tue, Nov 27, 2018 at 12:51 PM Muh Muhten <muh.muhten@gmail.com> wrote:
On 11/27/18, pocomane <pocomane_7a@pocomane.com> wrote:
> -- Min integer literal, BUG ???
> assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
> assert(math.type(-9223372036854775808) == 'float')

The issue appears to be that the lexer sees '-' separately from the
numeral itself. As such, when reading the number, it must fit in a
non-negative integer, and is then constant-folded to the actual
negative. Incidentally, it appears that this corner case has already
been noticed and is dealt with by the %q format:

lstrlib.c
961:        const char *format = (n == LUA_MININTEGER)  /* corner case? */
962-                           ? "0x%" LUA_INTEGER_FRMLEN "x"  /* use hexa */
963-                           : LUA_INTEGER_FMT;  /* else use default format */
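
A quick check of that corner case (Lua 5.3):

  print(string.format("%q", math.mininteger))   --> 0x8000000000000000
  print(0x8000000000000000 == math.mininteger)  --> true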

It seems to me that making this particular case work would be rather
involved, and essentially require negative numbers to be first-class
citizens in the grammar, rather than cobbled together through unary
minus and constant folding. I also don't see a satisfying solution
for, e.g. "- 9223372036854775808", "- --[[]]9223372036854775808",
"-(9223372036854775808)", though arguably those are *explicitly* the
negation of positive 9223372036854775808, which doesn't fit, and
really should be an integer.
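
For illustration, all of those spellings currently take the same float path:

  print(math.type(- 9223372036854775808))   --> float
  print(math.type(-(9223372036854775808)))  --> float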

In any case, the main uses I can see for having that number as a
constant and an integer are qua -2^63 and qua minimum integer. Perhaps
some variation on "-2^63|0", "-1<<63", "~(~0>>1)", or
"-0x8000000000000000" might be suitable? (Though that last one only
happens to work because 0x8000000000000000 == -0x8000000000000000 (64
bits) in 2's complement, and the unchecked overflow from casting to
signed after reading a hex literal may be undefined behavior.)
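
For what it's worth, all four of those spellings do produce the minimum
integer in Lua 5.3:

  for _, v in ipairs{ -2^63|0, -1<<63, ~(~0>>1), -0x8000000000000000 } do
    assert(v == math.mininteger and math.type(v) == 'integer')
  end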