The parser can still initially parse the "-" token separately from the value "9223372036854775808", which cannot be represented as a signed 64-bit integer and is therefore given a double value (hoping that the double keeps its precision; even though a double has fewer mantissa bits, the bits truncated in this case are all zeroes).
So the double value 9223372036854775808, which is still a constant, can easily be detected when it appears next to the unary minus operator during the "reduce" step of the parser, which now sees two tokens: the unary minus, and the double value still exactly equal to 9223372036854775808.
This just requires an additional reduce rule in the syntactic parser (not in the lexer) to resolve it as if it were a single precomputed negative integer constant for this specific case.
However, this would mean that -922337203685477580.8e1 would end up being parsed as an integer even though the source specified it explicitly as a double.
There are two ways to handle this:
- either the lexer does not parse numeric strings itself and leaves the token as a string; the actual conversion to a datatype is then done in the parser.
- or the lexer parses the numeric strings and returns the token not only with its attributes set to the numeric value, but also with a "hint" indicator of which scanner (integer or floating point) it used to reduce the numeric string to a resolved constant. This approach complicates only one parser rule:
unaryexpression ::= '-' INTCONSTANT
unaryexpression ::= '-' DOUBLECONSTANT
    { if (tokens[1].doublevalue == 9223372036854775808.0) {
          tokens[1].type = INTCONSTANT;
          tokens[1].intvalue = -9223372036854775808;
      }
    }
unaryexpression ::= ...
unaryexpression ::= '-' (_expression_)
(Here "tokens[]" is some C-like array giving access to the properties of the tokens returned by the lexer and being reduced by the parsing rule; the number of tokens in the array is determined by the rule, here 2 tokens. They are stored and modified by the parser in its abstract syntax tree, assuming that tokens[i].type is one of the defined token-type constants returned by the lexer, which can also set tokens[i].intvalue, tokens[i].doublevalue, or tokens[i].stringvalue for NAME or LITTERALSTRING token types.)