- Subject: Inconsistent hex literal parsing (language, Lua, LuaJIT)
- From: Philipp Kutin <philipp.kutin@...>
- Date: Fri, 27 Jul 2012 12:58:51 +0200
Hi,
first off, the issue I will describe is somewhat similar to some postings found in the archive (mostly "luaO_str2d, strtod and hexadecimal numbers under IRIX"), but I decided to start a new thread instead of replying to a two-year-old one because I feel that it warrants new discussion.
In the following code snippet,
print( 0xffffffff)
print(-0xffffffff)
print( 0x100000000) -- 8 zeros
the first two lines always print the expected decimal representations 4294967295 and -4294967295, but the last one gives different results between various platforms and Lua implementations:
-- Ubuntu Linux (32-bit or 64-bit) Lua 5.1.5 or LuaJIT git HEAD:
4294967296
-- Windows XP (32-bit), Lua:
4294967295 -- !!!
-- Windows XP, LuaJIT:
luajit.exe: ./hexlitest.lua:3: malformed number near '0x100000000'
In all cases, the source was compiled with gcc (or MinGW's gcc 4.6.2) in the default configuration.
The root cause is the use of strtod() in both implementations' parsing/lexing routines. On Linux, that function parses hex literals denoting a value greater than UINT32_MAX as hexadecimal floating-point literals. MinGW, to my understanding, hands strtod off to the Windows CRT, which is known for its fragmentary C99 support; C89 has no concept of hex floating literals, so strtod fails there (even if LuaJIT was compiled with CFLAGS=-std=c99, by the way). Lua then passes the string to strtoul, where it of course saturates, while LuaJIT just flat out refuses it.
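To make the saturation concrete, here is a minimal C sketch of the strtoul fallback path. The function name hex_via_strtoul and its overflow flag are made up for illustration and are not taken from the Lua or LuaJIT sources; the sketch only relies on the standard behaviour that strtoul returns ULONG_MAX and sets errno to ERANGE when the value does not fit.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: hand a hex literal to strtoul (base 16 accepts an
   optional "0x"/"0X" prefix) and widen the result to a double.
   *overflowed is set to 1 if strtoul reported ERANGE, i.e. the returned
   value is the saturated ULONG_MAX rather than the literal's value. */
double hex_via_strtoul(const char *s, int *overflowed)
{
    char *end;
    unsigned long u;

    assert(s != NULL && overflowed != NULL);
    errno = 0;
    u = strtoul(s, &end, 16);
    *overflowed = (errno == ERANGE);
    return (double)u;
}
```

Where unsigned long is 32 bits wide, 0x100000000 already overflows and comes back as 4294967295; where it is 64 bits wide, it takes a literal of 17 hex digits to see the same effect.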
But the deeper problem behind those inconsistencies, in my opinion, is that the Lua 5.1 Reference (which does claim to be "the official _definition_ of the Lua language" after all) is *very* vague with respect to what constitutes a valid hexadecimal literal (compared to the wording in, say, the C99 Standard):
>> Lua also accepts integer hexadecimal constants, by prefixing them with 0x.
This raises many questions at once: what about literals greater than 0xffffffff? What about a "0x1" followed by 64/4 zeros (2**64, which is representable exactly, but is larger than the double N for which all integers 0 <= i < N are representable exactly)? What about literals that denote values that are not representable exactly? What about the "0X" prefix? And so on...
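The representability questions can be checked with a small C sketch. It assumes that double is an IEEE 754 binary64 type, in which case the N mentioned above is 2**53; the function name is hypothetical, and unsigned long long is used only for brevity even though it is a C99 type.

```c
#include <assert.h>

/* Hypothetical check: returns 1 if the integer n survives a round trip
   through double unchanged. Restricted to n <= 2**62 so that converting
   the double back to an integer is always well defined. */
int survives_double_round_trip(unsigned long long n)
{
    volatile double d;  /* volatile: force rounding to double precision */

    assert(n <= (1ULL << 62));
    d = (double)n;
    return (unsigned long long)d == n;
}
```

Under that assumption, 2**53 (9007199254740992) survives the round trip, 2**53 + 1 does not, and a larger power of two such as 2**62 does again.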
Moreover, using strtod for parsing produces different results for some string-to-number conversions:
print(tonumber("infinity")) -- see C99 7.20.1.3
print(tonumber("0x1p2")) -- binary exponent
print(tonumber("0x1.f")) -- binary fraction
These are different across C89/C99 implementations of strtod and may be inconsistent between Lua/LuaJIT because of the slightly different order of trying to parse a particular string.
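One way to see what a given C library does with such strings is to check how much of the input strtod consumes. The probe below is only a sketch with a made-up name; what it reports for "infinity", "0x1p2" or "0x1.f" depends on the strtod it is linked against, which is exactly the point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical probe: returns 1 if the C library's strtod accepts the
   whole string s as a number (storing it in *value), 0 if it consumed
   nothing or stopped before the end of the string. */
int strtod_accepts(const char *s, double *value)
{
    char *end;

    assert(s != NULL && value != NULL);
    *value = strtod(s, &end);
    return end != s && *end == '\0';
}
```

A C89-style strtod would be expected to stop at the 'x' of "0x1p2" and so be reported as not accepting it, while a C99-conforming one should accept it.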
As far as "fixing" the issue goes, various alternatives come to mind:
- on MinGW and other platforms stuck with C89 strtod, use a different one, for example from uClibc. But I guess it would be more logical to complain to the MinGW people than to include it in the Lua* sources? (If it's at all possible; dunno how LGPL and Lua license go together.)
- use strtoull. However, this is also C99-only (available with _strtoui64 on MSVC, though) and would not treat values beyond UINT64_MAX properly
- write the hex literal parsing in plain C using doubles (like "val = 16*val + hexdigit(*c)" ...). This is what I personally prefer, since it would eliminate the quirks that a particular libc implementation exposes, and also wouldn't accept a superset of the values that the Lua Reference suggests.
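The last alternative could look roughly like the following C sketch. The names hexdigit and parse_hex_literal are illustrative only and do not come from the Lua or LuaJIT sources; the sketch accepts just "0x" or "0X" followed by one or more hex digits and rejects everything else, including fractions and binary exponents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hexdigit(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "0x"/"0X" followed by one or more hex digits into a double,
   without calling into the C library at all.
   Returns 1 on success, 0 if s is not a complete integer hex literal. */
int parse_hex_literal(const char *s, double *result)
{
    double val = 0.0;
    int d;

    assert(s != NULL && result != NULL);
    if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
        return 0;
    s += 2;
    if (hexdigit((unsigned char)*s) < 0)
        return 0;                       /* need at least one digit */
    while ((d = hexdigit((unsigned char)*s)) >= 0) {
        val = 16.0 * val + d;           /* accumulate in floating point */
        s++;
    }
    if (*s != '\0')
        return 0;                       /* trailing garbage, e.g. "p2" */
    *result = val;
    return 1;
}
```

With this approach 0x100000000 yields 4294967296 regardless of the platform's strtod or the width of unsigned long. For literals with more significant bits than a double can hold, the digit-by-digit accumulation rounds along the way, so the result may not be the correctly rounded value.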
Finally, to demonstrate that large hex literals are in no way only of theoretical interest: the code in which I stumbled upon this was (unsurprisingly) a parser for a language X (which has only 32-bit ints as its numerical type) that translated code for X into Lua:
-- numstr is guaranteed to be in the form "0[Xx][0-9a-f]+" here
local function parse_number(pos, numstr)
    local num = tonumber(numstr)  -- would return nil...
    if (num < -0x80000000 or num > 0xffffffff) then  -- ...and fail here
        perrprintf(pos, "number %s out of the range of a 32-bit integer", numstr)
        num = 0/0
    elseif (num >= 0x80000000 and numstr:sub(1,2):lower()~="0x") then
        pwarnprintf(pos, "number %s converted to a negative one", numstr)
        num = num-0x100000000  -- wouldn't compile
    end
    return num
end
I ended up writing (0xffffffff+1) instead, but IMO a consistent behavior mandated by Lua-Ref would be far more desirable. What do you think?
Cheers,
Philipp