- Subject: Re: How to extract a floating point number locale-independantly
- From: Daurnimator <quae@...>
- Date: Tue, 26 Apr 2016 13:30:46 +1000
On 26 April 2016 at 11:55, Daurnimator <firstname.lastname@example.org> wrote:
> I had a report that some of my code was failing in the nb_NO-utf8 locale:
> Indeed, `tonumber` behaves differently depending on locale:
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(tonumber("1.0"))'
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(tonumber("1,0"))'
> However it *doesn't* affect lua parsing; which seems to be against the manual.
> From http://www.lua.org/manual/5.3/manual.html#pdf-tonumber
>> The conversion of strings can result in integers or floats, according to the lexical conventions
>> of Lua (see §3.1). (The string may have leading and trailing spaces and a sign.)
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(load([[return 1.0]])())'
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(load([[return 1,0]])())'
> 1 0
> This leaves me with the question of how to deserialise a floating
> point number within a lua library (where I can't control the locale of
> the application).
> Normally I'd write something like:
> local mystring = "some protocol 1.0"
> local version = mystring:match("(%d+%.%d*)") --> version == "1.0"
> version = tonumber(version) --> version == 1.0
> Or using lpeg:
> local digit = lpeg.R "09"
> local version_patt = digit^1 * lpeg.P "." * digit^0 / tonumber
> local parser = something * version_patt
> local version = parser:match "some protocol 1.0" --> version == 1.0
> But this doesn't work in some locales :(
> What do other people do? and is there a good solution?
I had a look into how lua does this locale-independently and found the
`trydecpoint` hack in llex.c: if parsing in the current locale fails, it
retries after replacing "." with the current locale's decimal separator.
This was surprising to me and has a number of consequences:
- Using a separator such as "," is impossible in the first place as:
  - A number has 'ended' if it doesn't match:
    `else if (ls->current == '.')` in
  - Numbers without a leading zero using a non-"." separator (e.g.
    ,2) would never work due to the `case '.'` in the lexer.
- If running in a locale where '.' is not the decimal separator,
parsing lua could result in *many* calls to `localeconv`.
- localeconv is not threadsafe, which means that if another thread
changes the locale while lua is parsing, "interesting" results could
occur.
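
For illustration, the retry described above can be sketched roughly as
follows. This is not lua's actual code: `parse_with_decpoint_retry` is a
hypothetical helper, the 64-byte buffer is an arbitrary choice, and only the
first byte of the locale's decimal separator is used.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not Lua's actual code: parse `s` with strtod, and if
 * that does not consume the whole string, retry with every '.' replaced by
 * the first byte of the current locale's decimal separator.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
static int parse_with_decpoint_retry(const char *s, double *out) {
    char buf[64];              /* arbitrary size limit for this sketch */
    char *end;
    char *p;
    char decpoint;
    size_t len = strlen(s);

    if (len == 0 || len >= sizeof buf)
        return 0;

    /* First attempt: parse the text unchanged in the current locale. */
    *out = strtod(s, &end);
    if (end == s + len)
        return 1;

    /* Second attempt: swap '.' for the locale's separator and try again.
     * localeconv() reads global state, hence the thread-safety concern. */
    decpoint = localeconv()->decimal_point[0];
    memcpy(buf, s, len + 1);
    for (p = buf; *p != '\0'; p++)
        if (*p == '.')
            *p = decpoint;

    *out = strtod(buf, &end);
    return end == buf + len;
}
```

Note that every input that fails the first attempt pays for a `localeconv`
call and a second `strtod`, which is the cost mentioned above.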
I'd suggest that future versions of lua use 'strtod_l' and friends
(the variants that take an explicit locale instead of using the global
one) where available, and only fall back to the `localeconv` hack if
compiling in C89 mode.
- strtod_l is available on linux (at least both glibc and musl) if
_GNU_SOURCE is defined
- _strtod_l is available in MSVC since VS2005
- strtod_l is available on OSX since Darwin 8.
- strtod_l is available on FreeBSD since 9.1
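
A minimal sketch of what such a conversion could look like on the POSIX-style
platforms listed above (the MSVC `_strtod_l` variant is not covered).
`parse_double_c_locale` is a hypothetical name, the `<xlocale.h>` include for
OSX/FreeBSD is my assumption about where the declaration lives there, and the
lazy initialisation of the locale object is itself not thread-safe, so real
code would create it once up front.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc and musl only declare strtod_l with this */
#endif
#include <locale.h>
#include <stdlib.h>
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h>           /* assumed home of strtod_l on these systems */
#endif

/* Hypothetical helper: convert `s` using the "C" locale regardless of the
 * process-wide locale. Returns 1 and stores the value in *out only if the
 * whole string was consumed, 0 otherwise. */
static int parse_double_c_locale(const char *s, double *out) {
    static locale_t c_locale = (locale_t)0;   /* created on first use */
    char *end;

    if (c_locale == (locale_t)0) {
        c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
        if (c_locale == (locale_t)0)
            return 0;                          /* could not create the locale */
    }

    *out = strtod_l(s, &end, c_locale);
    return end != s && *end == '\0';
}
```

With this, "1.0" converts and "1,0" is rejected whatever `os.setlocale` has
been set to, since the global locale is never consulted.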
Fixing lua_str2number to be locale independent will:
- allow the `trydecpoint` hack to be removed from llex.c
- fix tonumber() to *not* be locale dependent.