- Subject: Re: How to extract a floating point number locale-independantly
- From: Daurnimator <quae@...>
- Date: Tue, 26 Apr 2016 13:30:46 +1000
On 26 April 2016 at 11:55, Daurnimator <quae@daurnimator.com> wrote:
> I had a report that some of my code was failing in the nb_NO.utf-8 locale.
> Indeed, `tonumber` behaves differently depending on the locale:
>
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(tonumber("1.0"))'
> nil
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(tonumber("1,0"))'
> 1,0
>
> However, it *doesn't* affect Lua parsing, which seems to contradict the manual.
>
> From http://www.lua.org/manual/5.3/manual.html#pdf-tonumber
>> The conversion of strings can result in integers or floats, according to the lexical conventions
>> of Lua (see §3.1). (The string may have leading and trailing spaces and a sign.)
>
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(load([[return 1.0]])())'
> 1,0
> $ lua -e 'os.setlocale("nb_NO.utf-8") print(load([[return 1,0]])())'
> 1 0
>
> This leaves me with the question of how to deserialise a floating
> point number within a Lua library (where I can't control the locale of
> the application).
> Normally I'd write something like:
>
> local mystring = "some protocol 1.0"
> local version = mystring:match("(%d+%.%d*)") --> version == "1.0"
> version = tonumber(version) --> version == 1.0
>
> Or using lpeg:
>
> local lpeg = require "lpeg"
> local digit = lpeg.R "09"
> local version_patt = digit^1 * lpeg.P "." * digit^0 / tonumber
> local parser = lpeg.P "some protocol " * version_patt
> local version = parser:match "some protocol 1.0" --> version == 1.0
>
> But this doesn't work in some locales :(
> What do other people do? Is there a good solution?
I had a look into how Lua does this locale-independently and found
this function:
http://www.lua.org/source/5.3/llex.c.html#trydecpoint
==> if converting a numeral fails, it replaces "." with whatever the
current locale's decimal separator is and tries again.
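A minimal C sketch of that fallback, written as a standalone function rather than Lua's actual code (the name `str2d_with_fallback` and the fixed 200-byte buffer are my own assumptions): try `strtod` first, and only if it fails to consume the whole string, swap "." for the locale's separator and retry.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper illustrating the idea behind trydecpoint():
   parse with strtod(), which honours the current LC_NUMERIC locale;
   if that does not consume the whole string, replace every '.' with
   the locale's decimal separator and try once more.
   Returns 1 on success and stores the value in *out, 0 on failure. */
static int str2d_with_fallback(const char *s, double *out) {
    char buf[200];                  /* assumed maximum numeral length */
    char *end;
    size_t len = strlen(s);

    *out = strtod(s, &end);
    if (end != s && *end == '\0')
        return 1;                   /* parsed as-is in the current locale */

    /* Only now consult the (process-global) locale, as Lua does via
       lua_getlocaledecpoint(). */
    char decpoint = localeconv()->decimal_point[0];
    if (decpoint == '.' || len >= sizeof buf)
        return 0;                   /* retrying would not change anything */

    memcpy(buf, s, len + 1);        /* copy including the terminating NUL */
    for (char *p = buf; *p != '\0'; p++)
        if (*p == '.')
            *p = decpoint;          /* swap in the locale's separator */

    *out = strtod(buf, &end);
    return end != buf && *end == '\0';
}
```

The result still depends on process-global state: `localeconv()` is consulted on every failed first attempt, so another thread calling `setlocale` can change the outcome.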
This behaviour was surprising to me and has a number of consequences:
- Using a separator such as "," in source code is impossible in the
  first place, as:
  - The lexer considers a number 'ended' at any character that doesn't
    match (among others) `else if (ls->current == '.')` in
    http://www.lua.org/source/5.3/llex.c.html#read_numeral
  - Numbers without a leading zero that use a non-"." separator
    (e.g. ,2) would never work, due to the `case '.'` in the lexer.
- If running in a locale where "." is not the decimal separator,
  parsing Lua code could result in *many* calls to `localeconv`.
- localeconv is not thread-safe, which means that if another thread
  changes the locale while Lua is parsing, "interesting" results could
  occur.
I'd suggest that future versions of Lua use `strtod_l` and friends
(the variants that take an explicit locale argument, so a fixed "C"
locale can be passed) where available, and only fall back to the
`localeconv` hack when compiling in C89 mode.
- strtod_l is available on Linux (at least with both glibc and musl) if
  _GNU_SOURCE is defined
- _strtod_l is available in MSVC since VS2005
- strtod_l is available on OSX since Darwin 8.
- strtod_l is available on FreeBSD since 9.1
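A minimal sketch of the proposed approach, assuming a glibc or musl system where `_GNU_SOURCE` exposes `strtod_l`: create a "C" numeric locale object once with POSIX `newlocale` and pass it explicitly, so parsing never consults the global locale. The helper names `str2d_c_locale` and `make_c_numeric_locale` are hypothetical, not existing Lua functions.

```c
#define _GNU_SOURCE         /* glibc/musl: needed to declare strtod_l() */
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a double with '.' as the decimal
   separator, whatever setlocale() has been set to, by handing
   strtod_l() an explicit "C" locale object.
   Returns 1 on success and stores the value in *out, 0 on failure. */
static int str2d_c_locale(const char *s, locale_t c_loc, double *out) {
    char *end;
    *out = strtod_l(s, &end, c_loc);
    return end != s && *end == '\0';   /* whole string must be consumed */
}

/* Create the locale object once (e.g. at startup) and reuse it;
   release it with freelocale() when done.
   Returns (locale_t)0 on failure. */
static locale_t make_c_numeric_locale(void) {
    return newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
}
```

Because `c_loc` is a separate object rather than process-wide state, a `setlocale` call elsewhere does not affect it. On MSVC the analogous calls would be `_create_locale` and `_strtod_l`; on OS X and FreeBSD the declarations may require `<xlocale.h>`.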
Making lua_str2number locale-independent would:
- allow the `trydecpoint` hack to be removed from llex.c
- make tonumber() locale-independent.