Peter,

I'd have to agree with you: this is strange behavior (even a bug). The
lexer takes a shortcut: after the first '.', it keeps accepting further
dots as part of the same number. It should stop at the second one.

There's more: the lexer will read all of this as a single number, and of
course fail to convert it:

1.2.3.4.5e+say_cheese

The lexer's routine 'read_numeral' effectively matches the regular expression
[0-9.]+([Ee][+-]?)?[A-Za-z0-9_]*

The routine is called when the input begins either with a digit or with a
'.' followed by a digit.

The relevant code (Lua 5.1.4, llex.c):

> static void read_numeral (LexState *ls, SemInfo *seminfo) {
>   lua_assert(isdigit(ls->current));
>   do {
>     save_and_next(ls);
>   } while (isdigit(ls->current) || ls->current == '.');
>   if (check_next(ls, "Ee"))  /* `E'? */
>     check_next(ls, "+-");  /* optional exponent sign */
>   while (isalnum(ls->current) || ls->current == '_')
>     save_and_next(ls);
>   save(ls, '\0');
>   buffreplace(ls, '.', ls->decpoint);  /* follow locale for decimal point */
>   if (!luaO_str2d(luaZ_buffer(ls->buff), &seminfo->r))  /* format error? */
>     trydecpoint(ls, seminfo); /* try to update decimal point separator */
> }


On Thu, 2010-02-04 at 23:53 +0000, Peter Cawley wrote:
> Hello all,
> 
> I was recently writing some syntax highlighting code for Lua, and
> while trying to duplicate the behaviour of the standard Lua parser, I
> noticed a curious behaviour. The first two of the following examples
> are completely normal. The third is unexpected to me - according to the
> reference manual on numbers, "A numerical constant can be written with
> an optional decimal part and an optional decimal exponent", so I would
> expect it to be parsed like the first example, as the parser should
> stop trying to match a number at the second decimal point. The fourth
> example is included for completeness; it could in theory be parsed
> like the second example, or as 1. followed by . and ""
> 
> > =1. ..""
> 1
> > =1 ..""
> 1
> > =1...""
> stdin:1: malformed number near '1...'
> > =1..""
> stdin:1: malformed number near '1..'
> 
> Are there any reasons why the third example is parsed the way it currently is?