- Subject: Ambiguous Syntax (was Re: Free format strings?)
- From: "Peter Hill" <corwin@...>
- Date: Tue, 14 Jan 2003 15:54:43 -0000
Lua5 Manual:
"As an exception to the format-free syntax of Lua, you cannot put a line
break before the ( in a function call. That restriction avoids some
ambiguities in the language. If you write
a = f
(g).x(a)
Lua would read that as a = f(g).x(a). So, if you want two statements,
you must add a semi-colon between them. If you actually want to call f,
you must remove the line break before (g)."
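A small sketch of the manual's example, assuming Lua 5.1 or later (where a chunk can be compiled from a string with loadstring or load). The names f, g and x are made up for illustration. With the semi-colon the chunk is two statements. Without it, as far as I know, the chunk is either rejected as ambiguous or read as the single call f(g).x(a), depending on the Lua version.

```lua
-- Pick whichever string compiler this Lua version provides.
local compile = loadstring or load

-- Hypothetical setup: f returns its argument, g has a field x holding a function.
local prelude = [[
local function f(t) return t end
local g = { x = function(v) return "called" end }
local a
]]

-- With the semi-colon: two statements, so a ends up holding the function f.
local split = assert(compile(prelude .. "a = f;\n(g).x(a)\nreturn type(a)"))
print(split())

-- Without it: either a compile error or the single call a = f(g).x(a).
local joined, err = compile(prelude .. "a = f\n(g).x(a)\nreturn type(a)")
if joined then
  print(joined())
else
  print(err)
end
```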
Björn De Meyer:
> Oh, and by the way, these limitations of freeform, and of the return
> statement, exist in Lua because of difficult problems with the parser
> and the parsing of ambiguous statements. If you or someone else could
> solve these problems without altering the language, then that would be
> nice...
Peter Hill:
> Unfortunately, if a syntax is ambiguous then (by definition) it can't be
> fixed without changing the language, at least in some manner. I guess the
> aim is to be the least disruptive, without being too ugly.
Hmm... I've had a better look at the syntax now and I think the chosen fix
(requiring a ";") is actually rather un-Lua in nature. The "correct" _Lua_
approach would be to prohibit 'functioncalls' from starting with a "(".
Why do I say this? From observation (...some Design Notes in the manual
would be nice...) the basic desire of Lua syntax seems to include the
following three edicts:
(a) Allow free format.
Ie, newlines have no syntactic meaning. For example, in Pascal we have:
begin a end
as the same as
begin
a
end
(b) To do so without requiring statement separators like the ";" in Pascal.
C manages to do this. Unlike the Pascal example:
begin a ; b ; c end
which requires statement separators, C has:
{ {} {} {} }
requiring none.
There is, however, a potential problem. Statements are now separated by the
"implicit infix operator" (ie, two adjacent values with no actual infix
symbol between them) but this is also classically used for function
application (eg, "f (x)"). Overloading this operator will produce ambiguous
code unless the contexts are distinguishable. In C this is done by
converting expressions (a context which can contain function applications)
into statements (a context where statement lists can exist) by using the ";"
postfix operator. Note that this is a *totally* different use of ";" than
Pascal makes; something which can produce occasional confusion. 8-X
Which brings us to the third Lua desire.
(c) No explicit expression -> statement operator.
Now we have trouble. We've asked for too much and, for a general homogeneous
grammar, "X Y" is now _totally_ ambiguous as to whether it is two statements
X & Y, or the function X applied to Y.
The only way out (given that we refuse to let go of conditions a, b and c)
is to de-homogenise the grammar... allowing one meaning to apply to some
situations and the second meaning to others. This decision was taken in Lua
a 'long time ago'. For example, it was decided that:
X {Z}
would be interpreted as _function application_, since this construct is used
quite often, whereas a statement starting with:
{Z}
is rarely (if ever) wanted.
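To make the "X {Z}" partition concrete, here is a minimal sketch in standard Lua. The helper first is hypothetical, introduced only for this example. It shows that a name followed by a table constructor is read as a call, with or without a line break in between.

```lua
-- Hypothetical helper: returns the first element of a table.
local function first(t) return t[1] end

local v = first {"z"}      -- sugar for first({"z"})

-- The line break changes nothing: this is still one call, first({"z"}).
local w = first
{"z"}

print(v, w)
```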
And so on... partitioning each "X Y" case into <function application> or
<statement list>... until it was (suddenly?) realised that the "X (Y)" case
had not been partitioned. What to do? The natural Lua response would be to
simply partition it to the most common case (in this case, function
application). For some reason, however, it was decided instead to break rule
(b).
So why break from the general Lua approach? Is this case so hard to
partition? I don't think so.
Consider:
X (Y)
as function application. It happens all the time... Lua code is full of
examples. So what about:
X
(Y)
which starts a statement with a "(". How needed is that? Since "X[Y]" and
"X.Y" are both postfix operators, grouping with a leading parenthesis is
unnecessary. The only cases that require grouping are:
(i) (function (a) ... end)(123)
(ii) (<unop> x)(123)
(iii) (x <binop> y)(123)
Case (i) is important but, as I mentioned on the list a few days ago, the
code can be converted to allow:
function (a) ... end (123)
by changing only a few lines.
Cases (ii) and (iii) are generally an error (unless some metamethods have
been at work, and even then things like (1+2)(Y) are pretty odd), the
exception being the boolean operators, which can easily return functions. For
example "(fred or print)(123)". In this rare case one can either assign the
chosen function to a variable, eg:
local f = fred or print
f(123)
or "bracket" the code with an anonymous function, rather than with "( )",
eg:
function() return (fred or print) end () (123)
Not neat, it's true... but a reasonable price to pay for a fairly rare
occurrence, and well within the Lua edicts.
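For reference, the three grouping cases and the variable workaround can be tried in today's standard syntax. This is only a sketch: the object obj and its __unm metamethod are invented for illustration, fred is left unset, and tostring stands in for print so the results can be compared. Each call sits on the right of an assignment, so no statement starts with a "(".

```lua
-- Case (i): calling an anonymous function directly.
local r1 = (function (a) return a + 1 end)(123)

-- Case (ii): a unary operator whose metamethod returns a function.
local obj = setmetatable({}, {
  __unm = function () return function (n) return -n end end
})
local r2 = (-obj)(123)

-- Case (iii): a boolean operator choosing between two functions.
local fred = nil            -- hypothetical optional handler, unset here
local r3 = (fred or tostring)(123)

-- Workaround from the text: name the chosen function first, then call it.
local f = fred or tostring
local r4 = f(123)

print(r1, r2, r3, r4)
```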
*cheers*
Peter Hill.
PS:
With regard to the odd limitation on Break and Return, I think this is
because Return can (but need not) take an argument, making determination of
parsing termination difficult. Why it applies to Break I don't know :-(.
This limit ("must be at end of block") doesn't actually produce a problem in
practice, though, since it is "incorrect" to have statements following a
Return/Break anyway, as they can _never_ be executed.
As for wrapping Return in "do return end", as mentioned in the
manual, the only conceivable use is to "comment out" code by making the
following code unexecutable. In Lua5, however, it would be much better just
to use a long comment "--[[".
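A rough sketch of the Return limitation, assuming a Lua version where chunks can be compiled from strings with loadstring or load. The chunk contents are made up for illustration. It also shows why the optional argument makes termination hard to determine: an expression on the line after Return is taken as its value.

```lua
-- Pick whichever string compiler this Lua version provides.
local compile = loadstring or load

-- "return" followed by another statement in the same block is rejected.
local bad, err = compile("return\nlocal x = 1")

-- Wrapped in its own block, "return" is the last statement of that block.
local ok = compile("do return end\nlocal x = 1")

-- Caution: an expression on the next line is read as the return value.
local sneaky = compile("return\ntostring(42)")

print(bad, err)
print(ok ~= nil)
print(sneaky())
```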