Re: future of annotations in Lua?

On Sat, Jun 8, 2019 at 09:59, Philippe Verdy <verdy_p@wanadoo.fr> wrote:

Another useful case for annotations is to allow compilers to perform some optimizations that are otherwise unsafe.

Just look at this expression:
f()+f(),
it contains two syntactically identical subexpressions f(). But the compiler cannot cache the result, calling the function only once and keeping the result in an internally generated caching variable, because f may have side effects or return a different value on each call. So it cannot act as if we had written:
local x=f()
x+x
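To make the safety problem concrete, here is a small sketch in today's Lua (the counter-based f is a hypothetical example, not from the thread): when f has a side effect, f()+f() and the cached "local x=f(); x+x" give different results, which is why a compiler may not apply this rewrite without some guarantee such as the proposed "*const".

```lua
-- Hypothetical impure function: every call changes state.
local calls = 0
local function f()
  calls = calls + 1
  return calls
end

-- As written: two calls, 1 + 2 (the sum does not depend on operand order).
calls = 0
local direct = f() + f()

-- Cached rewrite: a single call, 1 + 1.
calls = 0
local x = f()
local cached = x + x

print(direct, cached)
```

Here direct is 3 and cached is 2, so the "optimization" would change the program's meaning.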
Nor can we declare variables (like x here) in the middle of an expression. But we could annotate the subexpression (f()) as "constant", i.e. having no side effects:
*const (f()) + *const (f())
Here the annotation is simply written "*const"; it needs no surrounding parentheses because "*" is not a valid unary operator in expressions. The parentheses around f() are needed, however, because otherwise the annotation would apply only to "f" and not to f(). We could avoid these extra parentheses if annotations in expressions behaved like unary operators and were right-associative, in which case it becomes:
*const f() + *const f()

Now the compiler can itself generate the internal temporary variable (let's name it "x", even though the name is not visible in the lexical scope), call f() only once, and store its result, which is then used in the expression
x + x

Now that expression contains two reads of the same variable, but the compiler still does not know the type of that variable, so it cannot optimize it further into a multiplication by a constant, as if it were
x * 2
in which case it would not even need the temporary variable and would be equivalent to
f() * 2

To do that, we need a second annotation declaring that the value of the function is a number:
*const *number f() + *const *number f()
The compiler now detects two occurrences of the same subexpression "*number f()", both of them cacheable because of "*const". So it first infers as if we had declared:
local *const x = *number f()
which also translates to
local *const *number x = f()
and then "sees" the expression
x + x
which it can now safely optimize (because it knows that "x" is a number in both operands of the "+" operation) to
x * 2
and then, because the temporary variable "x" is now used only once, it can eliminate it by substitution and generate the same thing as if we had written
f() * 2
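The number hint matters because in Lua "x + x" and "x * 2" are only guaranteed to be interchangeable for numbers; for values with metamethods they can behave differently. A minimal sketch in current Lua, using a hypothetical table type that defines __add but not __mul:

```lua
-- Hypothetical value type with addition defined, but no multiplication.
local mt = {}
mt.__add = function(a, b)
  return setmetatable({ n = a.n + b.n }, mt)
end
local v = setmetatable({ n = 21 }, mt)

-- x + x works: Lua dispatches to the __add metamethod.
local sum = v + v

-- x * 2 does not: there is no __mul, so Lua raises an arithmetic error.
local ok, err = pcall(function() return v * 2 end)

print(sum.n, ok, err)
```

Rewriting "x + x" into "x * 2" here would turn a working program into one that fails, so the compiler needs the "*number" guarantee first.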

Annotations can therefore be very useful for providing hints to the compiler, notably wherever it cannot safely perform type inference. If needed, the compiler will insert type-checking assertion code at run time (raising an error if the assertion fails, for example here if f() did not return a number).
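One way to picture that run-time check is the following sketch. The names expect_number and lowered are hypothetical stand-ins for whatever code a compiler might actually generate for "*const *number f() + *const *number f()", assuming it lowers to a single call, a single check and a multiplication:

```lua
-- Hypothetical check a compiler might insert for "*number":
-- raise an error unless v is a number, otherwise pass it through.
local function expect_number(v)
  if type(v) ~= "number" then
    error("annotation *number violated: got a " .. type(v), 2)
  end
  return v
end

-- Hypothetical lowering: f is called once, checked once, then doubled.
local function lowered(f)
  return expect_number(f()) * 2
end

local r = lowered(function() return 21 end)
local ok, err = pcall(lowered, function() return "21" end)

print(r, ok, err)
```

Without the check, "21" * 2 would silently yield 42 through Lua's string coercion; with it, the violated annotation is reported as an error instead.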

If these hints are not recognized by the engine, the evaluation of f()+f() is unchanged: the function is called twice, possibly returning two different values, possibly not numbers, and a runtime check determines how to compute the addition (within the intrinsic _add(x,y) function call).

On Sat, Jun 8, 2019 at 08:44, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
Also, given the wide choice available for the first significant token of an annotation, we can say that:

- if the first token is "*" or "@", it is reserved for the standard Lua specification, including future versions (so what follows that "*" or "@" must obey those specifications)

- all other usable tokens are for extensions that a conforming parser that does not recognize them can entirely and easily ignore; they must not invalidate the interpretation of the rest of the syntax. But Lua may add a constraint for them, requiring these annotations to use the "surrounding rule" (with parentheses).

On Sat, Jun 8, 2019 at 08:32, Philippe Verdy <verdy_p@wanadoo.fr> wrote:

On Sat, Jun 8, 2019 at 07:56, Egor Skriptunoff <egor.skriptunoff@gmail.com> wrote:
On Fri, Jun 7, 2019 at 12:35 AM Lorenzo Donati wrote:

The more I think about it, the more I find the syntax with "@" more
readable and more easily "expandable": parametrized annotations anyone?
Like for example:

local @const myTable = @table(64) {} -- preallocates 64 elements in the array part

Please note that the Lua lexer allows inserting whitespace between any two lexemes.
For example, the following is allowed in Lua syntax:
::
labelname
::
x = math
.
pi
goto labelname

My question is:
if an attribute name and an opening parenthesis are distinct lexemes
(and there could be a space or a newline in between),
then what is the purely syntactic way to determine where an attribute ends?

Your example:
local myTable = @table(64) {}

Variant1:
"@table(64)" is the attribute
"{}" is the expression

Variant2:
"@table" is the attribute
"(64){}" is the expression: you are calling the number 64 and passing an empty table as its argument :-)

You are repeating the same remark I made when I replied to Lorenzo Donati, about why it is ambiguous (which I then developed further).

I also summarized it, but I can repeat my earlier analysis:

Any annotation in Lua can ONLY FOLLOW another token that:
- marks the start of a simple statement (like "local", "return", or even ";" for the empty statement), or
- marks the start of a composite syntactic unit (like "(", "[", "{", or "do"), or
- marks the end of a composite syntactic unit (like ")", "]", "}", or "end").

So it cannot occur in the middle of an expression (except possibly immediately after "(", "[", "{" or ")", "]", "}", etc., but this depends on the permitted choice for the first token of annotations).
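The ambiguity Egor describes can be checked against a current Lua implementation with load (or loadstring on Lua 5.1): whitespace and newlines between tokens are accepted, and "(64){}" parses as an ordinary call expression that only fails at run time. A minimal sketch:

```lua
local compile = loadstring or load  -- Lua 5.1 needs loadstring for strings

-- Newlines between "math", "." and "pi" are just whitespace to the lexer.
local pi = assert(compile("return math\n.\npi"))()

-- "(64) {}" is a syntactically valid call: a parenthesized expression
-- followed by a table-constructor argument. It compiles fine...
local call = assert(compile("return (64) {}"))

-- ...and only fails when run, because a number is not callable.
local ok, err = pcall(call)

print(pi, ok, err)
```

So a bare "@table" followed by "(64) {}" cannot be delimited by the grammar alone, which is the case the rules below are meant to exclude.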

The first token used by the annotation
- MUST NOT be a valid unary operator (like "-", "not", "#" or "~"),
- MUST NOT be a number constant or string constant.
- but it MAY be ANY other existing token

That first token MAY then ALSO unambiguously be:
- binary operators (like "*", "..", "//", "or", "and", "<", "==", etc.), or "," or "="
- or other reserved keywords used in compound statements ("do", "end", "if", "then", "else", "while", "repeat", "until", "for", "return", "break", "local", "function", etc.)
- or a currently unused token like "@"
- or even possibly ";" (but I would not allow it, as it would be error-prone with source code that is possibly partially commented out)

provided that:
- the annotation is ENTIRELY surrounded by "(...)" or "[...]" or "{...}", which would be required (except after some tokens like "local" that explicitly mark the start of a new statement that can be annotated)
- or the first token is a currently undefined one like "@" (in which case the surrounding is not always needed)

So we have a large choice for defining them unambiguously and generalizing them!
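The claim that these leading tokens are safe to reserve rests on their being syntax errors in current Lua, so no existing valid program could change meaning. A quick check with the stock compiler, assuming Lua 5.1 to 5.4 (the compiles helper is hypothetical, written only for this illustration):

```lua
local compile = loadstring or load  -- Lua 5.1 needs loadstring for strings

-- Hypothetical helper: true if the source text compiles (it is never run).
local function compiles(src)
  return compile(src) ~= nil
end

local results = {
  -- "-" is a unary operator, so this is ordinary valid Lua.
  unary_minus = compiles("local y = - x"),
  -- "*" cannot start an expression, so a "*const" prefix is a syntax error.
  star_prefix = compiles("local y = *const x"),
  -- "@" is not a Lua token at all, in either position.
  at_local    = compiles("local @const y = 1"),
  at_in_expr  = compiles("local y = @table(64) {}"),
}

for name, ok in pairs(results) do
  print(name, ok)
end
```

For comparison, Lua 5.4 later adopted the "local y <const> = 1" attribute form, which compiles there but is rejected by 5.3 and earlier.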

I still think that choosing "@" for the first token is the best choice. But this does not invalidate the choice of "*" or "<", provided the surrounding rule is used.
And the proposed syntax "<annotation>" is the worst choice if we need to surround it with additional parentheses to avoid ambiguous shift-reduce conflicts in some places, resulting in the horrible "(<annotation>)"!