Note that the situation in C++ is even worse, not just because of the "<<" and ">>" tokens but also because of "<" and ">", as well as ">=", which are also ambiguous with binary operators. A C++ parser has to discriminate between the binary comparison operator and the template parameter delimiters by contextual analysis of the semantics of what has already been parsed and semantically attributed. This is a major difficulty, because the semantics cannot be fully computed before the whole syntactic analysis is finished. So the semantics are partially built during syntactic analysis; in some cases the ambiguity is not resolved immediately and backtracking is unavoidable, which makes parsing C++ really slow and very memory intensive.

C++, in contrast, allows any type to be promoted, so booleans, numbers or values of any type can be promoted and become comparable with "<" and ">". For this reason, C++ requires shift-reduce conflicts after "<", "<<", ">>" or ">" to be resolved in favor of reduce each time the previous context is a template name (including a template function name, not just a template type name) for "<", or each time there is an unclosed template parameter list for ">", ">>" or ">=". If programmers still want "<", ">", "<<", ">>" or ">=" to be interpreted as binary operators in expressions inside the parameter list of a template, these expressions have to be surrounded by additional parentheses. This is tricky, and in some cases it can produce behavior the programmer did not intend. Programmers need to be aware of this ambiguity: when in doubt, notably in program generators or when using macros, these expressions need to be parenthesized.

This also occurs in Java for templates, but in a less critical way, because Java has stricter type requirements and much less permissive type promotion rules: boolean values returned by binary comparators cannot be implicitly promoted and are not comparable with "<" and ">", and template names are not comparable either, unlike in C++, where function names are just comparable pointers. But there are also cases where a template parameter could be a comparable type (e.g. an integer constant), so there is also a shift-reduce ambiguity for ">" and ">>", which is resolved as in C++ as a reduce (closing the list of template parameters), forcing programmers to use parentheses if they need the comparator in expressions.

Java and C++ cannot be correctly parsed by Yacc or Bison, which offer no clean way to resolve shift-reduce ambiguities by "callouts" to the semantic analyser. But Java and C++ are correctly parsed by ANTLR, which can also automatically build the abstract syntax tree, integrates most features allowed by regular expressions (without needing a separate lexer), is not limited to LALR or LR(n) languages, and includes callouts that can also be written and chained with another ANTLR parser, e.g. for code generators and transforms of the abstract syntax tree.

In Lua, there is still no such syntactic ambiguity requiring semantic analysis to resolve, so a simple LALR parser can be used. ANTLR can still be used, and it offers additional benefits in terms of semantic analysis, transforms of the AST, data path analysis, reduction/factorisation of common parts of subbranches, detection and rearrangement of common expressions, detection and precompilation of constant expressions, elimination of "dead" code, canonicalization of boolean expressions into normal form (also to detect common branches and refactor the code), native code generation, optimization of native code when there are several competing representations with more or less compaction, register allocation, reduction of sets of local variables, management of dependencies across variables, reduction of the effective scope of variables, conditional binding with a standard library to precompute some values or further improve the generated code for runtime, and the possibility of generating several codes that will be selected later by profiling info, possibly even with runtime profiling and recompilation with branch prediction.

If you don't know ANTLR yet, look at it! Along with PCRE for regular expressions, it is a fabulous piece of software for the design and implementation of any programming language, and it has relegated Lex/Yacc/Bison to antiquity.

On Sat, May 23, 2020 at 06:06, Philippe Verdy <verdyp@gmail.com> wrote:
I would favor the "!" prefix because it would be very easy to type. The "!" would be followed by a keyword reserved by the language, or even by no keyword at all for "indexing" the metatable field of objects.

So "t[!]" would be valid and would replace "getmetatable(t)".
* "t[!].x = 1" would replace "if not getmetatable(t) then setmetatable(t,{}) end; getmetatable(t).x=1" 
* "t[!][10] = 1"  would replace "if not getmetatable(t) then setmetatable(t,{}) end; getmetatable(t)[10]=1" 
* "t = { 10, x = 12; != {'a'} }" would replace "t = {10, x = 12}; setmetatable(t, {'a'})" in table constructors (note: "!=" is still parsed as if it were two lexical units, "!" and "=", and a space is still allowed between them; even though "!=" is a distinct lexical item used as a binary operator in many other languages, while standard Lua writes inequality as "~=", there is no syntactic ambiguity here)
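
For comparison, the get-or-create behaviour that the bullets above ascribe to "t[!]" can be written in current Lua with a small helper. This is only a sketch: the name "meta" is hypothetical and is not part of Lua or of the proposal.

```lua
-- Hypothetical helper: returns the metatable of t, creating an empty one
-- first if t has none (the behaviour ascribed to "t[!]" above).
local function meta(t)
  local mt = getmetatable(t)
  if mt == nil then
    mt = {}
    setmetatable(t, mt)
  end
  return mt
end

local t = { 10, x = 12 }
meta(t).x = 1      -- what "t[!].x = 1" would stand for
meta(t)[10] = 1    -- what "t[!][10] = 1" would stand for
print(getmetatable(t).x, getmetatable(t)[10])
```

The helper adds a function call where the proposed syntax would add none, which is the verbosity the "!" notation is meant to remove.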

[ C++ has made some compromises with "<<" and ">>" for the notation of template parameters, but in some cases you need an extra space between the two characters of "<<" or ">>" to avoid the ambiguity with the binary shift operators in expressions. This use of "<<" and ">>" for template parameters seriously complicates C++ syntactic parsers, as the syntax is not context-free and requires backtracking. C++ is not a pure LALR or LR(n) language: try parsing C++ with a simple tool like Yacc or Bison; it does not work without complex error tracking and a lot of shift-reduce ambiguities to resolve, making the syntax very hard to maintain even for very basic evolutions. ]

And we can declare the table constant as well (i.e. its member list is not extensible): "t = !const{ 10, x = 12 }"
Or we can protect some members from being changed: "t = { !const 0, x = 12 }" to protect the member at index [1]
Named members can likewise be seen as variables: "t = { 0, !const x = 12 }"
And so we can also declare constant variables:  "!const t = { 0, !const x = 12 }" (where "!const" replaces the "local" keyword)
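
Part of what "!const" asks for on tables can be approximated in current Lua with a proxy and the "__newindex" metamethod. The sketch below is an illustration only: the helper name "readonly" is hypothetical, and it freezes the whole table rather than individual members.

```lua
-- Hypothetical helper: wraps a table in an empty proxy whose fields cannot
-- be assigned. Reads are forwarded to the original table through __index.
local function readonly(t)
  return setmetatable({}, {
    __index = t,
    __newindex = function(_, key)
      error("attempt to modify read-only field '" .. tostring(key) .. "'", 2)
    end,
    __metatable = false,  -- getmetatable returns false; setmetatable refuses
  })
end

local t = readonly({ 10, x = 12 })
print(t[1], t.x)

-- Any assignment through the proxy raises an error.
local ok, err = pcall(function() t.x = 13 end)
print(ok, err)
```

This is weaker than a language-level constant: "rawset" still bypasses the proxy, and "#t" and "pairs(t)" operate on the empty proxy unless further metamethods are added.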

Other "!keywords" can be defined at any time in newer versions of Lua without breaking existing Lua programs.



On Sat, May 23, 2020 at 05:31, Philippe Verdy <verdyp@gmail.com> wrote:
And what do you think about the new keywords added in C++, like "class, public, private, protected, throw, catch"? Older C programs would no longer compile if these were used as variable names. And most compilers have introduced their own keywords (sometimes with the convention of using "__" prefixes that should be reserved for keywords of the language).
Why did Lua not enforce that the "__" prefix be reserved for core language extensions?
It would have made it easier to add new keywords like "__const". But unfortunately, Lua has no naming convention at all for the identifiers allowed in programs.

Another way to extend the set of keywords for future versions of Lua (while keeping existing Lua programs valid) would be to use a prefix that is currently forbidden for identifiers and not used by existing operators in expressions where identifiers would be used, e.g. "#" or "\" or "?" or "!".
Note that Java chose to use "@" (not for new lexical elements but for adding annotations in a set which is extensible by programs).

So instead of "const", why not "#const" or "\const", or "@const", or "?const", or "!const", if Lua introduces annotations but with a naming convention (such as reserving lowercase ASCII initials a-z for core annotations defined by the semantics of the language, while still allowing programmers to use all the annotations they want by using capital initials after the [#\@!?] prefix, or even an "_" initial if they wish, or non-ASCII letters if the syntactic parser allows UTF-8 encoding for source files)?
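
That these prefixes are free today can be checked by asking the current interpreter to compile a declaration that uses them. A minimal sketch, which only probes one position (right after "local") and assumes nothing beyond the standard "load"/"loadstring" functions:

```lua
-- loadstring exists in Lua 5.1; later versions accept a string in load.
local compile = loadstring or load

-- Each candidate prefix placed before an identifier in a local declaration.
local prefixes = { "#", "\\", "@", "?", "!" }
local rejected = {}
for _, p in ipairs(prefixes) do
  local chunk = compile("local " .. p .. "const = 1")
  rejected[p] = (chunk == nil)
  print(p .. "const", rejected[p] and "syntax error today" or "compiles today")
end
```

Since none of these forms is accepted by the current grammar, no existing program can contain them, which is what makes them safe to claim later.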


On Wed, May 20, 2020 at 09:52, pocomane <pocomane_7a@pocomane.com> wrote:
On Tue, May 19, 2020 at 11:04 PM Andrea <andrea.l.vitali@gmail.com> wrote:
> So let's say we have decided to add the new keyword "const" (or something similar, it can be "local_const")
> In the old program this new keyword is not used, therefore the old program will be able to run on the new Lua 5.4 interpreter.
> And the new program will benefit from a cleaner syntax.
> Let me know if my understanding is correct.

Nope. The point is: an old script must run on the new Lua without modification.

If you introduce a new keyword "const", the following script will not
work anymore in the new Lua:

local const = 2
print(const)
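
The breakage can be previewed without a modified interpreter by compiling a snippet that uses an already reserved word as a variable name. A minimal sketch, using "end" as a stand-in for a hypothetically reserved "const":

```lua
-- loadstring exists in Lua 5.1; later versions accept a string in load.
local compile = loadstring or load

-- "const" is an ordinary identifier today, so the old script compiles.
local old_script = "local const = 2\nreturn const"
local chunk = compile(old_script)
print(chunk ~= nil, chunk and chunk())

-- "end" is already reserved, so using it as a variable name is rejected.
-- A reserved "const" would make the old script fail in the same way.
local bad, msg = compile("local end = 2")
print(bad, msg)
```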