lua-users home
lua-l archive



An interesting related document about ISO 646-ISV and the solution used in C/C++ (and their preprocessors)
https://en.wikibooks.org/wiki/C_Programming/C_trigraph

C had to define trigraphs using the "??" prefix (they date back to C89); then, as some of them were judged too long and difficult to read, digraphs were added (in the 1994 amendment to the standard, later carried into C99) to replace brackets, braces and '#', including in preprocessor directives.
   <: :> <% %> %:
Trigraphs and digraphs sometimes require spacing before or after them in some constructs, and trigraph sequences need escaping with \ when they occur in string literals...

I wonder why C/C++ did not reserve some digraph for their own reserved extensions instead of polluting the namespace of programmable identifiers. Even if C++ added namespaces with the new token "::", these namespaces are still defined in user space, including "std". But not all C/C++ compilers accept digraphs/trigraphs unless a special compilation option is set, and it is off by default, as most programs don't need them if sources support the full ASCII range and more! They were not even needed for EBCDIC support, only for very old systems with limited 7-bit-only terminals using legacy ISO 646 variants, or old EBCDIC punch cards with reduced keyboards for punching them.

But ISO 646-ISV is still relevant for users of international keyboards where some ASCII characters are not easy to enter (not all programmers have a US keyboard, but today all programmers should have an extended keyboard with at least an AltGr key, or additional keys for the ASCII punctuation missing from classic layouts).

If you program in C, C++, Java or JavaScript with a French keyboard, you use the AltGr key very often for all ASCII brackets, braces, sharps, vertical bars, tildes and backslashes (also for the caret, but its usage is more exceptional): 11 ASCII characters require AltGr. French programmers are used to it, or they have opted for connecting an additional US keyboard or a smaller 4x3 side keyboard with extra keys for programming (sometimes with several working modes: calculator/numpad or ASCII+Euro). There now even exist extended keyboards with an additional row of keys prelabelled for a dozen ASCII characters plus the Euro symbol (so that no small screen is required to show the mapping: users can fit the keys with keycaps designed for classic US keyboards and then configure the mapping in the settings of a device driver, saved in their preferences).
Some APL programmers use their own layout with a keyboard configured this way, with solid keycaps (ugly stickers don't last very long).

But source files in most C/C++ projects no longer accept trigraphs/digraphs in their repositories (and they are not supported in Java and JavaScript, only in some implementations of C99). There are tools on *nix to convert source files from/to C trigraphs/digraphs where needed (but few users use or even install these optional tools).
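
As a rough illustration of what such a conversion tool does, here is a minimal Lua sketch (assuming Lua 5.3 or later) that replaces the nine C trigraphs with their ASCII equivalents. The names `TRIGRAPHS` and `untrigraph` are made up for this example. It does not handle digraphs, which are tokens and would need a real C tokenizer to be converted safely.

```lua
-- The nine trigraphs of standard C and the ASCII character each one stands for.
local TRIGRAPHS = {
  ["="] = "#", ["("] = "[", [")"] = "]", ["/"] = "\\", ["'"] = "^",
  ["<"] = "{", [">"] = "}", ["!"] = "|", ["-"] = "~",
}

-- Replace every "??x" sequence where x is one of the nine trigraph suffixes.
-- Like the C preprocessor, this is purely textual: it also applies inside
-- string literals and comments.
local function untrigraph(source)
  return (source:gsub("%?%?([=()/'<>!%-])", TRIGRAPHS))
end

print(untrigraph("??=define ARR(a, i) a??(i??)"))  --> #define ARR(a, i) a[i]
```

A "??" that is not followed by one of the nine suffix characters is left untouched.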



On Mon, May 25, 2020 at 15:11, Philippe Verdy <verdyp@gmail.com> wrote:
Arg! Yes, but not within local declarations, where such calls are illegal.
So we need another extension character for general extension purposes.

Other punctuation characters in ASCII already have a meaning (and are also used in digraphs for other tokens):
 " # ' % ( )  * + , - . / : ; ^ < = > [ ] { } ~

There remains in the ASCII space only (still more than in C/C++/Java):
 ! $ & ? @ \ ` |

See:
 local x const! = 0; (because others don't like local x !const = 0;)
 local x $const = 0; (is the $ character allowed in identifiers? It seems not: "Names (also called identifiers) in Lua can be any string of letters, digits, and underscores, not beginning with a digit."; and in any case it is not part of ISO 646-IVS)
 local x &const = 0; or local x const& = 0; (not easy to type on many keyboards, and not part of ISO 646-IVS)
 local x ?const = 0; or  local x const? = 0;
 local x @const = 0; (though I would prefer keeping the @ for user-provided attributes and annotations, and it's not easy to type on many keyboards, and not part of ISO 646-IVS)
 local x \const = 0; (not easy to type on many keyboards, and not part of ISO 646-IVS)
 local x `const = 0; (not easy to type on many keyboards, and not part of ISO 646-IVS)
 local x |const = 0; (not easy to type on many keyboards, and not part of ISO 646-IVS)
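
To check that these candidate spellings are really free today, one can simply ask the compiler. This is a minimal sketch, assuming Lua 5.3 or later (where `load` accepts a string); the helper name `is_rejected` is hypothetical. Each candidate declaration from the list above is compiled, not executed.

```lua
-- Returns true when the current Lua compiler refuses the given chunk.
local function is_rejected(chunk)
  return load(chunk) == nil
end

local candidates = {
  "local x const! = 0", "local x $const = 0", "local x &const = 0",
  "local x ?const = 0", "local x @const = 0", "local x \\const = 0",
  "local x `const = 0", "local x |const = 0",
}

for _, chunk in ipairs(candidates) do
  print(chunk, is_rejected(chunk) and "syntax error (free)" or "already valid")
end
```

All eight are rejected at present. Note however that in Lua 5.3 and later `&`, `|` and `~` are already bitwise operators, so a prefix use of `&` or `|` would have to coexist with them.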

The best candidates for general extension of the Lua language itself are then limited to '!' or '?'.

If the prefix notation is preferred, then the best candidate is '?':
  local x?const,  file?toclose, t = 10, open('filename'), [];

And for metatable pseudo-keys in table constructors or in get/set operations (with get still returning nil, but set creating an empty metatable when there is none yet before setting a key):
  t = [10, 11, ? = [ type = 'something'] ]
  t[?].type2 = (t[?].type or nil) ..  ' else'
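
For comparison, the get/set behaviour described for the `?` pseudo-key can be emulated in today's Lua with `getmetatable` and `setmetatable`. This is only a minimal sketch; `meta_get` and `meta_set` are hypothetical helper names, not part of any library.

```lua
-- Read a field of t's metatable; yields nil when t has no metatable.
local function meta_get(t, key)
  local mt = getmetatable(t)
  if mt == nil then return nil end
  return mt[key]
end

-- Write a field of t's metatable, creating an empty metatable first if needed.
local function meta_set(t, key, value)
  local mt = getmetatable(t)
  if mt == nil then
    mt = {}
    setmetatable(t, mt)
  end
  mt[key] = value
end

local t = { 10, 11 }
print(meta_get(t, "type"))          --> nil
meta_set(t, "type", "something")
meta_set(t, "type2", (meta_get(t, "type") or "") .. " else")
print(meta_get(t, "type2"))         --> something else
```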

And for environment settings (replacing calls to getfenv/setfenv):
  fun?env = ...
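
For reference, `getfenv` and `setfenv` exist only in Lua 5.1; from Lua 5.2 on, a function's environment is its `_ENV` upvalue, which can be chosen when a chunk is loaded. A minimal sketch assuming Lua 5.3 or later; the chunk text and the `sandbox` table are made up for illustration.

```lua
-- A private environment: the chunk sees only what this table contains.
local sandbox = { greeting = "hello" }

-- The fourth argument of load becomes the chunk's _ENV upvalue.
local fun = assert(load("answer = greeting .. ' world'; return answer",
                        "example", "t", sandbox))

print(fun())            --> hello world
print(sandbox.answer)   --> hello world
print(answer)           --> nil (the real globals were not touched)
```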

The extensions described with the ? prefix allow the following extended tokens: ?, ?const, ?toclose, ?env, and many more for future versions of Lua, including additional operators like:

 ?<< or ?>> for binary rotations (could be named ?rol and ?ror instead)
 ?/ and ?% for modular Euclidean divisions, so that the modulo always has the same sign as the divisor and the precision is also preserved,
    where a?/b=floor(a/b) and a?%b= a-floor(a/b)*b (these could be named ?div and ?mod)
 (... ?? ... ?: ...) for a ternary conditional operator (these could be named ?if and ?else)
 ( ?| eval ?| cond1 ?|cond2 = value1 ?default = value2) (these could be named ?switch, ?case, ?default)
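
Several of these operators can already be written in current Lua, which helps to see what the proposed tokens would add. This is a minimal sketch assuming Lua 5.3 or later built with 64-bit integers; `rol` and `ror` are hypothetical helper names. Note that the `//` and `%` operators of Lua 5.3 already use floor semantics, so the remainder takes the sign of the divisor.

```lua
-- Rotate a 64-bit integer left or right by n bits (n is reduced modulo 64).
-- Lua shifts are logical, and shifting by 64 or more bits yields 0.
local function rol(x, n)
  n = n & 63
  return (x << n) | (x >> (64 - n))
end

local function ror(x, n)
  n = n & 63
  return (x >> n) | (x << (64 - n))
end

print(rol(1, 4))                       --> 16
print(ror(1, 1) == math.mininteger)    --> true

-- Floor division and modulo: the remainder has the sign of the divisor.
print(7 // -2 == -4, 7 % -2 == -1)     --> true, then true
print(-7 // 2 == -4, -7 % 2 == 1)      --> true, then true

-- The usual stand-in for a ternary operator; it picks the wrong branch
-- when the "then" value is false or nil.
local n = 5
print(n > 3 and "big" or "small")      --> big
print(true and false or "oops")        --> oops
```

The last line shows why a real conditional operator is requested: the `and`/`or` idiom cannot return `false` or `nil` from its first branch.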

The extensions should be simple to parse: the ? prefix would be followed by another token (using the existing token syntax, possibly extended with a few operators that may be added like ` or |), but probably without allowing any space between them; the main difference is that all identifiers that follow the ? are reserved for the language and not used for any variable name declared in user-space.

User annotations using @ are still a common convention and no one complains about having to type it with AltGr+key.

But pairs of '<' and '>' are among the worst choices, as they have limited usage in specific syntactic constructs (and cause difficulties in parsers, with shift-reduce problems).


On Mon, May 25, 2020 at 11:43, pocomane <pocomane_7a@pocomane.com> wrote:
On Mon, May 25, 2020 at 10:21 AM Ray <emacsray@gmail.com> wrote:
> On every discussion thread I can find about 5.4 <const> <close>, there
> are great piles of complaints about the horrible syntax.
> Hope the syntax can still be fixed, otherwise it will assuredly hamper
> the adoption.

Not for 5.4, I do not think so.

> Oh, does `x:const = 1` imply `local`? This syntax seems nice!

Yes, but it is not possible since it already means something in Lua
(call the const method of x). I did not realize this when I made my
previous examples, sorry.
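
The conflict is easy to see with `load`. A minimal sketch, assuming Lua 5.3 or later; the table `x` and its `const` method are made up for the example.

```lua
-- x:const(...) is sugar for x.const(x, ...), so "x:const" is already taken.
local x = { value = 1 }
function x.const(self) return self.value end
print(x:const())                       --> 1

-- As a statement, "x:const = 1" does not compile: after the method name
-- the parser expects call arguments.
print(load("x:const = 1"))             -- nil plus an error message

-- Inside a local declaration a colon is not accepted either today.
print(load("local x:const = 1"))       -- nil plus an error message
```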