- Subject: Re: Heritage of Lua syntax for multi-line strings and comments
- From: Tom Sutcliffe <tomsci@...>
- Date: Fri, 24 Nov 2023 11:22:10 +0000
On 23 Nov 2023, at 23:11, Claudio Grondi <claudio.grondi@freenet.de> wrote:
>
> What I am wondering about is if there is a good reason I am not aware of for using:
> `--[===[ third level styled text comment ]===]`
> instead of:
> `--[[[[[ third level styled text comment ]]]]]`
> or
> `--['''[ third level styled text comment ]''']`
> or
> `--[...[ third level styled text comment ]...]`
Disclaimer: I won't put words into the mouths of the Lua team, these are my (un)educated guesses as to why things are the way they are.
My guess for not using the 2nd option (multiple brackets) is that there are a lot of ambiguities in that syntax if you actually were to try and implement it. For example, how do you express a long string/comment that starts or ends with a square bracket? If there are 4 opening brackets but 3 closing, does that mean you have to adjust your parse to treat the 4th opening bracket as part of the comment? That gets ugly fast.
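To make that ambiguity concrete, here is a rough Python sketch. It assumes a purely hypothetical scheme (not anything Lua does) in which a level-k long bracket opens with k '[' characters and closes with k ']' characters, for k >= 2. The function name bracket_run_readings is made up for illustration. It lists every reading of the text that such a scheme would have to choose between.

```python
def bracket_run_readings(text):
    """List (level, contents) readings under a HYPOTHETICAL repeated-bracket scheme.

    Assumption: level k means k '[' to open and k ']' to close, k >= 2.
    This is not Lua syntax, only an illustration of the proposed alternative.
    """
    opens = len(text) - len(text.lstrip("["))    # length of the leading '[' run
    closes = len(text) - len(text.rstrip("]"))   # length of the trailing ']' run
    readings = []
    for k in range(2, min(opens, closes) + 1):
        # Any brackets left over from the runs become part of the contents.
        readings.append((k, text[k:len(text) - k]))
    return readings


for level, contents in bracket_run_readings("[[[[ x ]]]"):
    print(level, repr(contents))
```

With 4 opening and 3 closing brackets there are two candidate readings (level 2 and level 3), and nothing in the text itself says which one was meant.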
As for why the equals sign rather than a single quote or period, I don't know. Maybe there's no particular reason, because there are no strong pros or cons either way. Maybe there's a historical precedent in an earlier language that I'm not aware of, or maybe it's an ASCII-art nod to the equals signs 'stretching' between the two brackets. I vaguely recall that the double dash to introduce a comment came from somewhere else, although the only example I can think of offhand is AppleScript.
Conciseness probably wasn't the primary consideration for Lua's syntax, not if it made the parsing substantially more complex or introduced ambiguities. There are a lot of clever nuances to Lua's syntax that most people probably won't catch unless they've tried to write a parser. For long brackets specifically: you always know what the end delimiter is once you've parsed the start delimiter, without any backtracking. Parsing the start delimiter is unambiguous. Long brackets of different levels can be nested arbitrarily without accidentally closing the string on one of the inner end delimiters. End delimiters for other levels can form the end of the string contents without introducing ambiguity or requiring escaping. And you only need a single int (or uint8 etc) to track the current level.
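As a rough illustration of those properties, here is a minimal Python sketch of scanning one long bracket. It is my own simplification, not Lua's actual lexer (which is written in C), and the name read_long_bracket is made up. It counts the '=' characters after the first '[', and from that single integer it knows the exact closing delimiter to look for.

```python
def read_long_bracket(src, pos=0):
    """Scan a Lua-style long bracket starting at src[pos].

    Returns (level, contents, end_pos), or None if src[pos:] does not start
    with a long bracket opener. A simplified sketch, not Lua's real lexer.
    """
    if pos >= len(src) or src[pos] != "[":
        return None
    i = pos + 1
    level = 0
    while i < len(src) and src[i] == "=":   # the level is just a count of '='
        level += 1
        i += 1
    if i >= len(src) or src[i] != "[":
        return None                         # e.g. "[=x" is not an opener
    start = i + 1
    # Lua drops a newline that immediately follows the opener (simplified here).
    if src.startswith("\r\n", start):
        start += 2
    elif src[start:start + 1] in ("\n", "\r"):
        start += 1
    closer = "]" + "=" * level + "]"        # fully determined by the opener
    end = src.find(closer, start)           # forward-only search, no backtracking
    if end == -1:
        raise ValueError("unfinished long bracket")
    return level, src[start:end], end + len(closer)


src = "[==[ a ]] b ]=] c ]==] tail"
level, contents, end = read_long_bracket(src)
print(level, repr(contents), repr(src[end:]))
```

The level-0 and level-1 closers inside the contents are simply skipped, because only the exact level-2 closer can end this string.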
It's a really elegant solution, and almost every other language out there seems to have either too few literal string syntaxes or too many (looking at you Groovy, you kitchen-sink abomination). Or, in the case of C++, they make a complete dog's breakfast of a long string syntax: I write C++ every day and still cannot get its long string syntax to stick in my brain (or make sense).
Coming up with new syntax is a fun thought experiment, but the need for simple parsing and the potential for ambiguity are key considerations :) Markdown is wonderfully simple at first glance and a complete and utter nightmare to parse, for example.
Cheers,
Tom