- Subject: Re: Significant newlines
- From: "Soni L." <fakedme@...>
- Date: Fri, 2 Sep 2016 13:48:59 -0300
On 02/09/16 06:58 AM, Viacheslav Usov wrote:
The complete syntax of Lua in the reference manual uses
/LiteralString/, whose description is said to be given in 3.1.
Yes. And I've wanted to use (numeric) escapes (e.g. \0 or something) in
multiline strings before. And I decided to use something like this:
3.1 says: "Any byte in a literal string not explicitly affected by the
previous rules represents itself." The newline byte (decimal value 10
in ASCII and UTF-8) is explicitly mentioned in three places:
(1) A backslash followed by a real newline results in a newline in the
string.
(2) The escape sequence '\z' skips the following span of white-space
characters, including line breaks.
(3) Any kind of end-of-line sequence (carriage return, newline,
carriage return followed by newline, or newline followed by carriage
return) is converted to a simple newline.
(3) is given only in the context of long strings. Therefore, in a
literal string delimited by single or double quotes, the newline byte
has a special meaning only when it follows the backslash or is within
a span of white-space characters following \z; and it represents
itself otherwise. The same is true for the carriage return bytes
(decimal value 13 in ASCII and UTF-8). So, per "the official
definition of the Lua language" newlines and carriage returns should
just work without being escaped in either kind of string, albeit with
subtle platform-dependent differences in non-long strings.
But we know that the official implementation produces error
"unfinished string" when a non-long string has a newline. So at least
one of the two is wrong and ought to be fixed.
Personally, I see no reason why the implementation cannot treat
newline and carriage return bytes as described in the manual. As far
as I can see, llex.c already has a special case just to emit that error:
lexerror(ls, "unfinished string", TK_STRING);
That special case code can be trivially changed to match the
official description. We can also define the behaviour of newline
and carriage returns to be the same in both kinds of string literals,
thus eliminating the platform-dependent differences mentioned above;
then it is even more trivial to change that special case (this is
because (1) given above is not followed exactly by the implementation,
which handles not just "a real newline" but any of the four possible
\n and \r combinations, apparently to eliminate those same
differences).
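For what it's worth, the two behaviours can be put side by side in a small C sketch. This is NOT the code from llex.c: read_short_string and the EolPolicy enum are invented for illustration, only the backslash-newline escape is handled, and the permissive mode follows the second variant described above (a raw line break is folded to one newline, as in long strings) rather than the literal "represents itself" reading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy switch; neither name exists in Lua's sources. */
typedef enum { RAW_EOL_IS_ERROR, RAW_EOL_IS_NEWLINE } EolPolicy;

/* Reads a quoted literal whose opening delimiter is src[0] and writes its
   value to `out` (needs room for strlen(src) + 1 bytes). Returns the
   number of bytes written, or -1 for an unfinished string. Escapes other
   than backslash + line break are out of scope and copied as-is. */
long read_short_string(const char *src, char *out, EolPolicy policy) {
    long n = 0;
    char delim;
    assert(src != NULL && out != NULL);
    delim = *src;
    if (delim != '"' && delim != '\'')
        return -1;                      /* not a quoted literal */
    src++;
    while (*src != delim) {
        int escaped = 0;
        if (*src == '\0')
            return -1;                  /* input ended before the delimiter */
        if (*src == '\\' && (src[1] == '\n' || src[1] == '\r')) {
            escaped = 1;                /* rule (1): backslash + line break */
            src++;
        }
        if (*src == '\n' || *src == '\r') {
            char first = *src++;
            if (!escaped && policy == RAW_EOL_IS_ERROR)
                return -1;              /* the "unfinished string" case */
            if ((*src == '\n' || *src == '\r') && *src != first)
                src++;                  /* fold "\r\n" or "\n\r" */
            out[n++] = '\n';
        } else {
            out[n++] = *src++;
        }
    }
    out[n] = '\0';
    return n;
}
```

With RAW_EOL_IS_ERROR a raw line break ends the scan with an error, which is the behaviour reported in this thread; with RAW_EOL_IS_NEWLINE the same branch just emits a newline, so the difference between the two is a single condition.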
The "typo" arguments given earlier, I do not find them convincing. The
treat-unknown-as-global rule is far more dangerous when it comes to
typos, yet we somehow live with that.
But fundamentally, we should eliminate the discrepancy in /some way/.
Saying one thing and doing something completely different is bad.
(but am too lazy to find the actual code now...)
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.