



On 02/09/16 06:58 AM, Viacheslav Usov wrote:
The complete syntax of Lua in the reference manual uses /LiteralString/, whose description is said to be given in 3.1.

3.1 says: "Any byte in a literal string not explicitly affected by the previous rules represents itself." The newline byte (decimal value 10 in ASCII and UTF-8) is explicitly mentioned in three places in the preceding text:

(1) A backslash followed by a real newline results in a newline in the string.

(2) The escape sequence '\z' skips the following span of white-space characters, including line breaks.

(3) Any kind of end-of-line sequence (carriage return, newline, carriage return followed by newline, or newline followed by carriage return) is converted to a simple newline.

(3) is given only in the context of long strings. Therefore, in a literal string delimited by single or double quotes, the newline byte has a special meaning only when it follows the backslash or is within a span of white-space characters following \z; otherwise it represents itself. The same is true of the carriage return byte (decimal value 13 in ASCII and UTF-8). So, per "the official definition of the Lua language", newlines and carriage returns should just work without being escaped in either kind of string, albeit with subtle platform-dependent differences in non-long strings.
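For reference, the three rules above can be checked with a few lines of Lua. This is only a sketch of what the implementation does today, assuming Lua 5.2 or later (for \z); the variable names are made up for illustration.

```lua
-- (1) a backslash followed by a real newline yields a newline in the string
local a = "one\
two"
assert(a == "one\ntwo")

-- (2) \z skips the following white space, including the line break
local b = "one\z
           two"
assert(b == "onetwo")

-- (3) in a long string, a line break in the source becomes a single "\n"
local c = [[one
two]]
assert(c == "one\ntwo")
```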

But we know that the official implementation produces the error "unfinished string" when a non-long string contains a newline. So at least one of the two is wrong and ought to be fixed.
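The error is easy to reproduce without editing a source file, by compiling a chunk whose quoted literal contains a raw newline or carriage return byte. A minimal sketch, assuming a stock interpreter; `compile` is just a local alias chosen here so the snippet works on both 5.1 (loadstring) and later versions (load).

```lua
-- loadstring in Lua 5.1, load in 5.2 and later
local compile = loadstring or load

-- The source text passed to compile contains a raw newline (or carriage
-- return) byte between the double quotes of the inner literal.
local f1, err1 = compile('return "one\ntwo"')
local f2, err2 = compile('return "one\rtwo"')

-- Both chunks fail to compile with the message discussed above.
assert(f1 == nil and err1:find("unfinished string", 1, true))
assert(f2 == nil and err2:find("unfinished string", 1, true))
print(err1)
```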

Personally, I see no reason why the implementation cannot treat newline and carriage return bytes as described in the manual. As far as I can see, llex.c already has a special case just to emit that error message:

case '\n':
case '\r':
        lexerror(ls, "unfinished string", TK_STRING);


That special case code can be trivially changed to match the official description. We can also define the behaviour of newline and carriage return to be the same in both kinds of string literals, thus eliminating the platform-dependent differences mentioned above; then it is even more trivial to change that special case (this is because (1) given above is not followed exactly by the implementation, which handles not just "a real newline" but any of the four possible \n and \r combinations, apparently to eliminate those same platform-dependent differences).
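The parenthetical claim about the four \n and \r combinations can be checked from Lua itself. The following is a sketch against the stock lexer, not a statement of what the manual guarantees; `compile`, `eols` and `results` are names invented for this example.

```lua
-- loadstring in Lua 5.1, load in 5.2 and later
local compile = loadstring or load

-- The four end-of-line sequences mentioned in the manual.
local eols = { "\n", "\r", "\r\n", "\n\r" }
local results = {}

for i, eol in ipairs(eols) do
  -- A quoted literal with a backslash followed by the raw sequence.
  local quoted = assert(compile('return "a\\' .. eol .. 'b"'))()
  -- A long string containing the same raw sequence.
  local long = assert(compile('return [[a' .. eol .. 'b]]'))()
  results[i] = { quoted = quoted, long = long }
  -- In both cases the sequence collapses to a single "\n".
  assert(quoted == "a\nb")
  assert(long == "a\nb")
end
```

So the normalisation already exists in both kinds of literal; only the unescaped case in quoted strings is rejected.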

I do not find the "typo" arguments given earlier convincing. The treat-unknown-as-global rule is far more dangerous when it comes to typos, yet we somehow live with that.

But fundamentally, we should eliminate the discrepancy in some way. Saying one thing and doing something completely different is bad.

Cheers,
V.

Yes. And I've wanted to use (numeric) escapes (e.g. \0 or something) in multiline strings before. And I decided to use something like this:

[[some
stuff
here]].."\0"..[[more
stuff
here]]

(but am too lazy to find the actual code now...)
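Not the actual code, but a sketch of the same idea next to a possible alternative: a quoted string with backslash-newline line breaks accepts numeric escapes directly, so no concatenation is needed. The names `s` and `t` are only for illustration.

```lua
-- The concatenation workaround: long strings around a quoted "\0".
local s = [[some
stuff
here]] .. "\0" .. [[more
stuff
here]]

-- The same bytes as a single quoted string, using backslash-newline
-- for the line breaks so that \0 can appear inline.
local t = "some\
stuff\
here\0more\
stuff\
here"

assert(s == t)
assert(#s == 31)          -- 15 bytes + one zero byte + 15 bytes
assert(s:byte(16) == 0)   -- the embedded zero byte
```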

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.