lua-l archive

On 6/27/2017 6:56 PM, Duane Leslie wrote:
> Hi,
>
> I had a problem with the quoted string format producing strings that
> were not legal UTF-8 because it was not escaping non-ascii characters,
> and then once I fixed that I wasn't able to read the strings back in
> to a C program because C uses octal escapes and Lua uses decimal.

But string.format is not documented to produce a legal C string literal. See

   https://www.lua.org/manual/5.3/manual.html#pdf-string.format

where it says "The q option formats a string between double quotes, using escape sequences when necessary to ensure that it can safely be read back by the Lua interpreter." Note that it makes no promise that the result can be read safely by a C compiler.
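The documented guarantee concerns Lua reading the text back, and a round trip checks it directly. This is an illustrative sketch. The helper name `roundtrips` is hypothetical, and the code assumes `load` accepts a string (Lua 5.2 and later), falling back to `loadstring` on 5.1.

```lua
-- Round-trip check of the documented %q guarantee (illustrative sketch).
-- load() accepts a string in Lua 5.2+; loadstring() is the 5.1 equivalent.
local compile = loadstring or load

-- Hypothetical helper: true if %q output, compiled as Lua, yields s again.
local function roundtrips(s)
  local literal = string.format("%q", s)
  local chunk = assert(compile("return " .. literal))
  return chunk() == s
end

-- Tab, quotes, a backslash, UTF-8 bytes ("é" is \195\169) and a NUL byte.
local sample = 'tab\there "quoted" back\\slash caf\195\169\0end'
print(roundtrips(sample))  --> true
```

Nothing here checks that the output is valid UTF-8 or acceptable to a C compiler. Those are separate requirements.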

Within string literals, Lua is already perfectly fine with UTF-8 content. It may also be fine with other extended ASCII forms, or with some other Unicode Transformation Formats, leaving most of those details up to the system outside of Lua. But don't use UTF-7. Just don't.

> This patch ensures all control and non-ascii characters are escaped,
> and uses the hexadecimal escape syntax instead of decimal to ensure
> compatibility between Lua and C.

Here you are using "Lua" to mean Lua 5.2 or later. The still widely used Lua 5.1 did not support hex escapes.
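Code that must also run on 5.1 can test for hex escape support instead of assuming it. This is a minimal sketch. The variable name `has_hex_escapes` is made up for illustration, and the 5.1 behaviour in the comment reflects how that version's lexer handles an unrecognized escape: it drops the backslash and keeps the character.

```lua
-- In Lua 5.2 and later, "\x41" is the one-byte string "A".
-- Stock Lua 5.1 drops the backslash instead and yields the string "x41".
local has_hex_escapes = ("\x41" == "A")

-- Decimal \ddd escapes are understood by every version from 5.1 onward,
-- so they are the portable choice when 5.1 must read the output.
assert("\65" == "A")
print(_VERSION, "hex escapes:", has_hex_escapes)
```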

> Technically it is still not safe to pass the strings as literals
> directly into C because in C the hexadecimal production is not
> automatically terminated at two characters but I figured this was
> outside of the scope of the quoted string format specifier.  I solve
> this instead by using `:gsub([[%f[\]\x%x%x]],'%0""')` to terminate the
> hexadecimal escapes (triggering C's string literal concatenation
> behaviour) at the point of export.
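A small, made-up input shows what that export-time substitution does. C's `\x` escape consumes every following hex digit, so the C text `"\x41BC"` would be read as a single escape with value 0x41BC, which does not fit in a `char`. Appending `""` after `\x41` ends the first literal, and the compiler then concatenates the adjacent literals into "A" followed by "BC". The variable names below are illustrative only.

```lua
-- Text as it might be written into a C source file. A Lua long string is
-- used so the backslash stays literal instead of being processed as an escape.
local exported = [["\x41BC"]]

-- %f[\] matches only where a backslash is not preceded by another backslash.
-- Each two-digit \x escape found there gets an empty "" appended.
local fixed = exported:gsub([[%f[\]\x%x%x]], '%0""')
print(fixed)  --> "\x41""BC"
```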

You would be far better served by writing a Lua function that generates a proper C string literal and calling that instead of depending on string.format("%q") and additional processing.
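One possible shape for such a function, sketched under a few assumptions: the name `c_string_literal` is hypothetical, the input is treated as raw bytes, and the target is a C compiler, not Lua. It uses octal instead of hex escapes because a C octal escape ends after at most three digits. A fixed-width `\ooo` therefore never absorbs a following digit, and no `""` workaround is needed.

```lua
-- Hypothetical helper (not part of Lua's standard library): build a C string
-- literal, including the surrounding quotes, whose bytes are exactly those of s.
local function c_string_literal(s)
  local parts = { '"' }
  for i = 1, #s do
    local b = s:byte(i)
    if b == 34 or b == 92 then
      -- '"' and '\' must be backslash-escaped.
      parts[#parts + 1] = "\\" .. string.char(b)
    elseif b == 63 then
      -- '?' is escaped so a sequence like "??=" can never form a trigraph.
      parts[#parts + 1] = "\\?"
    elseif b >= 32 and b <= 126 then
      -- Other printable ASCII is copied unchanged.
      parts[#parts + 1] = string.char(b)
    else
      -- Control and non-ASCII bytes become three-digit octal escapes.
      -- C ends an octal escape after three digits, so the following
      -- character is never absorbed, unlike with C's unbounded \x escapes.
      parts[#parts + 1] = string.format("\\%03o", b)
    end
  end
  parts[#parts + 1] = '"'
  return table.concat(parts)
end

print(c_string_literal("caf\195\169\n2017"))  --> "caf\303\251\0122017"
```

The output is plain ASCII, so it is also valid UTF-8. It is meant only for a C compiler, because Lua would read the same escape digits as decimal.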

Lua is designed to work well with C. It is also designed to be used by people who don't want to know anything about C or lower-level programming issues. While the choice of base-10 for \ddd escapes is occasionally a source of friction when switching back and forth between the languages, it is no worse than the choice of 1-based array indexing and numerous other details that differ.
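Spelling one byte both ways makes the friction concrete. In this sketch the helper `escapes_for` is hypothetical. It pads both forms to three digits, which is the maximum length each language accepts for its numeric escape.

```lua
-- Hypothetical helper: spell one byte as a Lua decimal escape and as a
-- C octal escape, both padded to three digits.
local function escapes_for(byte)
  return string.format("\\%03d", byte), string.format("\\%03o", byte)
end

local lua_esc, c_esc = escapes_for(65)  -- 'A': lua_esc is \065, c_esc is \101

-- The C spelling of 'A', read by Lua, is decimal 101, which is 'e'.
assert("\101" == "e")
```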
-- 
Ross Berteig                               Ross@CheshireEng.com
Cheshire Engineering Corp.           http://www.CheshireEng.com/
+1 626 303 1602