It has been mentioned before [1] that the Lua 5.1 implementation
treats unrecognized escape sequences as the character after the
backslash, although this behavior is undefined in the reference
manual:

  assert( "\;" == ";")
  assert( "\x10" == "x10" ) -- warning: planned to change in 5.2
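
To check how a particular interpreter treats a given escape without
putting the questionable literal into the source file itself, one can
compile a tiny chunk at run time.  This is only a minimal sketch:
probe_escape is a hypothetical helper name, and the "loadstring or
load" fallback assumes that one of the two accepts a string chunk in
the interpreter being tested.

```lua
-- Assumption: loadstring exists in 5.1; later versions accept a string in load.
local compile = loadstring or load

-- Hypothetical helper: compile the literal "\<c>" at run time and report
-- whether the lexer accepted it, plus the resulting string if it did.
local function probe_escape(c)
  local chunk = compile('return "\\' .. c .. '"')
  if not chunk then
    return false          -- the lexer rejected the escape sequence
  end
  return true, chunk()    -- accepted: second value is the string produced
end

print(probe_escape("n"))  -- a documented escape, accepted everywhere
print(probe_escape(";"))  -- result depends on the interpreter version
```

Because the literal is built as data, the probing file itself stays
valid even for a lexer that rejects unrecognized escapes.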

I've noticed a number of Lua files that rely on this undefined
behavior.  This causes problems when using Metalua to lex these files,
because Metalua currently treats such sequences as a syntax error, as
it is allowed to do.  Based on an analysis of the LuaDist
repository [2], I estimate that approximately 1% of published Lua
files rely on this undefined behavior (Appendix A and B).

So, I wonder if it would be better for Lua 5.2 to define how
unrecognized escape sequences should be treated.

A similar thing might be said about unrecognized pattern characters
(e.g. "%e").  I didn't see any instances of that, but it's hard to say
whether a given string is a pattern without doing a dynamic analysis.
Presently, the Lua 5.1 implementation treats "%e" as "e", but this
behavior seems undefined.
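
A dynamic check along the following lines can tell whether "%" followed
by some character behaves as a plain literal in the interpreter at
hand.  It is a rough sketch rather than a statement about any Lua
version: escape_is_literal is a hypothetical name, and the test simply
tries the pattern against every single-byte string.

```lua
-- Hypothetical helper: true if the pattern "%<c>" matches the character c
-- and nothing else, i.e. it behaves like a plain literal.
local function escape_is_literal(c)
  local pat = "^%" .. c .. "$"
  for byte = 0, 255 do
    local s = string.char(byte)
    local ok, m = pcall(string.match, s, pat)
    if not ok then
      return false                 -- the pattern itself was rejected
    end
    if (m ~= nil) ~= (s == c) then
      return false                 -- matched another byte, or missed c
    end
  end
  return true
end

print(escape_is_literal("."))  -- "%." is the documented way to match a dot
print(escape_is_literal("a"))  -- "%a" is the letter class, not a literal
print(escape_is_literal("e"))  -- the undefined case discussed above
```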

BTW, a couple of tangential notes about the analysis: originally I
tried ltokens for this, but ltokens converts escape sequences before
returning them.  It also returns only line numbers rather than
character positions, which limits its usefulness to me in general.
Finally, in using 'lua test.lua `find .`', you can in 5.1 get the
error "lua: stack overflow (too many arguments to script)" as
mentioned in [3], but 5.2.0-work4 seems much more permissive.
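
One way to sidestep the argument limit is to read the file names from
standard input instead of the command line.  The sketch below assumes
one file name per line; collect_filenames is a hypothetical helper and
is not part of the script in Appendix A.

```lua
-- Hypothetical helper: gather non-empty lines from any line iterator.
local function collect_filenames(lines)
  local names = {}
  for line in lines do
    if line ~= "" then
      names[#names + 1] = line
    end
  end
  return names
end

-- Assumed invocation:  find . -name '*.lua' | lua test.lua
-- local filenames = collect_filenames(io.lines())
```

The resulting table can then be iterated with ipairs in place of {...}.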


== Appendix A: Test Code ==

-- findescape.lua
-- usage: lua findescape.lua `find /tmp/Repository -name '*.lua'`
-- uses:
local LB = require "luabalanced"

local function readfile(filename)
  local fh = io.open(filename, 'r')
  local text
  if fh then text = fh:read'*a':gsub('\r','\n'); fh:close() end
  return text
end

local nfiles = 0
local nbadfiles = 0

for _, filename in ipairs{...} do
  local isbad
  local text = readfile(filename)
  local ok, err = pcall(function()
    LB.gsub(text, function(u, text)
      -- only quoted strings matter; long-bracket strings have no escapes
      if u == 's' and not text:match'^%[' then
        for pos, c in text:gmatch('()\\(.)') do
          -- c is the character following a backslash
          if not c:match'[abfn\nrtv\"\'\\0123456789]' then
            local formatvalue = (#text <= 100 and text or
              '...' .. text:sub(pos,pos+100) .. '...'):gsub('\n', '\\n')
            print(c, filename, formatvalue)
            isbad = true
          end
        end
      end
    end)
  end)
  if not ok then print('error', filename, err) end
  if isbad then
    nbadfiles = nbadfiles + 1
  end
  nfiles = nfiles + 1
end

print(nbadfiles, ' of ', nfiles)
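
The core check can also be tried on a single string literal without
luabalanced.  The following is a self-contained sketch of that inner
loop; unrecognized_escapes and RECOGNIZED are names introduced here for
illustration, and the character set is the same one used in the script
above.

```lua
-- Characters the script above accepts after a backslash in a quoted string.
local RECOGNIZED = '[abfn\nrtv"\'\\0123456789]'

-- Hypothetical helper: list each backslash in the raw source text of a
-- literal that is followed by a character outside the recognized set.
local function unrecognized_escapes(literal)
  local found = {}
  for pos, c in literal:gmatch('()\\(.)') do
    if not c:match(RECOGNIZED) then
      found[#found + 1] = { pos = pos, char = c }
    end
  end
  return found
end

-- Long brackets keep the backslashes as raw text for the scan.
for _, e in ipairs(unrecognized_escapes([["\?"]])) do
  print(e.pos, e.char)
end
```

Passing the literal in long brackets matters: a quoted test string
would have its escapes converted (or rejected) before the scan ran.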

== Appendix B: Results ==

.	/tmp/Repository/alien/src/alien.lua	'\.so[^() ]*'
?	/tmp/Repository/cgilua/src/cgilua/authentication.lua	"\?"
?	/tmp/Repository/cgilua/src/cgilua/authentication.lua	"\?"
o	/tmp/Repository/leg/tests/test_scanner.lua	"[[something\n\or\nanother]]"
%	/tmp/Repository/luadate/date.lua	"^(%d+)[/\%s,-]?%s*"
%	/tmp/Repository/luadate/date.lua	"^(%a+)[/\%s,-]?%s*"
 	/tmp/Repository/luagraph/examples/record1.lua	"<f0> left|<f1> mid\
dle|<f2> right"
 	/tmp/Repository/luagraph/examples50/record1.lua	"<f0> left|<f1> mid\
dle|<f2> right"
z	/tmp/Repository/luajson/lua/json/decode/strings.lua	'\z'
?	/tmp/Repository/luasocket/samples/lpr.lua	"[%s%c%p]*([%w]*)=([\"]?[%w%s_!@#$%%^&*()<>:;]+[\"]\?\.?)"
.	/tmp/Repository/luasocket/samples/lpr.lua	"[%s%c%p]*([%w]*)=([\"]?[%w%s_!@#$%%^&*()<>:;]+[\"]\?\.?)"
error	/tmp/Repository/luma/samples/nor.lua	./luabalanced.lua:33: syntax error
error	/tmp/Repository/luma/tests/nor.lua	./luabalanced.lua:33: syntax error
l	/tmp/Repository/penlight/lua/pl/sip.lua	'return
(function(s,res)\n\t\local %s = s:match(%q)\n'
/	/tmp/Repository/remdebug/src/remdebug/engine.lua	"[^\/]+"
_	/tmp/Repository/sputnik/lua/sputnik/markup/markdown.lua	"\_"
[	/tmp/Repository/toluapp/src/bin/lua/declaration.lua	"(%b\[\])"
]	/tmp/Repository/toluapp/src/bin/lua/declaration.lua	"(%b\[\])"
.	/tmp/Repository/toluapp/src/bin/lua/feature.lua	"[<>:, \.%*&]"
13	 of 	1539