I can't help thinking that all the proposed solutions are a lot more complicated than necessary.

Also, this is a perfect use case for Mike Pall's patch to string.gsub, with which I concur (although I might extend it a bit....)

Anyway, here's the simplest html escaper I know of (just the three vital characters):

do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  local function escape(c) return escapes[c] end
  function html_escape(str) return (str:gsub("[&<>]", escape)) end
end

Can't get much simpler than that, except that with Mike's patch you wouldn't need the function "escape"; you could just provide the table "escapes" as the last argument to gsub. (By the way, the redundant parentheses in the last return statement are deliberate; they avoid returning the second return value of gsub.)
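
For what it's worth, here is roughly what the table form would look like. This is only a sketch, assuming a Lua whose string.gsub accepts a table as the replacement argument (stock Lua does from 5.1 on). The name html_escape_tbl is made up for this example. It also shows what the extra parentheses do to the number of return values.

```lua
-- Sketch: assumes string.gsub accepts a table as the replacement (Lua 5.1+).
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}

-- html_escape_tbl is a hypothetical name used only in this example.
local function html_escape_tbl(str)
  -- The parentheses truncate gsub's two results (string, count) to the string.
  return (str:gsub("[&<>]", escapes))
end

print(html_escape_tbl("a < b && c > d"))          --> a &lt; b &amp;&amp; c &gt; d
print(select("#", html_escape_tbl("x")))          --> 1
print(select("#", ("x"):gsub("[&<>]", escapes)))  --> 2
```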

If the string to be escaped is ISO-8859-1, and you really want to escape the high (non-ASCII) bytes 128-255 numerically, just extend the escapes table:

do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
  local function escape(c) return escapes[c] end
  function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end
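
A quick way to convince yourself the numeric version does what it should is sketched below. It assumes the input really is ISO-8859-1 bytes. The name escape_latin1 is made up for this example and kept local so it doesn't clobber anything.

```lua
-- Hypothetical local copy of the numeric escaper, for a quick check.
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end

local function escape_latin1(str)
  return (str:gsub("[&<>\128-\255]", function(c) return escapes[c] end))
end

-- "\233" is e-acute in ISO-8859-1.
print(escape_latin1("caf\233 <b>"))  --> caf&#233; &lt;b&gt;
```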

If you really want named escapes, insert them after the for loop so they override the numeric entries. I don't see the point, though; with numeric escapes you don't need to worry about browser support. Still, if you must:

do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
  escapes["\225"] = "&aacute;"  -- byte 225 is á in ISO-8859-1; a literal 'á' only works if this file is saved as ISO-8859-1
  -- etc.
  local function escape(c) return escapes[c] end
  function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end

Perhaps it is better to just output straight in UTF-8:

do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  for i = 128, 191 do escapes[string.char(i)] = "\194"..string.char(i) end
  for i = 192, 255 do escapes[string.char(i)] = "\195"..string.char(i-64) end
  local function escape(c) return escapes[c] end
  function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end
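
The two loops work because ISO-8859-1 code points 128-191 encode in UTF-8 as byte 194 followed by the same byte, and 192-255 as byte 195 followed by the byte minus 64. A small check of that, again only a sketch with a made-up local name (escape_to_utf8) and assuming ISO-8859-1 input:

```lua
-- Hypothetical local copy of the UTF-8 variant, for a quick check.
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
for i = 128, 191 do escapes[string.char(i)] = "\194"..string.char(i) end
for i = 192, 255 do escapes[string.char(i)] = "\195"..string.char(i - 64) end

local function escape_to_utf8(str)
  return (str:gsub("[&<>\128-\255]", function(c) return escapes[c] end))
end

local out = escape_to_utf8("caf\233 & \169")
-- e-acute (U+00E9) becomes bytes 195 169; the copyright sign (U+00A9) becomes 194 169.
print(out == "caf\195\169 &amp; \194\169")  --> true
```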

In no case should it be necessary to scan the string more than once.

R.