- Subject: Re: htmlify a string?
- From: Philippe Lhoste <PhiLho@...>
- Date: Fri, 21 Oct 2005 07:06:45 +0200
Rici Lake wrote:
I can't help thinking that all the proposed solutions are a lot more
complicated than necessary.
So are yours, below, once you extend the functionality... :-)
Also, this is a perfect use case for Mike Pall's patch to string.gsub,
with which I concur (although I might extend it a bit....)
I agree.
Anyway, here's the simplest html escaper I know of (just the three
vital characters):
do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  local function escape(c) return escapes[c] end
  function html_escape(str) return (str:gsub("[&<>]", escape)) end
end
Indeed, but if you extend the list of escaped characters, you also have
to extend the pattern, and keeping the two in sync is error-prone.
One of the "complexities" of my solutions was to build this pattern
automatically.
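A minimal sketch of building the character class from the table keys, so the pattern cannot drift out of sync with the table. It assumes every key is a single byte; `make_class` is a hypothetical name, and the extra `"` entry is only there to illustrate extending the list:

```lua
-- Hypothetical helper: derive a character class such as [%&%<%>]
-- from the keys of an escapes table (single-byte keys assumed).
local function make_class(escapes)
  local chars = {}
  for c in pairs(escapes) do
    -- Prefix punctuation with % so characters such as ] ^ - % stay literal.
    chars[#chars + 1] = (c:gsub("%p", "%%%0"))
  end
  table.sort(chars)  -- pairs() order is unspecified; sort for a stable pattern
  return "[" .. table.concat(chars) .. "]"
end

do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;",
                   ['"'] = "&quot;"}
  local pattern = make_class(escapes)  -- built once, not per call
  local function escape(c) return escapes[c] end
  function html_escape(str) return (str:gsub(pattern, escape)) end
end
```

Adding an entry to the table is then the only change needed; the string is still scanned once.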
Can't get much simpler than that, except that with Mike's patch you
wouldn't need the function "escape"; you could just provide the table
"escapes" as the last argument to gsub. (By the way, the redundant
parentheses in the last return statement are deliberate; they avoid
returning the second return value of gsub.)
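For illustration: released Lua versions from 5.1 onward accept a table as the replacement argument of string.gsub, which appears to be the behaviour described here. A sketch under that assumption, also showing what the extra parentheses do:

```lua
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}

function html_escape(str)
  -- Each matched character is looked up in the table; the outer
  -- parentheses drop gsub's second result (the substitution count).
  return (str:gsub("[&<>]", escapes))
end

print(html_escape("1 < 2 && 3 > 2"))    --> 1 &lt; 2 &amp;&amp; 3 &gt; 2
print(select("#", html_escape("<b>")))  --> 1
```

Without the parentheses the function would return two values, which matters when the call is the last argument of another call or the last item in a table constructor.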
If the string to be escaped is ISO-8859-1, and you really want to
escape high-ascii numerically, just extend the escapes table:
do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
  local function escape(c) return escapes[c] end
  function html_escape(str)
    return (str:gsub("[&<>\128-\255]", escape))
  end
end
If you really want named escapes, insert them after the for loop, but I
don't see the point; with numeric escapes you don't need to worry about
browser support.
Some recent entities, like &euro;, may not be known to old browsers.
Using the named entity at least lets the user see the &euro;
string, which is easier to understand than the numeric entity.
However:
do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
  escapes["\225"] = "&aacute;" -- "á" as a single ISO-8859-1 byte
  -- etc.
  local function escape(c) return escapes[c] end
  function html_escape(str)
    return (str:gsub("[&<>\128-\255]", escape))
  end
end
Perhaps it is better to just output straight in UTF-8:
do
  local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
  for i = 128, 191 do
    escapes[string.char(i)] = "\194"..string.char(i)
  end
  for i = 192, 255 do
    escapes[string.char(i)] = "\195"..string.char(i-64)
  end
  local function escape(c) return escapes[c] end
  function html_escape(str)
    return (str:gsub("[&<>\128-\255]", escape))
  end
end
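The constants in the two loops follow from the two-byte UTF-8 form, where a code point in the range 128..255 becomes a lead byte of 192 plus the top bits and a trail byte of 128 plus the low six bits. A small sketch of that arithmetic, assuming the input is ISO-8859-1 (`latin1_to_utf8` is a hypothetical name):

```lua
-- Hypothetical helper: encode one ISO-8859-1 byte value (128..255)
-- as its two-byte UTF-8 sequence.
function latin1_to_utf8(i)
  local lead = 192 + math.floor(i / 64)  -- 194 for 128..191, 195 for 192..255
  local trail = 128 + i % 64             -- low six bits of the code point
  return string.char(lead, trail)
end

-- Check the formula against the constants used in the two loops.
for i = 128, 191 do
  assert(latin1_to_utf8(i) == "\194"..string.char(i))
end
for i = 192, 255 do
  assert(latin1_to_utf8(i) == "\195"..string.char(i - 64))
end
```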
In no case should it be necessary to scan the string more than once.
Indeed.
Good solutions, as usual...
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --