- Subject: Re: htmlify a string?
 
- From: Philippe Lhoste <PhiLho@...>
 
- Date: Fri, 21 Oct 2005 07:06:45 +0200
 
Rici Lake wrote:
I can't help thinking that all the proposed solutions are a lot more 
complicated than necessary.
So are yours, below, once you extend the functionality... :-)
Also, this is a perfect use case for Mike Pall's patch to string.gsub, 
with which I concur (although I might extend it a bit....)
I agree.
Anyway, here's the simplest html escaper I know of (just the three 
vital characters):
do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>]", escape)) end
end
Indeed, but if you extend the list of escaped characters, you also have 
to extend the pattern, which is error-prone later on.
One of the "complexities" of my solutions was building this pattern 
automatically.
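For illustration, here is a minimal sketch of how the character set could be derived from the table keys, so that the table and the pattern cannot drift apart. The name make_escaper and the extra entry for the double quote are assumptions made for this example, not code from the thread, and it assumes every key is a single byte.

```lua
-- Hypothetical helper: build an escaper from a table whose keys are
-- single characters and whose values are the replacement strings.
local function make_escaper(escapes)
   local chars = {}
   for c in pairs(escapes) do
      -- Prefix every non-alphanumeric character with % so that characters
      -- such as ], ^, - or % are taken literally inside the [...] set.
      chars[#chars + 1] = (c:gsub("%W", "%%%0"))
   end
   local pattern = "[" .. table.concat(chars) .. "]"
   return function(str)
      -- Parentheses drop gsub's second result (the substitution count).
      return (str:gsub(pattern, function(c) return escapes[c] end))
   end
end

local html_escape = make_escaper{
   ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;",
}
```

Adding a new escape is then a one-line change to the table; the pattern follows along.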
Can't get much simpler than that, except that with Mike's patch you 
wouldn't need the function "escape"; you could just provide the table 
"escapes" as the last argument to gsub. (By the way, the redundant 
parentheses in the last return statement are deliberate; they avoid 
returning the second return value of gsub.)
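As a sketch of both points, assuming Lua 5.1 or later (where string.gsub accepts a table as its replacement argument and looks each match up as a key), the helper function disappears, and the parentheses decide whether the substitution count leaks out. The name html_escape_leaky is made up for this comparison.

```lua
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}

-- The table is passed directly; each matched character is used as a key.
local function html_escape(str)
   return (str:gsub("[&<>]", escapes))
end

-- Same call without the parentheses: gsub's second result (the number
-- of substitutions) is returned to the caller as well.
local function html_escape_leaky(str)
   return str:gsub("[&<>]", escapes)
end

print(html_escape("a<b"))        -- prints the escaped string only
print(html_escape_leaky("a<b"))  -- prints the escaped string and the count
```

The extra value matters when the result is passed straight into another call, for example table.insert(t, html_escape_leaky(s)), where the count would be taken as a third argument.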
If the string to be escaped is ISO-8859-1, and you really want to 
escape high-ascii numerically, just extend the escapes table:
do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
   local function escape(c) return escapes[c] end
   function html_escape(str)
      return (str:gsub("[&<>\128-\255]", escape))
   end
end
If you really want named escapes, insert them after the for loop, but I 
don't see the point; with numeric escapes you don't need to worry about 
browser support.
Some recent entities, like &euro;, may not be known to old browsers. 
Using the named entity at least lets the user see the "&euro;" string, 
which is easier to understand than the numeric entity.
However:
do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
   escapes["\225"] = "&aacute;"  -- \225 is 'á' in ISO-8859-1
   -- etc.
   local function escape(c) return escapes[c] end
   function html_escape(str)
      return (str:gsub("[&<>\128-\255]", escape))
   end
end
Perhaps it is better to just output straight in UTF-8:
do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   -- ISO-8859-1 bytes 128-191 become the UTF-8 pair \194, byte
   for i = 128, 191 do
      escapes[string.char(i)] = "\194"..string.char(i)
   end
   -- ISO-8859-1 bytes 192-255 become the UTF-8 pair \195, byte - 64
   for i = 192, 255 do
      escapes[string.char(i)] = "\195"..string.char(i-64)
   end
   local function escape(c) return escapes[c] end
   function html_escape(str)
      return (str:gsub("[&<>\128-\255]", escape))
   end
end
In no case should it be necessary to scan the string more than once.
Indeed.
Good solutions, as usual...
--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --