Re: htmlify a string?

lua-l archive


Subject: Re: htmlify a string?
From: Philippe Lhoste <PhiLho@...>
Date: Fri, 21 Oct 2005 07:06:45 +0200

Rici Lake wrote:

I can't help thinking that all the proposed solutions are a lot more complicated than necessary.


So are yours, below, when you extend the functionality... :-)

Also, this is a perfect use case for Mike Pall's patch to string.gsub, with which I concur (although I might extend it a bit...)


I agree.

Anyway, here's the simplest HTML escaper I know of (just the three vital characters):


do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>]", escape)) end
end

Indeed, but if you extend the list of escape chars, you also have to extend the regular expression, which may become error-prone later. One of the "complexities" of my solutions was to build this RE automatically.
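Building the pattern from the table keys removes that maintenance step. A minimal sketch, assuming Lua 5.1 or later (for the `#` operator and table replacements in `gsub`) and single-byte keys; `make_escaper` is a hypothetical helper name, not something from the thread, and the `&quot;` entry is only an illustration of adding a key.

```lua
-- Build an escaper whose character class is derived from the keys of `map`,
-- so adding an entry to the table is enough. Keys must be single bytes and
-- `map` must be non-empty ("[]" is not a valid Lua pattern).
local function make_escaper(map)
  local chars = {}
  for k in pairs(map) do
    -- %, ], ^ and - are special inside a Lua character class: prefix them with %.
    chars[#chars + 1] = (k:gsub("[%%%]%^%-]", "%%%0"))
  end
  local class = "[" .. table.concat(chars) .. "]"
  return function(str)
    -- Lua 5.1+: a table replacement is indexed with the matched character.
    return (str:gsub(class, map))
  end
end

local html_escape = make_escaper({["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;"})
print(html_escape('<a href="x">Tom & Jerry</a>'))
--> &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Since every magic character is escaped, the order in which `pairs` returns the keys does not matter.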

Can't get much simpler than that, except that with Mike's patch you wouldn't need the function "escape"; you could just provide the table "escapes" as the last argument to gsub. (By the way, the redundant parentheses in the last return statement are deliberate; they avoid returning gsub's second return value, the number of matches.)
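Lua 5.1 and later accept a table as the replacement argument of `string.gsub` directly: each match is looked up as a key and replaced by the value found. A small sketch of the escaper in that form, also showing what the extra parentheses suppress; `html_escape_raw` is a hypothetical name used only for the comparison.

```lua
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}

-- Lua 5.1+: each match is used as a key into `escapes`; no helper function needed.
-- The outer parentheses keep only gsub's first result (the new string).
local function html_escape(str)
  return (str:gsub("[&<>]", escapes))
end

-- Without the parentheses, gsub's second result (the match count) leaks out too.
local function html_escape_raw(str)
  return str:gsub("[&<>]", escapes)
end

print(html_escape("a < b && c"))      --> a &lt; b &amp;&amp; c
print(html_escape_raw("a < b && c"))  --> a &lt; b &amp;&amp; c	3
```

The leaked count matters when the result is passed straight into another call: `table.insert(t, html_escape_raw(s))` would receive three arguments, treat the escaped string as a position, and raise an error.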
If the string to be escaped is ISO-8859-1, and you really want to escape high-ASCII numerically, just extend the escapes table:
do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end
If you really want named escapes, insert them after the for loop, but I don't see the point; with numeric escapes you don't need to worry about browser support.
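A quick check of the numeric variant on an ISO-8859-1 byte string, written in the Lua 5.1+ table form (an adaptation, not the code from the post); the sample input is made up for illustration. Byte 233 is 'é' and byte 232 is 'è' in ISO-8859-1.

```lua
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
-- Every byte 128-255 becomes a decimal character reference, e.g. "\233" -> "&#233;".
for i = 128, 255 do escapes[string.char(i)] = "&#" .. i .. ";" end

local function html_escape(str)
  return (str:gsub("[&<>\128-\255]", escapes))
end

-- "caf\233 & cr\232me" is "café & crème" encoded in ISO-8859-1.
print(html_escape("caf\233 & cr\232me"))  --> caf&#233; &amp; cr&#232;me
```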

Some recent entities, like &euro;, may not be known to old browsers. Using the named entity at least lets the user see the literal &euro; string, which is easier to understand than the numeric entity.

However:

do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
   escapes["\225"] = "&aacute;"  -- 'á' in ISO-8859-1
   -- etc.
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end
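The placement matters: the `for` loop assigns a numeric reference to every byte from 128 to 255, so a named entity set before it would be overwritten. A sketch of the override, assuming ISO-8859-1 input and Lua 5.1+; writing keys as decimal byte escapes such as `\225` ('á' in ISO-8859-1) keeps them correct even if the Lua source file itself is saved as UTF-8, where a literal 'á' would be two bytes and never match.

```lua
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
for i = 128, 255 do escapes[string.char(i)] = "&#" .. i .. ";" end

-- Named entities go after the loop so they replace the numeric defaults.
escapes["\225"] = "&aacute;"   -- 'á' in ISO-8859-1
escapes["\233"] = "&eacute;"   -- 'é' in ISO-8859-1

local function html_escape(str)
  return (str:gsub("[&<>\128-\255]", escapes))
end

-- 'á' and 'é' get names; 'ñ' (byte 241) keeps its numeric reference.
print(html_escape("\225\233\241"))  --> &aacute;&eacute;&#241;
```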

Perhaps it is better to just output straight in UTF-8:

do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 191 do escapes[string.char(i)] = "\194"..string.char(i) end
   for i = 192, 255 do escapes[string.char(i)] = "\195"..string.char(i-64) end
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end
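The two loops hard-code the two-byte UTF-8 form of code points U+0080 to U+00FF: a lead byte of 0xC0 + floor(b / 64) (that is, 194 or 195) followed by 0x80 + (b % 64). A small sketch that derives the same mapping arithmetically and checks it against the loops (Lua 5.1+ assumed); `latin1_to_utf8_byte` is a hypothetical helper name, and on Lua 5.3 or later the check also compares with the standard `utf8.char`.

```lua
-- Table built the way the post does it.
local by_loops = {}
for i = 128, 191 do by_loops[string.char(i)] = "\194" .. string.char(i) end
for i = 192, 255 do by_loops[string.char(i)] = "\195" .. string.char(i - 64) end

-- Same mapping from the general two-byte UTF-8 formula.
local function latin1_to_utf8_byte(b)
  return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
end

for b = 128, 255 do
  local c = string.char(b)
  assert(by_loops[c] == latin1_to_utf8_byte(b))
  -- utf8.char exists only in Lua 5.3+; skip that comparison elsewhere.
  if utf8 then assert(by_loops[c] == utf8.char(b)) end
end
print("all 128 mappings agree")
```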

In no case should it be necessary to scan the string more than once.


Indeed.
Good solutions, as usual...

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --

References:
- htmlify a string?, Robert Raschke
- Re: htmlify a string?, Philippe Lhoste
- Re: htmlify a string?, Rici Lake
