I wrote an XSS filter a few weeks ago, actually, for use in Sputnik
(but written as a stand-alone module). I hadn't checked it in earlier,
but it is now in the repository, cleaned up and documented:
http://gitorious.org/projects/sputnik/repos/mainline/blobs/master/xssfilter/lua/xssfilter.lua
I can package it as a rock if there is interest.
To avoid having to second-guess how different clients handle strange
HTML input, the filter assumes that the input is a subset of valid XML
and returns nil if the input doesn't parse. (I do the parsing using
Roberto's function from the LuaXML page on the wiki.) If the input
does parse, we then traverse the tree and check each element and its
attributes against a configuration table, replacing anything that's not allowed with a message. There is a default
configuration table that shoots for a balance between security and
features, but the client can either supply their own or modify the default to make the filter more liberal or more conservative.
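
As a rough illustration of the traversal described above (this is not the actual xssfilter code), here is a minimal sketch. It assumes the tree has the shape Roberto's parser produces (a `label`, an `xarg` table of attributes, children in the array part) and that text and attribute values are already decoded. The `allowed_tags` table, the "[... removed]" message and the function names are made up for the example.

```lua
-- Hypothetical whitelist: tag name -> { attribute name -> Lua pattern the value must match }.
local allowed_tags = {
  p  = {},
  em = {},
  a  = { href = "^https?://", title = "." },
}

-- Escapes the characters that matter in text and double-quoted attribute values.
local function escape(text)
  return (text:gsub('[&<>"]', {
    ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;",
  }))
end

-- Appends the filtered serialization of `node` to the list `out`.
local function filter_node(node, out)
  if type(node) == "string" then
    out[#out + 1] = escape(node)
    return
  end
  local allowed_attrs = allowed_tags[node.label]
  if not allowed_attrs then
    -- Unknown tag: replace the whole element with a message.
    out[#out + 1] = "[" .. escape(node.label) .. " removed]"
    return
  end
  -- Sort attribute names so the output does not depend on pairs() order.
  local names = {}
  for name in pairs(node.xarg or {}) do names[#names + 1] = name end
  table.sort(names)
  out[#out + 1] = "<" .. node.label
  for _, name in ipairs(names) do
    local pattern = allowed_attrs[name]
    local value = node.xarg[name]
    -- Keep the attribute only if it is whitelisted and its value matches.
    if pattern and value:find(pattern) then
      out[#out + 1] = " " .. name .. '="' .. escape(value) .. '"'
    end
  end
  out[#out + 1] = ">"
  for _, child in ipairs(node) do filter_node(child, out) end
  out[#out + 1] = "</" .. node.label .. ">"
end

local function filter_tree(root)
  local out = {}
  filter_node(root, out)
  return table.concat(out)
end

local tree = {
  label = "p", xarg = {},
  "Hello ",
  { label = "a",
    xarg = { href = "javascript:alert(1)", onClick = "x()", title = "hi" },
    "link" },
  { label = "script", xarg = {}, "evil()" },
}
print(filter_tree(tree))
--> <p>Hello <a title="hi">link</a>[script removed]</p>
```

Anything the table does not mention is dropped, so loosening or tightening the filter comes down to editing the table.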
The filter is configured (by default) to allow all tags that Markdown
generates, so to use it with Markdown, run the input through Markdown
_first_, then filter it.
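
A small sketch of that ordering, with the Markdown converter and the filter passed in as functions so it runs without either library. The name `render_untrusted` and the two stand-in functions are invented for the example. The only assumption about the real filter is the one stated above: it returns nil (here, followed by a message) when the input is rejected.

```lua
-- Renders untrusted Markdown, then filters the resulting HTML.
-- `to_html` converts Markdown to HTML; `filter` is expected to behave like
-- xss_filter:filter(), returning nil plus a message when it rejects the input.
local function render_untrusted(text, to_html, filter)
  local html = to_html(text)               -- 1. Markdown first
  local safe_html, message = filter(html)  -- 2. then the XSS filter
  if not safe_html then
    return nil, "rejected: " .. tostring(message)
  end
  return safe_html
end

-- Stand-ins so the sketch is runnable on its own; they are NOT the real libraries.
local function fake_markdown(text)
  return "<p>" .. text .. "</p>"
end
local function fake_filter(html)
  if html:find("<script", 1, true) then
    return nil, "could not parse"
  end
  return html
end

print(render_untrusted("hello", fake_markdown, fake_filter))
--> <p>hello</p>

-- Assumed wiring with the real modules (not run here):
--   local xss_filter = xssfilter.new()
--   local html, err = render_untrusted(text, markdown,
--     function(h) return xss_filter:filter(h) end)
```

Filtering after Markdown matters because the filter then sees exactly the HTML that will be served, including whatever raw HTML Markdown passed through.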
<a href="" _onClick_="<script src="">
xssfilter will reject this input by default, as it disapproves of unescaped "<" in attribute values. If you try the more kosher
<a href="" _onClick_="&lt;script src="">
then it will include the link but strip out the "onClick" attribute - not because of "script" in it, but simply because it only allows <a> to have href, class, alt, and title attributes. It will also strip out "href", since it wants to see one of the "safe" protocol prefixes. You can change both:
local xssfilter = require("xssfilter") -- assuming the module is installed under this name
local xss_filter = xssfilter.new()
xss_filter.allowed_tags.a.href = "" -- allow any value for href
xss_filter.allowed_tags.a.onClick = "." -- allow the onClick attribute, with any value
local safe_html, message = xss_filter:filter(my_unsafe_html)
- yuri