lua-users home
lua-l archive


I currently have the need to strip HTML tags from a given Lua string,
ideally allowing a specific subset (such as <p>, <b>, etc.).

I wrote an XSS filter a few weeks ago, actually, for use in Sputnik (but written as a stand-alone module).  I hadn't checked it in earlier, but it is now in the repository, cleaned up and documented:

I can package it as a rock if there is interest.

To avoid second-guessing how different clients handle strange HTML input, the filter assumes that the input is a subset of valid XML and returns nil if the input doesn't parse.  (I do the parsing using Roberto's function from the LuaXML page on the wiki.)  If the input does parse, we then traverse the tree and check each element and its attributes against a configuration table, replacing anything that's not allowed with a message.  There is a default configuration table that aims for a balance between security and features, but the client can either supply their own or modify the default, making it either more liberal or more conservative.
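As a rough illustration of the traversal step (a sketch only, not the actual xssfilter code): the snippet below assumes the parsed tree has the shape produced by the collect function on the wiki, i.e. each element is a table with a `label`, an `xarg` table of attributes, and its children (strings or further elements) in the array part. The names `allowed_tags` (as a plain local table), `render`, and `filter_tree`, the patterns, and the replacement message are all made up for the example.

```lua
-- Hypothetical configuration: tag name -> { attribute name -> Lua pattern }.
local allowed_tags = {
  p = {},
  b = {},
  a = { href = "^https?://", title = "." },
}

-- Escape characters that could open a tag or close an attribute value.
-- "&" is left alone on the assumption that the parser keeps entities as-is.
local function escape(s)
  return (s:gsub('[<>"]', { ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;" }))
end

-- Append the filtered serialization of one node to the `out` list.
local function render(node, out)
  if type(node) == "string" then
    out[#out + 1] = escape(node)
    return
  end
  local rules = allowed_tags[node.label]
  if not rules then
    -- Disallowed element: replace it (and its children) with a message.
    out[#out + 1] = "[tag removed: " .. node.label .. "]"
    return
  end
  -- Sort attribute names so the output order is deterministic.
  local names = {}
  for name in pairs(node.xarg or {}) do names[#names + 1] = name end
  table.sort(names)
  out[#out + 1] = "<" .. node.label
  for _, name in ipairs(names) do
    local value, pattern = node.xarg[name], rules[name]
    -- Keep an attribute only if it is listed and its value matches.
    if pattern and value:find(pattern) then
      out[#out + 1] = string.format(' %s="%s"', name, escape(value))
    end
  end
  out[#out + 1] = ">"
  for _, child in ipairs(node) do render(child, out) end
  out[#out + 1] = "</" .. node.label .. ">"
end

-- Filter a whole parsed document (a list of top-level nodes).
local function filter_tree(tree)
  local out = {}
  for _, child in ipairs(tree) do render(child, out) end
  return table.concat(out)
end

-- Usage: a hand-built tree standing in for the parser's output.
local tree = {
  { label = "a",
    xarg = { href = "javascript:alert(1)", onClick = "x", title = "hi" },
    "link" },
  { label = "script", xarg = {}, "evil()" },
}
print(filter_tree(tree))
```

Here the link survives with only its `title` attribute, since `href` fails the protocol pattern and `onClick` is not listed at all, while the `<script>` element is replaced by the message.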

The filter is configured (by default) to allow all tags that Markdown generates, so to use it with Markdown, run the input through Markdown _first_, then filter it.

    <a href="" onClick="<script src="">

xssfilter will reject this input by default, as it disapproves of unescaped "<" in attribute values.  If you try the more kosher

    <a href="" onClick="&lt;script src="">

then it will include the link but strip out the "onClick" attribute - not because of "script" in it, but simply because it only allows <a> to have href, class, alt, and title attributes.  It will also strip out "href", since it wants to see one of the "safe" protocol prefixes.  You can change both:

    local xss_filter =
    xss_filter.allowed_tags.a.href = "" -- allow any value for href
    xss_filter.allowed_tags.a.onClick = "." -- allow onClick attribute, with any value
    local safe_html, message = xss_filter:filter(my_unsafe_html)

- yuri