On Mon, Apr 21, 2008 at 3:05 AM, Bertrand Mansion <golgote@mamasam.com> wrote:
>
> On 21 Apr. 08 at 01:06, Jim Whitehead II wrote:
>
>
> > I currently have the need to strip HTML tags from a given Lua string,
> > ideally allowing a specific subset (such as <p>, <b>, etc.).  There
> > are a number of implementations of this, a PHP version in particular:
> >
> > http://uk2.php.net/strip_tags
> >
> > Does anyone have something like this in Lua, or some example LPEG code
> > for a specific tag that I could use?  A naive solution is relatively
> > simple using pattern matching, but I'd like to be able to handle odd
> > cases like this:
> >
> > <a href="blah" onClick="<script src='foo'></script>">Link</a>
> >
> > I'd like to avoid stripping the <script> tag in this case, since it
> > occurs as an attribute of another tag.
> >
>
> Either you strip tags or you don't. Since <script> is inside <a>, if you
> strip <a>, you strip <script> at the same time.

Actually, that isn't the case.  Using an XML parser you can absolutely
strip one and not the other, because the "tag" inside the attribute
isn't a tag at all; it is just text inside a quoted attribute value.
With a proper ruleset (a list of the tags you want to keep) you can
strip everything else and be left with exactly what you need.
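
To make the point concrete, here is a minimal sketch in plain Lua (no
LPEG, and not a real XML parser).  The function name strip_tags and the
"allowed" table are my own choices for illustration, not an existing
library API.  It scans for "<", then walks forward to the matching ">"
while skipping over quoted attribute values, so the <script> inside
onClick is never seen as a tag.  It assumes reasonably well-formed
input: a stray "<" in ordinary text, or a quote character inside a
comment, will confuse it.

```lua
-- Strip tags from `html`, keeping only tags whose lowercase name is a
-- key in `allowed` (hypothetical helper, e.g. { p = true, b = true }).
local function strip_tags(html, allowed)
  allowed = allowed or {}
  local out = {}
  local pos, len = 1, #html
  while pos <= len do
    local lt = html:find("<", pos, true)   -- plain find, no pattern magic
    if not lt then
      out[#out + 1] = html:sub(pos)        -- no more tags: keep the rest
      break
    end
    out[#out + 1] = html:sub(pos, lt - 1)  -- text before the tag

    -- Walk to the closing '>' that is not inside a quoted value.
    local i, quote = lt + 1, nil
    while i <= len do
      local c = html:sub(i, i)
      if quote then
        if c == quote then quote = nil end -- end of quoted value
      elseif c == '"' or c == "'" then
        quote = c                          -- start of quoted value
      elseif c == ">" then
        break
      end
      i = i + 1
    end
    if i > len then break end              -- unterminated tag: drop it

    local tag = html:sub(lt, i)
    local name = tag:match("^</?%s*(%a%w*)")
    if name and allowed[name:lower()] then
      out[#out + 1] = tag                  -- allowed tag: keep verbatim
    end
    pos = i + 1
  end
  return table.concat(out)
end

local s = [[<a href="blah" onClick="<script src='foo'></script>">Link</a>]]
print(strip_tags(s))                -- <a> is stripped as one unit
print(strip_tags(s, { a = true }))  -- <a> is kept, attribute untouched
```

Note that a kept tag is copied verbatim, attributes included, so if you
allow <a> you still get the onClick attribute.  Filtering attributes
would need a second pass over each kept tag.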