Re: Stripping HTML tags

lua-l archive


Subject: Re: Stripping HTML tags
From: Rici Lake <lua@...>
Date: Mon, 15 Aug 2005 15:16:32 -0500

(Please don't reply to messages when you're starting a new thread. It's confusing.)


On 15-Aug-05, at 2:44 PM, Florian Berger wrote:

> I thought that stripping HTML tags was easy until I saw something like this:
> <a href="http://www.example.com" alt="> example"> example </a>

That would be non-trivial to handle with a regular expression, although I think it is possible.

However, you would have quite a bit of trouble with some other legitimate HTML constructions, particularly comments (<!-- ... -->) and embedded JavaScript. If you want a bullet-proof HTML parser, you should probably use a tokenizer.
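For illustration, here is a minimal sketch of the tokenizer idea in Lua. The function name strip_tags and everything it does are assumptions, not code from this thread: it walks the string character by character inside tags, treats quoted attribute values as opaque (so the alt="> example" case comes out right), skips <!-- ... --> comments, and drops the bodies of <script> and <style>. It is not a full HTML parser; an unquoted attribute containing a quote character (e.g. title=don't) or CDATA will still confuse it, and entities are left undecoded.

```lua
-- Hypothetical helper: remove markup with a small hand-written scanner
-- rather than a single pattern. It understands quoted attribute values,
-- <!-- comments --> and <script>/<style> bodies; it is not a full HTML parser.
local function strip_tags(s)
  local out = {}
  local lower = s:lower()            -- used to find </script> case-insensitively
  local i, n = 1, #s
  while i <= n do
    if s:sub(i, i) ~= "<" then
      -- Copy plain text up to the next '<' in one step.
      local j = s:find("<", i, true) or (n + 1)
      out[#out + 1] = s:sub(i, j - 1)
      i = j
    elseif s:sub(i, i + 3) == "<!--" then
      -- Skip a comment; an unterminated comment swallows the rest of the input.
      local _, e = s:find("-->", i + 4, true)
      i = (e or n) + 1
    elseif not s:find("^<[%a/!?]", i) then
      -- A '<' not followed by a tag-like character (e.g. "1 < 2") is text.
      out[#out + 1] = "<"
      i = i + 1
    else
      -- Inside a tag: advance to the first '>' that is not inside quotes.
      local name = s:match("^<(%a%w*)", i)
      local j, quote = i + 1, nil
      while j <= n do
        local c = s:sub(j, j)
        if quote then
          if c == quote then quote = nil end
        elseif c == '"' or c == "'" then
          quote = c
        elseif c == ">" then
          break
        end
        j = j + 1
      end
      i = j + 1
      -- For <script> and <style>, also drop everything up to the close tag.
      local lname = name and name:lower()
      if lname == "script" or lname == "style" then
        local _, e = lower:find("</" .. lname .. "%s*>", i)
        i = (e or n) + 1
      end
    end
  end
  return table.concat(out)
end

print(strip_tags('<a href="http://www.example.com" alt="> example"> example </a>'))
```

On the href/alt example this returns " example ": the '>' inside the quoted alt value no longer ends the tag.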

> s = string.gsub(s, '<.->', ' ')

This might prove to be a bit faster, but it would fare no better with the alt=">.." example:


s = string.gsub(s, "%b<>", " ")

I'm not convinced by the substitution of a tag with a space, though. The following sequence is not rendered with a space:

<b>over</b>-specified
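With a plain space substitution, <b>over</b>-specified becomes " over -specified". One possible refinement, sketched below under stated assumptions: delete inline tags outright and turn only block-level tags into a space. The block table is a hypothetical, deliberately incomplete list chosen for illustration, and the pattern still treats any '>' as the end of a tag, so it does nothing for the alt="> example" case; it only addresses the spacing question.

```lua
-- Hypothetical, incomplete set of elements assumed to break the line when
-- rendered; chosen for illustration only.
local block = { p = true, div = true, br = true, li = true, tr = true,
                h1 = true, h2 = true, h3 = true, table = true }

local function strip_with_spacing(s)
  -- Each match captures the optional '/' and the tag name (possibly empty,
  -- e.g. for comments or <!DOCTYPE>). The parentheses keep only gsub's string.
  return (s:gsub("<(/?)(%w*)[^>]*>", function(_, name)
    -- Block-level tags become a space; inline tags (b, i, a, span, ...) vanish.
    if block[name:lower()] then return " " end
    return ""
  end))
end

print(strip_with_spacing("<b>over</b>-specified"))
print(strip_with_spacing("<p>one</p><p>two</p>"))
```

The first call yields "over-specified"; the second keeps the paragraphs apart as " one  two ".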

