- Subject: Re: Stripping HTML tags
- From: PA <petite.abeille@...>
- Date: Tue, 16 Aug 2005 07:32:27 +0200
On Aug 15, 2005, at 21:44, Florian Berger wrote:
I thought that stripping HTML tags was easy until I saw something like
this:
<a href="http://www.example.com" alt="> example"> example </a>
Argh! Nasty, nasty HTML! 8^)
Perhaps you could try LUXMLInputStream:
http://dev.alt.textdrive.com/file/lu/LUXMLInputStream.lua
Usage example:
local aContent = "<a href=\"http://www.example.com\" alt=\"> example\"> example </a>"
local anInputStream = LUXMLInputStream( aContent )
for aType, aText, aName, someAttributes in anInputStream:iterator() do
    if aType == LUXMLInputStream.Text then
        print( aType, aText )
    elseif someAttributes ~= nil then
        print( aType, aName, someAttributes )
    else
        print( aType, aName )
    end
end
> 1 a { href = http://www.example.com }
> 3 example"> example
> 2 a
Or if you simply want the textual part:
for aText in anInputStream:iterator( true ) do
    print( aText )
end
> example"> example
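If the '>' inside the quoted alt value should stay part of the tag rather than end it, a small quote-aware scanner might be enough for simple cases. What follows is only a rough sketch in plain Lua (5.1 or later assumed). The function name stripTags is made up for illustration; it is not part of LUXMLInputStream and says nothing about how that library works internally. It also ignores comments, CDATA sections, script blocks, entities and unterminated quotes.

```lua
-- Hypothetical helper, not part of LUXMLInputStream: drops everything between
-- '<' and the matching '>', treating a '>' inside a quoted attribute value
-- as part of the tag instead of its end.
local function stripTags( aContent )
    local someParts = {}
    local inTag = false
    local aQuote = nil  -- the quote character currently open inside a tag, if any

    for anIndex = 1, #aContent do
        local aChar = aContent:sub( anIndex, anIndex )

        if inTag then
            if aQuote then
                -- inside a quoted value: only the matching quote ends it
                if aChar == aQuote then aQuote = nil end
            elseif aChar == '"' or aChar == "'" then
                aQuote = aChar
            elseif aChar == ">" then
                inTag = false
            end
        elseif aChar == "<" then
            inTag = true
        else
            -- plain text outside of any tag
            someParts[ #someParts + 1 ] = aChar
        end
    end

    return table.concat( someParts )
end

local aContent = "<a href=\"http://www.example.com\" alt=\"> example\"> example </a>"
print( stripTags( aContent ) )
```

With the sample above this prints " example " (the surrounding spaces are kept), since the '>' in the alt value is skipped while the scanner is inside the quotes.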
Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/