On Aug 15, 2005, at 21:44, Florian Berger wrote:

I thought that stripping HTML tags was easy until I saw something like this:
<a href="http://www.example.com" alt="> example"> example </a>

Argh! Nasty, nasty HTML! 8^)
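To see why this input is nasty, here is a minimal sketch of the obvious approach in plain Lua. The function name naive_strip is made up for illustration. It deletes everything from a '<' up to the next '>', so it stops at the '>' inside the quoted alt value and leaks part of the attribute into the text:

```lua
-- Naive approach: treat everything from '<' to the next '>' as a tag.
-- naive_strip is a hypothetical helper, not part of any library.
local function naive_strip(html)
    -- Extra parentheses drop gsub's second return value (the match count).
    return (html:gsub("<[^>]*>", ""))
end

local sample = '<a href="http://www.example.com" alt="> example"> example </a>'

-- The first match ends at the '>' inside alt="...", so the rest of the
-- attribute value is left behind as if it were text.
print(naive_strip(sample))  -- prints:  example"> example
```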

Perhaps you could try LUXMLInputStream:

http://dev.alt.textdrive.com/file/lu/LUXMLInputStream.lua

Usage example:

local aContent = "<a href=\"http://www.example.com\" alt=\"> example\"> example </a>"
local anInputStream = LUXMLInputStream( aContent )

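-- Each iteration yields a node type plus whichever fields apply to it.
for aType, aText, aName, someAttributes in anInputStream:iterator() do
        if aType == LUXMLInputStream.Text then
                -- text node
                print( aType, aText )
        elseif someAttributes ~= nil then
                -- tag carrying attributes
                print( aType, aName, someAttributes )
        else
                -- tag without attributes, e.g. a closing tag
                print( aType, aName )
        end
end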

> 1       a       { href = http://www.example.com }
> 3       example"> example
> 2       a

Or if you simply want the textual part:

for aText in anInputStream:iterator( true ) do
        print( aText )
end

>	example"> example
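For comparison only, and not as a description of how LUXMLInputStream works, here is a rough sketch of a quote-aware stripper in plain Lua (5.1 or later assumed). The name strip_tags is made up for illustration. It scans character by character and ignores any '>' that sits inside a quoted attribute value, so on the sample above it should keep only " example ". It knows nothing about comments, CDATA, script blocks or entities.

```lua
-- Quote-aware tag stripper (illustrative sketch, hypothetical helper).
local function strip_tags(html)
    local out = {}
    local in_tag = false   -- true while between '<' and its closing '>'
    local quote = nil      -- quote character currently open inside a tag
    for i = 1, #html do
        local c = html:sub(i, i)
        if in_tag then
            if quote then
                -- Inside a quoted value: only the matching quote ends it.
                if c == quote then quote = nil end
            elseif c == '"' or c == "'" then
                quote = c
            elseif c == ">" then
                in_tag = false
            end
        elseif c == "<" then
            in_tag = true
        else
            out[#out + 1] = c
        end
    end
    return table.concat(out)
end

local sample = '<a href="http://www.example.com" alt="> example"> example </a>'
print(strip_tags(sample))
```

A hand-rolled scanner like this is only reasonable for quick cleanup; for anything beyond that a real parser is the safer choice.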

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/