lua-l archive



On Sat, 2011-04-09 at 00:35 +0400, Alexander Gladysh wrote:
> Hi, list!
> 
> I'm looking for a Lua module to scrape some data from a (possibly
> broken) HTML page.
> 
> Any usable ones out there?

Also, I started toying around with the QtWebKit module, which is
available through lqt [1]. In Qt 4.5 and earlier, it could do little
more than load and display a page.

Starting with Qt 4.6, the QtWebKit API [2] offers full DOM access to
the elements of the page you are loading. The API supports CSS 2
selectors, so you can easily get to the elements you are interested in.
Right now I am playing with modifying a page that is loaded from the
web, something like Greasemonkey, only implemented in Lua :)

If all you need is the "robust HTML source" (as robust as WebKit is),
you can use the following code:

require 'qtcore'
require 'qtgui'
require 'qtwebkit'

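-- argc/argv as in C: argv[0] (the script name) must be included,
-- otherwise argc is one larger than the argv table
local A = QApplication(1 + select('#', ...), {arg[0], ...})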
local W = QWebView()

-- will be called when the page load is finished
W:__addmethod('process(bool)', function(self, ok)
  -- get the source as QString
  local sourceString = self:page():mainFrame():toHtml()
  -- convert it to Lua string
  local source = sourceString:toUtf8() 
  
  print(source) -- or process as needed...

  A.quit()      -- quit the event loop
end)
W:connect('2loadFinished(bool)', W, '1process(bool)')

W:setUrl(QUrl('http://www.lua.org/'))
A.exec()
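Once you have the source as a Lua string, the "process as needed" step can be plain Lua. The following is only a rough sketch, not part of lqt or QtWebKit: `extract` is a hypothetical helper that pulls the `<title>` text and all double-quoted `href` values out of the string using Lua patterns. Lua patterns are not an HTML parser; this assumes WebKit's serializer emits fairly regular markup with attributes quoted in double quotes. For anything more involved, the DOM/CSS selector API mentioned above is the better tool.

```lua
-- Hypothetical helper: extract the page title and the href attribute
-- values from an HTML string (e.g. the result of toHtml():toUtf8()).
-- Pattern-based, so it only handles regular, double-quoted markup.
local function extract(html)
  -- match <title ...>...</title> regardless of letter case
  local title = html:match('<[Tt][Ii][Tt][Ll][Ee][^>]*>(.-)</[Tt][Ii][Tt][Ll][Ee]>')
  local links = {}
  -- collect every href="..." value in document order
  for href in html:gmatch('href="([^"]*)"') do
    links[#links + 1] = href
  end
  return title, links
end

-- usage with a small hand-written page
local html = '<html><head><title>The Programming Language Lua</title></head>' ..
  '<body><a href="about.html">about</a> <a href="download.html">download</a></body></html>'
local title, links = extract(html)
print(title)             -- The Programming Language Lua
for _, l in ipairs(links) do print(l) end
```

Inside the `process` method you would call `extract(source)` in place of `print(source)`.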



[1] https://github.com/mkottman/lqt
[2] http://doc.qt.nokia.com/4.6/qtwebkit.html