- Subject: Re: Web crawling in Lua
- From: David Hollander <dhllndr@...>
- Date: Sun, 7 Aug 2011 06:43:33 -0500
> I use them both in my little web-crawling utility module WDM 
I see you are using Roberto's XML parser as a base; isn't that a strict
parser that raises an error on improperly formatted XML?
A problem I ran into last week is that the HTML spec differs somewhat
from XML unless the page explicitly uses an XHTML doctype, and many
websites have HTML errors on top of that. The approach I went with was a
non-strict HTML parser that always tries to place elements somewhere in
the DOM. I'll put what I have so far on the wiki or GitHub later this
week.
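The lenient approach can be sketched roughly like this (a minimal
illustration, not the actual parser mentioned above; all names are
hypothetical). Stray closing tags are ignored instead of raising an
error, so malformed markup still yields a usable tree:

```lua
-- Sketch of a non-strict HTML tag scanner (hypothetical, not the
-- parser mentioned above). Stray closing tags are silently ignored
-- rather than raising an error. Limitations: no text nodes, and void
-- elements like <br> are treated as if they could contain children.
local function parse_html(html)
  local root = { tag = "#root", children = {} }
  local stack = { root }
  for closing, tag in html:gmatch("<(/?)(%w+)[^>]*>") do
    tag = tag:lower()
    if closing == "" then
      -- open tag: attach to the current innermost element
      local node = { tag = tag, children = {} }
      table.insert(stack[#stack].children, node)
      table.insert(stack, node)
    else
      -- close tag: pop back to the matching open tag if one exists;
      -- otherwise ignore the stray closer entirely
      for i = #stack, 2, -1 do
        if stack[i].tag == tag then
          for _ = i, #stack do table.remove(stack) end
          break
        end
      end
    end
  end
  return root
end
```

For example, `parse_html("<html><body><p>hi</i></p></body></html>")`
quietly drops the mismatched `</i>` and still produces the expected
html/body/p nesting.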
On Sun, Jul 31, 2011 at 11:51 AM, Michal Kottman <firstname.lastname@example.org> wrote:
> On Sun, 2011-07-31 at 14:41 +0200, Dirk Laurie wrote:
>> The libcurl library documentation lists two sets of Lua bindings to curl:
>> luacurl by Alexander Marinov
>> Lua-cURL by Jürgen Hötzel
>> Comments welcome from someone who has experience with either.
> Both have a similar interface. I use them both in my little web-crawling
> utility module WDM, so you may take a look there.
> The differences essentially are:
>
> luacurl:
> - binds only the easy interface
> - initialize with curl.new()
> - passes (userparam, string) to WRITEFUNCTION
>
> Lua-cURL:
> - binds also the multi/shared API
> - initialize with curl.easy_init()
> - passes only string to WRITEFUNCTION
>  https://github.com/mkottman/wdm/blob/master/wdm.lua
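The WRITEFUNCTION difference above amounts to where the accumulator
state lives. A pure-Lua mock can illustrate just the two callback
signatures without libcurl (the driver function and the chunk data are
hypothetical; only the signatures come from the comparison above):

```lua
-- Mock "perform" loop standing in for libcurl delivering data chunks.
-- style selects which callback signature to use:
--   "luacurl"  -> write(userparam, string)
--   "Lua-cURL" -> write(string)
local function drive(write, style, userparam)
  for _, chunk in ipairs({ "<html>", "</html>" }) do
    if style == "luacurl" then
      write(userparam, chunk)
    else
      write(chunk)
    end
  end
end

-- luacurl style: state travels through the user parameter
local buf1 = {}
drive(function(t, s) table.insert(t, s) end, "luacurl", buf1)

-- Lua-cURL style: state is captured by the closure instead
local buf2 = {}
drive(function(s) table.insert(buf2, s) end, "Lua-cURL")
```

Both approaches end up with the same accumulated body; Lua-cURL simply
relies on closures for what luacurl exposes as an explicit parameter.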