- Subject: Re: Web crawling in Lua
- From: David Hollander <dhllndr@...>
- Date: Sun, 7 Aug 2011 06:43:33 -0500
> I use them both in my little web-crawling utility module WDM [1]
I see you are using Roberto's XML parser as a base; that is a strict
parser that raises errors on improperly formatted XML, right?
A problem I ran into last week is that the HTML spec is a bit
different from XML [1] unless the page specifically uses an XHTML
doctype, and many of the websites I crawled had HTML errors on top of
that. The approach I went with was a non-strict HTML parser that
always tries to stick elements somewhere in the DOM (sketch below);
I'll put what I have so far on the wiki or GitHub later this week.
[1] http://en.wikipedia.org/wiki/HTML_element#Syntax
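
To show what I mean by "always tries to stick elements somewhere",
here is an untested sketch of the idea using plain Lua patterns. It
is not the parser I'll be posting (a real one also has to deal with
comments, scripts, doctypes, entities and so on), just the shape of
the lenient tag-matching loop:

-- Sketch only: build a tree from sloppy HTML without ever raising.
-- Void elements are never pushed on the open-tag stack, and stray
-- close tags that match nothing are silently ignored.
local VOID = { br=true, hr=true, img=true, input=true, meta=true, link=true }

local function parse_html(s)
  local root = { tag = "#root", children = {} }
  local stack = { root }
  for text, slash, tag, attrs in s:gmatch("([^<]*)<(/?)(%w+)([^>]*)>") do
    local parent = stack[#stack]
    if text ~= "" then
      parent.children[#parent.children + 1] = text
    end
    tag = tag:lower()
    if slash == "/" then
      -- pop back to the matching open tag, if there is one
      for i = #stack, 2, -1 do
        if stack[i].tag == tag then
          for _ = i, #stack do table.remove(stack) end
          break
        end
      end
    else
      local node = { tag = tag, attrs = attrs, children = {} }
      parent.children[#parent.children + 1] = node
      -- descend unless the element is void or self-closing
      if not (VOID[tag] or attrs:match("/%s*$")) then
        stack[#stack + 1] = node
      end
    end
  end
  return root
end

Even input like "<p>unclosed <b>bold <p>next" comes back as a tree
instead of an error, which is what you want when crawling pages you
don't control.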
On Sun, Jul 31, 2011 at 11:51 AM, Michal Kottman <k0mpjut0r@gmail.com> wrote:
> On Sun, 2011-07-31 at 14:41 +0200, Dirk Laurie wrote:
>> The libcurl library documentation lists two sets of Lua bindings to curl:
>>
>> Lua
>>
>> luacurl by Alexander Marinov
>> http://luacurl.luaforge.net/
>>
>> Lua-cURL by Jürgen Hötzel
>> http://luaforge.net/projects/lua-curl/
>>
>> Comments welcome from anyone who has experience of either.
>
> Both have a similar interface. I use them both in my little web-crawling
> utility module WDM [1], so you may take a look there.
>
> The differences essentially are:
>
> LuaCurl:
> - binds only the easy interface
> - initialize with curl.new()
> - passes (userparam, string) to WRITEFUNCTION
>
> Lua-cURL:
> - also binds the multi/shared API
> - initialize with curl.easy_init()
> - passes only string to WRITEFUNCTION
>
>
> [1] https://github.com/mkottman/wdm/blob/master/wdm.lua
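
For quick reference, a fetch looks roughly like this in each binding.
These are untested sketches written against the differences Michal
lists above; the write-callback shapes match his description, but the
module names and setopt spellings are from memory and may be off:

-- luacurl: easy interface only, callback gets (userparam, data)
local curl = require("luacurl")
local chunks = {}
local c = curl.new()
c:setopt(curl.OPT_URL, "http://www.lua.org/")
c:setopt(curl.OPT_WRITEFUNCTION, function(userparam, data)
  chunks[#chunks + 1] = data
  return #data  -- tell libcurl the whole chunk was handled
end)
c:perform()
c:close()
print(table.concat(chunks))

-- Lua-cURL: also binds multi/shared, callback gets only the string
local curl = require("cURL")
local chunks = {}
local c = curl.easy_init()
c:setopt_url("http://www.lua.org/")
c:perform{ writefunction = function(data)
  chunks[#chunks + 1] = data
end }
print(table.concat(chunks))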