- Subject: RE: Recommended way to download and parse web pages?
- From: Thijs Schreijer <thijs@...>
- Date: Fri, 15 May 2015 21:56:51 +0000
> -----Original Message-----
> From: lua-l-bounces@lists.lua.org [mailto:lua-l-bounces@lists.lua.org] On
> Behalf Of Gilles
> Sent: Friday, 15 May 2015 15:06
> To: lua-l@lists.lua.org
> Subject: Recommended way to download and parse web pages?
>
> Hello
>
> I'm a semi-Lua newbie.
>
> I need to fetch web pages and extract information from each of them.
>
> I have LuaRocks installed, and was wondering what packages are
> recommended for this.
>
> lua-curl
> luacurl
>
> http-digest
> httpclient
> lua-http-parser
> lua-resty-http
>
> htmlparser
> luahtml
> lusty-html
>
> Thank you.
>
I think you need a 'fetching' element and a 'parsing' element. For fetching you could use Copas [1], which recently gained async client support for http(s) (LuaSec is required for the 's' part). See this example [2] for fetching multiple pages concurrently/asynchronously.
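Roughly like this (untested, just a sketch of what the Copas async client looks like; assumes copas and luasec are installed from LuaRocks, and the URLs are placeholders):

local copas = require("copas")
local http  = require("copas.http")

local urls = {
  "http://www.lua.org/",
  "https://luarocks.org/",  -- the https one needs luasec
}

for _, url in ipairs(urls) do
  copas.addthread(function()
    -- inside a copas thread http.request is non-blocking; same
    -- call convention as LuaSocket's socket.http.request
    local body, status = http.request(url)
    print(url, status, body and #body or "request failed")
  end)
end

copas.loop()  -- dispatches all threads, returns when they're done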
For parsing, it depends on the complexity. If the pages are simple, Lua patterns will do. Otherwise the proposed lua-gumbo seems a good fit (I've only read the readme, no hands-on experience with it).
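For the simple case something like this usually suffices (a sketch; patterns trip over messy real-world markup, which is where a real parser earns its keep):

local html = [[<html><head><title>Example Page</title></head>
<body><a href="http://www.lua.org/">Lua</a></body></html>]]

-- grab the <title> contents; the "-" makes the capture non-greedy
-- (note this is case-sensitive; real pages may use <TITLE>)
local title = html:match("<title>(.-)</title>")
print(title)  --> Example Page

-- collect all href attributes from anchor tags
for href in html:gmatch('<a%s+[^>]-href="([^"]+)"') do
  print(href)  --> http://www.lua.org/
end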
Thijs
[1] https://github.com/keplerproject/copas
[2] https://github.com/keplerproject/copas/blob/master/tests/testlimit.lua