lua-users home
lua-l archive



> -----Original Message-----
> From: lua-l-bounces@lists.lua.org [mailto:lua-l-bounces@lists.lua.org] On
> Behalf Of Gilles
> Sent: Friday, 15 May 2015 15:06
> To: lua-l@lists.lua.org
> Subject: Recommended way to download and parse web pages?
> 
> Hello
> 
> I'm a semi-Lua newbie.
> 
> I need to fetch web pages and extract info from each of them.
> 
> I have LuaRocks installed, and was wondering what packages are
> recommended for this.
> 
> lua-curl
> luacurl
> 
> http-digest
> httpclient
> lua-http-parser
> lua-resty-http
> 
> htmlparser
> luahtml
> lusty-html
> 
> Thank you.
> 

I think you need a 'fetching' and a 'parsing' component. For fetching you could use Copas [1], which has recently gained async client support for http(s) (luasec is required for the 's' part). See this example [2] for fetching multiple pages simultaneously/asynchronously.
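As a minimal sketch of the fetching side (assuming the copas rock is installed, and luasec if you need https; the URLs are placeholders), each page is fetched in its own coroutine and the requests run concurrently inside the Copas loop:

```lua
local copas = require("copas")
local http = require("copas.http")  -- async drop-in for socket.http

local urls = {
  "http://www.lua.org/",
  "http://example.com/",
}

for _, url in ipairs(urls) do
  -- each addthread becomes a coroutine; Copas multiplexes the sockets
  copas.addthread(function()
    local body, status = http.request(url)
    if body then
      print(url, status, #body .. " bytes")
    else
      print(url, "failed:", status)  -- status holds the error message here
    end
  end)
end

copas.loop()  -- runs until all threads have finished
```

`copas.http.request` mirrors the `socket.http.request` interface, so existing LuaSocket code ports over with little change.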

For parsing, it depends on the complexity. If it's simple, use Lua patterns. Otherwise the proposed lua-gumbo seems a good fit (I've only read the readme; I have no experience with it).
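For the simple case, a sketch of extraction with plain Lua patterns (no external modules; the HTML string is made up for illustration). Note that patterns like these only hold up for simple, predictable markup; for arbitrary real-world HTML a proper parser such as lua-gumbo is the safer choice:

```lua
local html = [[
<html><head><title>Hello, Lua</title></head>
<body><a href="https://www.lua.org">Lua</a>
<a href="https://luarocks.org">LuaRocks</a></body></html>
]]

-- ".-" is the non-greedy match, so it stops at the first closing tag
local title = html:match("<title>(.-)</title>")
print(title)  --> Hello, Lua

-- iterate over all links, capturing the href and the link text
for href, text in html:gmatch('<a href="(.-)">(.-)</a>') do
  print(href, text)
end
```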

Thijs

[1] https://github.com/keplerproject/copas 
[2] https://github.com/keplerproject/copas/blob/master/tests/testlimit.lua