- Subject: Re: Good solution to parse HTML?
- From: Eduardo Ochs <eduardoochs@...>
- Date: Sat, 21 Nov 2015 16:55:40 +0000
On Wed, Nov 18, 2015 at 4:50 PM, Eduardo Ochs <eduardoochs@gmail.com> wrote:
> On Wed, Nov 18, 2015 at 3:06 PM, Aapo Talvensaari
> <aapo.talvensaari@gmail.com> wrote:
>> On 18 November 2015 at 15:55, Nereus <codecomplete@free.fr> wrote:
>>> (...)
>>> Is there a good tool I could use in Lua to parse HTML?
>>
>> I would recommend using HTML parser, such as this:
>> https://github.com/craigbarnes/lua-gumbo
>
> By the way, does anyone here know how to _use_ lua-gumbo?
> I just re-ran my scripts for downloading, compiling and
> installing gumbo-parser and lua-gumbo, which are:
>
> rm -Rfv ~/usrc/gumbo-parser/
> cd ~/usrc/
> git clone --depth 1 https://github.com/google/gumbo-parser
>
> cd ~/usrc/gumbo-parser/
> sh ./autogen.sh 2>&1 | tee oa
> ./configure 2>&1 | tee oc
> make 2>&1 | tee om
> sudo make install 2>&1 | tee omi
>
> rm -Rfv ~/usrc/lua-gumbo/
> cd ~/usrc/
> git clone --depth 1 https://github.com/craigbarnes/lua-gumbo
> cd ~/usrc/lua-gumbo/
> make 2>&1 | tee om
> make check 2>&1 | tee omc
> sudo make install 2>&1 | tee omi
>
> and now the "make check" in lua-gumbo passes only 2 of the tests and
> fails the other 19... Anyway, I've never been able to use
> lua-gumbo for even the simplest things, like extracting the title of
> an HTML page...
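Update (with thanks to Craig Barnes!):
all that was missing was a "sudo ldconfig" after installing
gumbo-parser, so that the new shared library in /usr/local/lib/
is recognized by the dynamic linker... this works (the last two
lines are typed at the lua5.1 prompt):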
(eepitch-shell)
(eepitch-kill)
(eepitch-shell)
rm -Rfv ~/usrc/gumbo-parser/
cd ~/usrc/
git clone --depth 1 https://github.com/google/gumbo-parser

cd ~/usrc/gumbo-parser/
sh ./autogen.sh
./configure
make
sudo make install
sudo ldconfig

rm -Rfv ~/usrc/lua-gumbo/
cd ~/usrc/
git clone --depth 1 https://github.com/craigbarnes/lua-gumbo
cd ~/usrc/lua-gumbo/
make
make check
sudo make install

lua5.1
parse = require("gumbo").parse
print(parse("<title>Hello world!</title>").title)
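If require("gumbo") still fails after these steps, a quick way to see whether the ldconfig step took effect is to ask the dynamic linker's cache whether it lists libgumbo. The script below is a minimal, hypothetical sketch, not part of gumbo-parser or lua-gumbo. It assumes a glibc-based system (where "ldconfig -p" prints the cache) and that the installed library file is named libgumbo.so*.

```shell
#!/bin/sh
# Hypothetical helper (not from gumbo-parser or lua-gumbo): report whether
# the ld.so cache already lists libgumbo. Assumes glibc's "ldconfig -p".
# ldconfig often lives in /sbin or /usr/sbin, which may not be in a normal
# user's PATH, so add those directories just in case.
PATH="$PATH:/sbin:/usr/sbin"

# If ldconfig is missing or fails, grep sees no input and the check
# falls through to the "not found" branch.
if ldconfig -p 2>/dev/null | grep -q 'libgumbo\.so'; then
  msg="libgumbo is in the ld.so cache"
else
  msg="libgumbo not found in the ld.so cache; try: sudo ldconfig"
fi
echo "$msg"
```

If it still reports "not found" after "sudo ldconfig", the directory where "make install" put the library may simply not be in the linker's search path on that system (some distributions do not list /usr/local/lib in /etc/ld.so.conf by default).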
Cheers! =)
Eduardo Ochs
eduardoochs@gmail.com
http://angg.twu.net/