- Subject: Re: [ANN] Lua in Moscow meetup, this Wednesday #lua #lualang #luainmoscow
- From: Hisham <h@...>
- Date: Fri, 3 Oct 2014 15:24:30 -0300
On 3 October 2014 15:13, Alexander Gladysh <agladysh@gmail.com> wrote:
> Hi, Geoff, all,
>
> Basically, the discussion that day strayed more toward big-data analytics in
> general than Lua specifically (which is, I think, a good thing).
>
> As for Lua, we discussed that Lua (or rather LuaJIT) is a good tool
> for ad-hoc data pre-processing.
>
> At LogicEditor we (for many reasons) don't use Hadoop for big-data
> analysis (we have about 1TB/day of uncompressed data to analyze).
>
> For quick data pre-processing and analysis we use a simple combination of
> standard Linux tools (parallel, grep, sort, uniq, cut and some awk). A
> typical command looks something like this:
>
> time pv uid-time-ref-post.gz \
> | pigz -cdp 4 \
> | cut -d$'\t' -f 1,3 \
> | parallel --gnu --progress -P 10 --pipe --block=16M \
> $(cat <<"EOF"
> luajit ~/url-to-normalized-domain.lua
> EOF
> ) \
> | LC_ALL=C sort -u -t$'\t' -k2 --parallel 6 -S20% \
> | luajit ~/simple-reduce-key-counter.lua \
> | LC_ALL=C sort -t$'\t' -nrk2 --parallel 6 -S20% \
> | pigz -cp4 > domain-uniqs_count-www-merged.gz
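
(For readers following the pipeline above: the two Lua scripts were not posted, so the following is only a rough sketch of what a "reduce key counter" stage over sorted input might do, not the actual simple-reduce-key-counter.lua. It assumes tab-separated lines whose first field is the key and emits one "key<TAB>count" line per run of equal keys. The function name count_keys and the sample data are hypothetical.)

```shell
# Hypothetical stand-in for the reduce step: input must already be sorted
# so that equal keys are adjacent. Assumes the key is the first field.
count_keys() {
  awk -F'\t' '
    # Key changed: flush the finished run and reset the counter.
    NR > 1 && $1 != key { print key "\t" n; n = 0 }
    # Remember the current key and count this line.
    { key = $1; n++ }
    # Flush the last run, unless the input was empty.
    END { if (NR > 0) print key "\t" n }
  '
}

# Tiny made-up sample: two lines for a.com, one for b.org.
printf 'a.com\tu1\na.com\tu2\nb.org\tu1\n' \
  | count_keys \
  | LC_ALL=C sort -t"$(printf '\t')" -k2,2nr
```

Because the counter only compares each line with the previous one, it runs in constant memory, which is why a sort has to come before it in the pipeline.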
Just curious, but: how many cores do you have in the machine that runs this?
-- Hisham