On 3 October 2014 15:13, Alexander Gladysh <agladysh@gmail.com> wrote:
> Hi, Geoff, all,
>
> Basically, the discussion that day strayed more toward big-data analytics in
> general than Lua specifically (which is, I think, a good thing).
>
> As for Lua, we discussed that Lua (or rather LuaJIT) is a good tool for
> ad-hoc data pre-processing.
>
> At LogicEditor we (for many reasons) don't use Hadoop for big-data
> analysis (we have about 1 TB/day of uncompressed data to analyze).
>
> For quick data pre-processing and analysis we use a simple combination of
> standard Linux tools (parallel, grep, sort, uniq, cut, and some awk). A
> typical command looks something like this:
>
> time pv uid-time-ref-post.gz \
> | pigz -cdp 4 \
> | cut -d$'\t' -f 1,3 \
> | parallel --gnu --progress -P 10 --pipe --block=16M \
>   $(cat <<"EOF"
>     luajit ~/url-to-normalized-domain.lua
> EOF
>   ) \
> | LC_ALL=C sort -u -t$'\t' -k2 --parallel 6 -S20% \
> | luajit ~/simple-reduce-key-counter.lua \
> | LC_ALL=C sort -t$'\t' -nrk2 --parallel 6 -S20% \
> | pigz -cp4 > domain-uniqs_count-www-merged.gz
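
The Lua scripts themselves aren't shown in the message, so purely as a sketch:
a streaming reducer in the spirit of simple-reduce-key-counter.lua, assuming it
takes its key from the second tab-separated field (matching the preceding
sort -k2) and prints "key<TAB>count" lines, could look roughly like this:

    -- hypothetical sketch, not the actual ~/simple-reduce-key-counter.lua
    -- input: lines already sorted by the second tab-separated field
    -- output: one "key<TAB>count" line per distinct key
    local prev_key, count = nil, 0

    for line in io.lines() do
      local key = line:match("^[^\t]*\t([^\t]*)")  -- second field
      if key == prev_key then
        count = count + 1
      else
        if prev_key ~= nil then
          io.write(prev_key, "\t", count, "\n")
        end
        prev_key, count = key, 1
      end
    end

    if prev_key ~= nil then
      io.write(prev_key, "\t", count, "\n")  -- flush the last run
    end

Because the input arrives sorted, only the current run of the key has to be
kept in memory, which is what lets a plain LuaJIT filter sit in the middle of a
pipeline over ~1 TB/day of data without anything like Hadoop behind it.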

Just curious, but: how many cores do you have in the machine that runs this?

-- Hisham