Hi, Geoff, all,
Basically, the discussion that day strayed more towards big-data analytics in general than Lua specifically (which is, I think, rather a good thing).
As for Lua, we discussed that Lua (or rather LuaJIT) is a good tool for ad-hoc data pre-processing.
At LogicEditor we don't use Hadoop for big-data analysis, for many reasons (we have about 1 TB/day of uncompressed data to analyze).
For quick data pre-processing and analysis we use a simple combination of standard Linux tools (parallel, grep, sort, uniq, cut and some awk). A typical command looks something like this:
time pv uid-time-ref-post.gz \
| pigz -cdp 4 \
| cut -d$'\t' -f 1,3 \
| parallel --gnu --progress -P 10 --pipe --block=16M \
$(cat <<"EOF"
luajit ~/url-to-normalized-domain.lua
EOF
) \
| LC_ALL=C sort -u -t$'\t' -k2 --parallel 6 -S20% \
| luajit ~/simple-reduce-key-counter.lua \
| LC_ALL=C sort -t$'\t' -nrk2 --parallel 6 -S20% \
| pigz -cp4 > domain-uniqs_count-www-merged.gz
Here url-to-normalized-domain.lua and simple-reduce-key-counter.lua are trivial Lua scripts, 50-100 LOC each.
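To give a feel for their shape, here are minimal sketches of the two (the field layout and normalization rules below are simplified assumptions for illustration, not the real 50-100 LOC scripts). The first reads "uid<TAB>url" lines on stdin and emits "uid<TAB>normalized domain":

-- url-to-normalized-domain.lua (sketch, simplified)
for line in io.lines() do
  local uid, url = line:match("^([^\t]+)\t(.+)$")
  if uid then
    -- keep only the host: drop the scheme, credentials, port and path
    local host = url:match("^%a[%w+.-]*://([^/?#]+)") or url:match("^([^/?#]+)")
    if host then
      host = host:match("^[^@]*@(.+)$") or host -- drop user:pass@
      host = host:match("^(.+):%d+$") or host   -- drop :port
      host = host:lower():gsub("^www%.", "")
      io.write(uid, "\t", host, "\n")
    end
  end
end

The second assumes its stdin is already sorted by the key (here taken to be the second tab-separated field), so equal keys are adjacent, and emits "key<TAB>count":

-- simple-reduce-key-counter.lua (sketch, simplified)
local current, count
for line in io.lines() do
  local key = line:match("^[^\t]*\t([^\t]*)")
  if key then
    if key ~= current then
      if current then io.write(current, "\t", count, "\n") end
      current, count = key, 1
    else
      count = count + 1
    end
  end
end
if current then io.write(current, "\t", count, "\n") end -- flush the last run

Because the reducer only needs one run of equal keys in memory at a time, it streams through arbitrarily large sorted input in constant space.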
...As for production data processing: we use our own map-reduce framework built on the LuaJIT FFI, but that's another story.
* * *
If anyone is interested in the details, I'll be happy to share more :)
Best,
Alexander.