Hi, Geoff, all,
Basically, the discussion that day strayed more towards big-data analytics in general than Lua specifically (which is, I think, rather a good thing).
As for Lua, we discussed that Lua (or rather LuaJIT) is a good tool for ad-hoc data pre-processing.
At LogicEditor we don't use Hadoop for big-data analysis, for many reasons (we have about 1 TB/day of uncompressed data to analyze).
For quick data pre-processing and analysis we use a simple combination of standard Linux tools (parallel, grep, sort, uniq, cut and some awk). A typical command looks something like this:
time pv uid-time-ref-post.gz \
| pigz -cdp 4 \
| cut -d$'\t' -f 1,3 \
| parallel --gnu --progress -P 10 --pipe --block=16M \
$(cat <<"EOF"
luajit ~/url-to-normalized-domain.lua
EOF
) \
| LC_ALL=C sort -u -t$'\t' -k2 --parallel 6 -S20% \
| luajit ~/simple-reduce-key-counter.lua \
| LC_ALL=C sort -t$'\t' -nrk2 --parallel 6 -S20% \
| pigz -cp4 > domain-uniqs_count-www-merged.gz
Here url-to-normalized-domain.lua and simple-reduce-key-counter.lua are trivial Lua scripts, 50-100 LOC each.
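To give a feel for their shape, here are minimal sketches of the two (the field layout and normalization rules below are simplified assumptions for illustration, not the real 50-100 LOC scripts). The first reads "uid<TAB>url" lines on stdin and emits "uid<TAB>normalized domain":

-- url-to-normalized-domain.lua (sketch, simplified)
for line in io.lines() do
  local uid, url = line:match("^([^\t]+)\t(.+)$")
  if uid then
    -- keep only the host: drop the scheme, credentials, port and path
    local host = url:match("^%a[%w+.-]*://([^/?#]+)") or url:match("^([^/?#]+)")
    if host then
      host = host:match("^[^@]*@(.+)$") or host -- drop user:pass@
      host = host:match("^(.+):%d+$") or host   -- drop :port
      host = host:lower():gsub("^www%.", "")
      io.write(uid, "\t", host, "\n")
    end
  end
end

The second assumes its stdin is already sorted by the key (here taken to be the second tab-separated field), so equal keys are adjacent, and emits "key<TAB>count":

-- simple-reduce-key-counter.lua (sketch, simplified)
local current, count
for line in io.lines() do
  local key = line:match("^[^\t]*\t([^\t]*)")
  if key then
    if key ~= current then
      if current then io.write(current, "\t", count, "\n") end
      current, count = key, 1
    else
      count = count + 1
    end
  end
end
if current then io.write(current, "\t", count, "\n") end -- flush the last run

Because the reducer only needs one run of equal keys in memory at a time, it streams through arbitrarily large sorted input in constant space.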
...As for production data processing: we use our own map-reduce framework built on the LuaJIT FFI, but that's another story.
* * *
If anyone is interested in the details, I'll be happy to share more :)
Best,
Alexander.