More results in the meantime:
I have created a version of the iterations where I use
rawset(t, i, rawget(t, i) + 1)
instead of
t[i] = t[i] + 1
to skip any __index() and __newindex() calls completely.
According to my tests, rawget/rawset slow down array operations significantly compared to plain Lua t[i] access on a table without metatables. Basically, rawset and rawget are C functions exposed via the standard Lua C library. Therefore each invocation of these functions introduces almost the same overhead as in the case of my own C-based vector implementation (and the timings I obtained confirm it). Using a simple t[i] = t[i] + 1 seems to be far more efficient, because it happens completely on the Lua side and does not require any Lua->C function invocations.
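A minimal sketch of the kind of comparison described above, not the original test code. The array size N, the repeat count REPEATS, and all function names are assumptions chosen for illustration, and the tables carry no metatable. Timings will vary with the Lua version and machine, so no figures are shown.

```lua
-- Hypothetical micro-benchmark: plain indexing vs. rawget/rawset.
-- N and REPEATS are arbitrary illustration values, not from the original tests.
local N, REPEATS = 1000, 1000

-- Build an array of n zeros (no metatable attached).
local function make_array(n)
  local t = {}
  for i = 1, n do t[i] = 0 end
  return t
end

-- Increment every element using ordinary indexing.
local function bump_plain(t, n)
  for i = 1, n do t[i] = t[i] + 1 end
end

-- Increment every element via the rawget/rawset library functions.
local function bump_raw(t, n)
  for i = 1, n do rawset(t, i, rawget(t, i) + 1) end
end

-- Run f(t, n) `repeats` times and return the elapsed CPU time in seconds.
local function time_it(f, t, n, repeats)
  local start = os.clock()
  for _ = 1, repeats do f(t, n) end
  return os.clock() - start
end

local a, b = make_array(N), make_array(N)
local plain_time = time_it(bump_plain, a, N, REPEATS)
local raw_time = time_it(bump_raw, b, N, REPEATS)

print(string.format("t[i] = t[i] + 1: %.3f s", plain_time))
print(string.format("rawget/rawset:   %.3f s", raw_time))
```

Both loops leave every element equal to REPEATS, so the two variants do the same work and differ only in how each element is read and written.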