- Subject: reusing objects for tight iteration loops
- From: Josh Haberman <jhaberman@...>
- Date: Thu, 30 Jan 2014 16:54:25 -0800
Let's suppose I'm iterating over a large data set, and for every "row"
of the set I want to execute some Lua code.
I could implement an iterator so that my users could use for/in:

    for row in fast_iterator(data_stream) do
      do_something(row.foo, row.bar)
    end

If you optimize this pattern enough, you start getting to the point
where the malloc/GC/cache effects of allocating a new "row" every time
start to become significant.
If this were C, the next step in optimization would be to re-use the
"row" struct every time, something like:

    row_t row;
    while (get_next_item(&row, iterator)) {
      do_something(&row);
    }

This avoids the per-row allocation cost. But this doesn't translate to
Lua very well:

    local items = {}
    -- Suppose fast_iterator() is optimized to return the same
    -- row over and over, but with the fields overwritten with new values.
    for row in fast_iterator(data_stream) do
      -- Oops! My table "items" ends up with n references to the same
      -- table, with all fields set to the values of the last row.
      items[#items + 1] = row
    end

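To make the aliasing concrete, here is a minimal, self-contained sketch. The fast_iterator() below is only a hypothetical stand-in written in plain Lua (the real one would presumably live in C), and the sample data and the "aliased"/"copied" names are made up for illustration. It shows the surprise, and the obvious but allocation-heavy workaround of copying whatever needs to outlive the iteration:

```lua
-- Hypothetical stand-in for fast_iterator(): it reuses one table for every row.
local function fast_iterator(data)
  local row = {}          -- allocated once, overwritten on each step
  local i = 0
  return function()
    i = i + 1
    local src = data[i]
    if src == nil then return nil end
    row.foo, row.bar = src.foo, src.bar
    return row
  end
end

-- Made-up sample data standing in for data_stream.
local data = { {foo = 1, bar = "a"}, {foo = 2, bar = "b"}, {foo = 3, bar = "c"} }

-- Storing the row directly: every slot aliases the same table.
local aliased = {}
for row in fast_iterator(data) do
  aliased[#aliased + 1] = row
end

-- Copying the fields that need to outlive the iteration step.
local copied = {}
for row in fast_iterator(data) do
  copied[#copied + 1] = { foo = row.foo, bar = row.bar }
end
```

The copy brings back exactly the per-row allocation the reuse was meant to avoid, but only for users who actually keep rows around.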
We can try to work around this, but things are still dicey if the row
has sub-objects:

    local row = FastIteratorObject(iterator)
    while row:next_object() do
      -- Now it's clear that "next_object()" mutates the row to be the
      -- contents of the next row.
      -- But imagine that row:next_object() can also change row.bar.baz,
      -- so anything that keeps a reference to row.bar is still surprised:
      do_something(row.bar)
    end

So I was just wondering if anyone had any out-of-the-box ideas about
mitigating this. One idea I had was to make the inner function a
string:
fast_iterate(iterator, "function (row) print(row.foo) end")
This allows me to precisely control the function's environment, so I
can prevent references to "row" from escaping through the global
environment or upvalues. The downside is that then none of the
program's functions or variables will be visible, which limits the
usefulness of this approach pretty severely.
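As a rough sketch of what the string-based approach could look like, assuming Lua 5.2 or later (where load() accepts an environment table; on 5.1 this would need loadstring() plus setfenv()). The names compile_sandboxed, record, and seen are hypothetical and exist only for this example:

```lua
-- Hypothetical helper: compile a function given as source text inside a
-- restricted environment. Assumes Lua 5.2+ load(chunk, name, mode, env).
local function compile_sandboxed(src, env)
  -- "t" restricts the chunk to text, so precompiled bytecode is rejected.
  local chunk, err = load("return " .. src, "=row_callback", "t", env)
  if not chunk then error(err, 2) end
  return chunk()
end

-- Whitelist only what the callback is allowed to see.
local seen = {}
local env = {
  record = function(v) seen[#seen + 1] = v end,
  tostring = tostring,
}

local callback = compile_sandboxed("function (row) record(row.foo) end", env)

-- Reuse a single row table across calls, as the fast iterator would.
local row = {}
for i = 1, 3 do
  row.foo = i * 10
  callback(row)
end
```

Note that the sandbox is only as tight as the whitelist: the callback can still assign the row to a "global", which lands in env, and any host function exposed through env could stash it as well, so the host would have to police both.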
Another approach would be to see whether "row" has any references at
the end of the function; if the object didn't actually escape the
inner function then I could reuse it without any surprise to the user.
But Lua has no API for doing anything like this.
I'm not sure there's a good answer here; I'm just thinking out loud and
would love to hear anyone's brilliant ideas.
Thanks,
Josh