Re: another try at multithreading

Commenting on the plua.txt portion (below).

The style of programming you seem to be after is similar to what the TBB (Intel Threading Building Blocks) C++ library offers. But it offers more. It is built on the concept of splitting problems into smaller chunks, until enough parallelism has been reached to fully load all the cores of the CPU. I like it.

I will read it for sure. I didn't know about it, but the idea is pretty much the same, except that I don't necessarily load cores: I could also be loading multiple processors (SMP) or multiple nodes (non-shared memory).

parallel do
-- block, every chunk is dispatched in parallel, synchronization at the end.
end

What exactly should this do? If all the chunks get the same values, they'd just do the same thing.

Maybe there is a misunderstanding here about "chunk" and "block". Actually, in Lua a block is syntactically the same as a chunk anyway, but in these examples I mean the chunks inside the block. Each one can work on a different data set through variables from the enclosing scope, or through upvalues in the case of closures.

A pretty dumb example, assuming a dual-core processor:

a, b, c, d = 1, 2, 3, 4

parallel do
   a = a + 1 -- Runs on the first core.
   b = b + 1 -- Runs on the next available core; it may start before the first chunk has finished, if the first chunk is slow.
   c = c + 1 -- Same: runs on the next available core.
   d = d + 1 -- Same: runs on the next available core.
end

In this example it probably won't matter, since the execution is too fast to be worth spreading across multiple cores, but think of a more complex function working on a, b, c and d. What the keyword parallel does to the chunks inside the do block is make them non-blocking. They are still started in order, but the VM doesn't wait for one chunk to finish before starting the next if there is a second core available.
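To make the dispatch-then-join idea concrete, here is a rough emulation in stock Lua. The helper `parallel_do` is hypothetical and is not part of plua or of any library. It uses coroutines, which are cooperative, so nothing here really runs on a second core; the sketch only shows the shape: start every chunk without waiting, then synchronize at the end of the block.

```lua
-- Hypothetical helper emulating `parallel do ... end` in stock Lua.
-- Coroutines are cooperative: this shows the structure, not real parallelism.
local function parallel_do(chunks)
  local tasks = {}
  for i, chunk in ipairs(chunks) do
    tasks[i] = coroutine.create(chunk)   -- dispatch: one task per chunk
  end

  -- Join: keep resuming until every task has finished.
  local alive = #tasks
  while alive > 0 do
    for _, co in ipairs(tasks) do
      if coroutine.status(co) == "suspended" then
        local ok, err = coroutine.resume(co)
        if not ok then error(err, 0) end
        if coroutine.status(co) == "dead" then alive = alive - 1 end
      end
    end
  end
end

a, b, c, d = 1, 2, 3, 4

parallel_do{
  function() a = a + 1 end,
  function() b = b + 1 end,
  function() c = c + 1 end,
  function() d = d + 1 end,
}

print(a, b, c, d)  -- prints 2, 3, 4, 5 (tab-separated)
```

Each chunk reaches its own variable through the enclosing scope, which is why the chunks do not all end up doing the same thing.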

This raises a lot of synchronization issues inside the Lua state; there is no way it could be done without making the stack fully multi-threaded.
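The VM-internal side of this is hard to show from Lua itself, but the same class of hazard is easy to reproduce at the language level. In the sketch below, two tasks update one shared variable, and a `coroutine.yield()` stands in for the point where a real thread could be preempted. The names `counter` and `unsafe_increment` are made up for the illustration.

```lua
-- Two tasks increment one shared counter. Each yields between reading
-- the value and writing it back, as a preempted thread might.
counter = 0

local function unsafe_increment()
  local seen = counter      -- read
  coroutine.yield()         -- another task may run here
  counter = seen + 1        -- write back a possibly stale value
end

local t1 = coroutine.create(unsafe_increment)
local t2 = coroutine.create(unsafe_increment)

-- Interleave the tasks: both read first, then both write.
coroutine.resume(t1)
coroutine.resume(t2)
coroutine.resume(t1)
coroutine.resume(t2)

print(counter)  -- prints 1, not 2: one update was lost
```

With truly parallel chunks sharing one state, the interleaving is no longer chosen by the program, and the VM's own structures are exposed to it as well.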

I see that your lanes library deals with multiple states and uses closures, so you don't have to touch the internal Lua state (I believe) and you don't change the syntax. It's a different approach. I wrote these notes more than two years ago; I just dug them out of my files and posted them as an example of something that changes Lua internally but can still share a lot of code with the original tree.
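A rough sketch of that multi-state data flow, in stock Lua. This is not the Lanes API: `spawn` and `join` are hypothetical names, and the task runs in a coroutine inside the same state, so the sketch only models the idea that a task receives copies of its inputs and hands results back explicitly.

```lua
-- Hypothetical `spawn`: runs f on its own copy of the argument and returns
-- a handle with join(). Same state, via a coroutine; it models the data
-- flow of a multi-state design, not real isolation.
local function spawn(f, arg)
  local co = coroutine.create(f)
  local ok, result = coroutine.resume(co, arg)
  if not ok then error(result, 0) end
  return { join = function() return result end }
end

a, b = 1, 2

-- The task touches no outer variable; it only sees what it is given.
local function increment(x) return x + 1 end

local ta = spawn(increment, a)
local tb = spawn(increment, b)

-- Nothing has changed yet: the tasks only worked on copies.
assert(a == 1 and b == 2)

-- The caller decides when the results are merged back.
a, b = ta.join(), tb.join()
print(a, b)  -- prints 2 and 3 (tab-separated)
```

The contrast with `parallel do` is that here no task ever writes to a variable another task can see, so the shared stack never becomes an issue.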

It could be done without the new keywords:

p = parallel.run(function () a = a + 1; b = b + 1; c = c + 1; d = d + 1 end) -- hypothetical API; "do" is a reserved word, so it cannot be used as a field name
p()

But that will still mess with Lua internals because of the synchronization issues of the stack. I remember Luiz telling me to try to make a library first, but that would be impossible in this case, even if I leave the syntax alone, add no new keywords, and use your anonymous-function approach.

I don't know if that kind of parallelism is useful, though it seems to be if Intel is playing with it. It's more a research project than a real application project. I'm still thinking about it, but what concerns me is staying close to the original Lua code base; for that, some kind of cooperation between the forks must exist, and a shared source control repository would help.

Augusto Radtke