
I can share some experience with communicating Lua states:

"shared"
In a program where I have several Lua states within one process, I have a "table" called "shared" - in fact it is userdata with __index and __newindex metamethods, but for the Lua programmer it behaves similarly to a table. You can store strings, numbers and booleans there. You cannot store threads/coroutines, functions or userdata - they cannot be transferred to another state anyway. Currently it cannot store tables either, but it would be possible to extend it to recursively store non-cyclic tables containing only strings, numbers and booleans (maybe I will do this some time). For now, users can serialize such tables themselves (e.g. to a JSON string or whatever). All read/write operations are protected by a mutex, and the data is stored in another Lua state that basically just acts as a hashmap here. The code is in production use in a webserver that runs multiple Lua states in the same process, so it might not fit your application without adaptations. However, it is MIT licensed, so you can take it from there and make whatever modifications you like: https://github.com/civetweb/civetweb/blob/master/src/mod_lua_shared.inl
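
To give an idea of how that looks from the Lua side, here is a small usage sketch; the global name "shared" and the keys are made up, the real module may expose things differently:

    -- Illustrative sketch only: assumes the module exposes a global
    -- "shared" userdata to each Lua state (names invented here).

    -- In one Lua state:
    shared.last_user = "alice"                 -- strings, numbers, booleans are fine
    shared.visits = (shared.visits or 0) + 1   -- each read/write is mutex-protected,
                                               -- but this read-modify-write as a whole is not atomic

    -- In another Lua state of the same process:
    print(shared.visits, shared.last_user)

    -- Not allowed: tables, functions, coroutines, userdata.
    -- A table has to be serialized by the user first, e.g.:
    shared.config_json = '{"debug":true}'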

"lsqlite"
Another option to share data is through a database used by several processes. This can be done from Lua using "lsqlite3" ... or any other database binding.
Again you can share strings, numbers and booleans. You can use transaction-based read/write operations on complete records (tables with proper elements).
It works across different processes and also offers persistence: when all your Lua processes are shut down, the data is still in the database.
Depending on the database binding, you may even exchange data between processes running on different hosts.
From my experience, it depends on the background of the users whether or not they will like this. A database is a "strange" element if you are programming pure Lua - you can wrap these operations somehow, but they do not "feel" as natural to a Lua programmer as a table does (as in "shared" above).
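
A minimal sketch of the lsqlite3 variant, assuming a simple key/value table (the file name and schema are made up):

    local sqlite3 = require("lsqlite3")
    local db = sqlite3.open("shared.db")            -- same file opened by all processes

    db:exec("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    -- write one record inside a transaction
    db:exec("BEGIN")
    local stmt = db:prepare("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)")
    stmt:bind_values("counter", "42")
    stmt:step()
    stmt:finalize()
    db:exec("COMMIT")

    -- read it back (possibly from another process)
    for row in db:nrows("SELECT k, v FROM kv WHERE k = 'counter'") do
        print(row.k, row.v)
    end

    db:close()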

"lsh"
I also created a module to access shared memory from Lua (Linux only, but it would be easy to add Windows support). A shared memory region can be used to exchange data between Lua states in different processes on the same machine. When sharing between processes owned by different users on the same machine, you need to take care to set the user access rights correctly - this can be bothersome but is doable. I used shared memory to exchange data between Lua processes and C/C++ processes - between a language with dynamic typing (Lua) and one with static typing (C). For a shared memory region, you need to define a static memory layout - you have to work with address offsets within this shared memory. You also need to take into account that C does not have a "string" in the same sense as Lua does - instead it uses a character array of fixed size, and the string can never grow larger than that.
If you only share between Lua and Lua, you would still need to know where (what memory offset) to put what element - you still have to use a fixed, static memory layout.
You cannot store "whatever you like", but only what has been provided in the shared memory layout - it is not like a table where you can add new elements as you like. You cannot do any duck typing with this solution. Using low-level shared memory addressing functions directly requires some additional training for a Lua programmer.
I did not use it for "Lua to Lua", but only for "Lua to C", with a static memory layout predefined as a C structure.
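
To illustrate the "static layout" point (this is not the actual API of the module above - the binding name and functions are invented here):

    local shm = require("shm")                      -- hypothetical binding
    local mem = shm.open("/my_region", 64)          -- fixed size, agreed on with the C side

    -- Fixed layout mirroring a C struct:
    --   offset 0 : int32    counter
    --   offset 4 : char[60] name (fixed size, cannot grow)
    mem:write_int32(0, 42)
    mem:write_string(4, "hello", 60)

    print(mem:read_int32(0), mem:read_string(4, 60))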

files
So unspectacular that I almost forgot about it: of course you can use files to share data between Lua states.
Not really a "high performance" solution, but it works out of the box without any additional C library.
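
For completeness, a minimal sketch (no locking, so concurrent writers would still need some coordination):

    -- writer process
    local f = assert(io.open("shared_data.txt", "w"))
    f:write("counter=42\n")
    f:write("name=alice\n")
    f:close()

    -- reader process
    local data = {}
    for line in io.lines("shared_data.txt") do
        local k, v = line:match("^(%w+)=(.*)$")
        if k then data[k] = v end
    end
    print(data.counter, data.name)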

Combining stuff:
From my experience, an important criterion is the type system you need to support.
"shared" behaves like a Lua table.
"lsqlite" behaves like a database - you define a table structure and add rows.
"lsh" (and probably any other shared memory solution) behaves like a fixed C data structure.
If you are fine with more or less fixed data structures, you can go with a database or a shared memory.
Variable data structures work better with an approach similar to "shared" - it is currently limited to Lua states in the same process, but it could be adapted to work with multiple processes by combining it with some interprocess communication mechanism. All the "read" and "write" operations in "shared" could be sent through domain sockets (Linux) or a named pipe (Windows) - or any other IPC mechanism - to a process holding all the data (the "shared" state). This keeps the "look and feel" of a Lua table without any need to predefine a table structure.
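
As a rough sketch of that idea (using LuaSocket over loopback TCP instead of a domain socket or named pipe, and a made-up one-line protocol):

    -- process holding the data (the "shared" state)
    local socket = require("socket")
    local server = assert(socket.bind("127.0.0.1", 9000))
    local store = {}
    while true do
        local client = server:accept()
        local line = client:receive("*l") or ""
        local op, key, value = line:match("^(%u+) (%S+) ?(.*)$")
        if op == "SET" then
            store[key] = value
            client:send("OK\n")
        elseif op == "GET" then
            client:send((store[key] or "") .. "\n")
        end
        client:close()
    end

    -- any other process: wrap the protocol so it still feels like a table
    local socket = require("socket")
    local function request(line)
        local c = assert(socket.connect("127.0.0.1", 9000))
        c:send(line .. "\n")
        local reply = c:receive("*l")
        c:close()
        return reply
    end
    shared = setmetatable({}, {
        __index    = function(_, k) return request("GET " .. k) end,
        __newindex = function(_, k, v) request("SET " .. k .. " " .. tostring(v)) end,
    })
    shared.counter = 42
    print(shared.counter)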

On Sat, Feb 27, 2021 at 3:48 PM Philippe Verdy <verdyp@gmail.com> wrote:
There are also memory-mapped files. With access control and synchronization across processes provided by the hosting filesystem, each process or thread can get a consistent view. But you have to use the filesystem's locking mechanisms for atomic operations.

Mmap'ed memory is very fast (much faster than conventional file I/O, as it is implicitly buffered for at least the size of your mapped file segment).
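
A purely illustrative sketch from the Lua side - "mmapfile" and its functions are invented here; a real binding (a C module) will have a different API:

    local mmapfile = require("mmapfile")                -- hypothetical binding
    local map = mmapfile.open("shared.bin", 4096, "rw") -- map a 4 KiB window of the file

    map:write(0, "hello")       -- bytes go to the shared page cache, no explicit file I/O
    print(map:read(0, 5))       -- another process mapping the same file sees "hello"

    -- atomicity across processes still needs filesystem locks (see caveats below)
    map:close()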
Caveats:
* if you have to work on very large data files, moving the window mapped in memory to another location of the file will destroy your buffer and consume a lot of I/O (or of the filesystem's default shared cache) just to refill it, and will require adding large amounts of VM space to the process. If you have many threads doing this in the same process, the process memory may explode.
* exclusive file locking (with the filesystem calls/API) does not work across threads, unless the OS provides that isolation level with calls/API at the thread level (and Lua states are not necessarily mapped to a native thread); inside the same Lua app, you will need other locking mechanisms from the Lua machine itself (across its "light" threads). Using data serialization is still the way to go to avoid deadlock situations caused by atomic operations taking locks in random order.
* the last alternative is to use an external database (or a memcached store, for its speed). You just need a connector library to connect to the "remote" database or store.

And be aware of possible breaches of privacy or security through caches (i.e. implement a cache eviction policy using segregated pools instead of simple LRU-based eviction; this is true for all sorts of caches, including DNS client caches, and web caches in browsers or in routers; note also that filesystem caches are NOT secure by default, as they rarely provide the cache eviction policy with segregated pools you would want; not doing this exposes your online services to data leaks, even to attackers who do not know any secrets in advance).

Unfortunately, all modern computing devices, OSes, drivers and application software, and most websites you visit use many levels of caches which are not secured at all (and are not under the control of the clients using them), most of them using basic LRU eviction policies (there may be some segregation into multiple pools, but no way to segregate them into application-controlled domains, as the subdivision is most often arbitrary, only optimistic, and tuned only for best average global performance, not at all for security). Those breaches are massively harnessed by advertisers (to abuse our privacy) and by bad hackers to steal secrets and then money, or to gain access to sites even when they are secured by the best firewalls, the best encryption/authentication/quota mechanisms or other isolation mechanisms (threads, processes, process groups, containers, virtual machines...) of the OS (possibly implemented in hardware in CPU/GPU/bus controllers and SSDs/HDDs, all of which have caching mechanisms with overly basic eviction policies, as they are clearly optimized optimistically, only for speed and global average performance).

For now the best solution is to use multi-factor authentication, but even that is not enough, as attacks also exist between authorized users of the system who are insufficiently sandboxed.

Caches are the worst nightmare in all modern architectures: we highly depend on them for modern performance, so it is very hard to isolate them all and to define and implement the correct eviction policies without sacrificing a lot of performance or adding a lot of "idle" redundancy to the system you want to secure. Even if you do that, you will pay a huge price in energy, and power-saving strategies will ruin all your efforts, because you reintroduce variable latency for conditional on-demand wake-ups, which are also a form of cache (except that there is little or no segregation at all: the system is either sleeping or awake, and offers no application-controlled separation of domains)! And even today, we continue to train people on basic LRU mechanisms and never teach them to be constantly aware of the risks of ALL caches.


On Sat, Feb 27, 2021 at 14:22, Viacheslav Usov <via.usov@gmail.com> wrote:
On Sat, Feb 27, 2021 at 5:29 AM caco <cacophonitrix@protonmail.com> wrote:
>
> From C I have a master process in parallel with a number of agent processes, each binding a luaL_State. I want the agents' Lua scripts to communicate through Lua-supervised (i.e. gc'd) data. How can I do this?

Since you said "process", not "thread", it should be said that in a
modern OS such as (a recent version of) Linux, MacOS and Windows,
distinct processes have isolated memory spaces and cannot touch memory
in another process, with one exception.

Without the exception, your only option is to serialize and
deserialize data as byte chunks and use some IPC to transport the
chunks between processes.
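
For example, with Lua 5.3+ the byte chunks can be built with
string.pack/string.unpack (the record layout below is just an example):

    -- serialize: 4-byte id, length-prefixed string, double
    local function encode(id, name, value)
        return string.pack("<i4s4d", id, name, value)
    end

    local function decode(chunk)
        local id, name, value = string.unpack("<i4s4d", chunk)
        return id, name, value
    end

    local chunk = encode(7, "temperature", 21.5)   -- send this over your IPC of choice
    print(decode(chunk))                           --> 7  temperature  21.5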

The exception is shared memory. With it, you get all the complications
of multiple threads and then some more. Depending on how badly you
want your thing, this might be something to consider.

Cheers,
V.