Well, yes, I think there is some confusion here. We're talking about a specific structure, Lua_State, that is allocated once for each main (Lua) thread and once for each coroutine. Each one is also pretty small, as it mostly holds root pointers to other structures. This isn't Lua state in the abstract sense of "all the state of a Lua VM". I doubt adding a single void* would impact a typical Lua install by more than a few dozen bytes TOTAL.
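To put a rough number on that, here is a minimal sketch in C. The two structs are hypothetical stand-ins I made up for illustration; they are NOT the real Lua_State layout, and the helper names are mine. The only point is that one extra pointer costs about sizeof(void*) per state, multiplied by however many threads and coroutines are alive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread state: a few root pointers
   plus a status word. This is NOT the real Lua_State layout. */
struct state_without_user {
    void *stack;
    void *top;
    void *global;
    int   status;
};

/* Same hypothetical struct with the proposed user pointer appended. */
struct state_with_user {
    void *stack;
    void *top;
    void *global;
    int   status;
    void *user;   /* the extra field under discussion */
};

/* Bytes added to each state by the extra pointer, padding included. */
static size_t growth_per_state(void) {
    return sizeof(struct state_with_user) - sizeof(struct state_without_user);
}

/* Total growth across n_states live states (main threads + coroutines). */
static size_t total_growth(size_t n_states) {
    assert(n_states > 0);
    return n_states * growth_per_state();
}
```

On a typical 64-bit ABI growth_per_state() should come out as 8, so a program with a main thread and a handful of coroutines grows by a few dozen bytes in struct size. What the allocator then does with the slightly larger request is a separate question.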
Sorry, but you're making assumptions about how malloc(), object sizes and minimal allocations relate. Take the page size and subtract the malloc overhead. That's the relevant number in i386's VM model. I'd want to know about contemporary systems that allow greater granularity.
You're probably confusing this with arena or bucket sizes, which are irrelevant intermediaries when determining the real cost of growing Lua_State. Not all allocators use size-arbitrated buckets. For example, an allocator for short-running routines whose access to the regular system allocator is difficult, like the dynamic linker's, has no need for size organization.
>> 2. The only relevant size constraint isn't heap bound. If you are, e.g., copying states, the possibly unused user pointer is going to touch a register.
> Well good luck copying a Lua_State (can you spell crash?), and why would placing an extra field in a structure touch an extra register?
As for malloc(), nope, not confused at all. While I am making a general assumption about malloc(), I've yet to find a heap allocator underneath the C malloc() API that only allocates in multiples of VM pages (I'm not sure whether by "greater" granularity you mean finer or coarser grained).
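If anyone wants to check their own allocator, here is a rough sketch in C. It allocates a batch of small blocks and measures how much address space their start addresses span. The function name malloc_span is mine, and the result depends entirely on the allocator in use, so treat it as a quick experiment and not a proof. If every small malloc() consumed a whole VM page, the span for n blocks would be at least (n - 1) pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n blocks of `size` bytes each and return the distance between
   the lowest and highest block start address. Returns 0 if any
   allocation fails. All blocks are freed before returning. */
static size_t malloc_span(size_t n, size_t size) {
    assert(n > 1 && size > 0);

    void **blocks = malloc(n * sizeof *blocks);
    if (blocks == NULL) return 0;

    uintptr_t lo = UINTPTR_MAX, hi = 0;
    size_t got = 0;
    for (; got < n; got++) {
        blocks[got] = malloc(size);
        if (blocks[got] == NULL) break;
        uintptr_t addr = (uintptr_t)blocks[got];
        if (addr < lo) lo = addr;
        if (addr > hi) hi = addr;
    }

    /* Only report a span if every allocation succeeded. */
    size_t span = (got == n) ? (size_t)(hi - lo) : 0;

    for (size_t i = 0; i < got; i++) free(blocks[i]);
    free(blocks);
    return span;
}
```

For example, compare malloc_span(1000, 64) against 1000 * 4096 (assuming 4 KiB pages, which is itself an assumption about the platform). On the general-purpose allocators I have in mind, the span should be far smaller than that, which is what you'd expect if small requests are packed many to a page.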