Speaking of worrying about optimizing storage allocations to gain
speed, how complicated would it be for lua_cpcall to not create a
function closure? For example, could one introduce a new primitive
type for C functions without upvalues that would be much like light
userdata? Or one could use a trampoline that gets allocated once and
registered somewhere, and which uses the userdata to point to a block
containing the C function to call and the parameter to pass to it.
For example:
    int cpcaller( lua_State *L ) {
        /* the CCallS block arrives as the single light userdata argument */
        struct CCallS *c = cast(struct CCallS *, lua_touserdata( L, 1 ));
        lua_pop( L, 1 );
        lua_pushlightuserdata( L, c->ud );
        return (*c->func)( L );
    }
    static void f_Ccall (lua_State *L, void *ud) {
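        /* look up the shared trampoline, keyed by its own address */
        lua_pushlightuserdata( L, &cpcaller );
        lua_rawget( L, LUA_REGISTRYINDEX );
        if( lua_isnil( L, -1 ) ) {
            /* first use: create the trampoline once and register it */
            lua_pop( L, 1 );
            lua_pushcfunction( L, &cpcaller );
            lua_pushlightuserdata( L, &cpcaller );
            lua_pushvalue( L, -2 );
            lua_rawset( L, LUA_REGISTRYINDEX );  /* leaves function on stack */
        }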
        setpvalue(L->top, ud);  /* push only argument */
        incr_top(L);
        luaD_call(L, L->top - 2, 0);
    }
One could even improve the efficiency a bit by avoiding pushing the
void* as a light userdata for the eventual function if the function
were defined to take a void* parameter. For example:
    int cpcaller( lua_State *L ) {
        /* the CCallS block arrives as the single light userdata argument */
        struct CCallS *c = cast(struct CCallS *, lua_touserdata( L, 1 ));
        lua_pop( L, 1 );
        return (*c->func)( L, c->ud );
    }
Pushing this further, we might get:
    int lua_cvpcall(
        lua_State *L, int (*func)( lua_State *L, va_list *args ), ...
    );
That would allow one to write:
    lua_cvpcall( L, &myFunc, arg1, arg2, arg3 );
This would call myFunc with the state L and the supplied arguments
packaged up as a va_list.
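As a rough sketch of just the argument packaging (not Lua source, and
not protected): the State struct below is a stand-in for lua_State so
that this compiles without the Lua headers, and cvpcall and sum3 are
hypothetical names used only for illustration. A real lua_cvpcall would
presumably run the callee through the protected-call machinery rather
than calling it directly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Stand-in for lua_State (assumption, for illustration only). */
typedef struct State { int last_result; } State;

/* Callee signature from the proposal: the state plus a pointer to the
   caller's va_list. */
typedef int (*VFunc)( State *L, va_list *args );

/* Hypothetical analogue of lua_cvpcall: package the variadic arguments
   as a va_list and hand a pointer to it to func. This sketch calls
   func directly instead of in protected mode. */
int cvpcall( State *L, VFunc func, ... ) {
    va_list args;
    int status;
    assert( L != NULL && func != NULL );
    va_start( args, func );
    status = func( L, &args );
    va_end( args );
    return status;
}

/* Example callee: expects exactly three int arguments and stores
   their sum in the state. */
int sum3( State *L, va_list *args ) {
    int a = va_arg( *args, int );
    int b = va_arg( *args, int );
    int c = va_arg( *args, int );
    L->last_result = a + b + c;
    return 0;
}
```

With this, cvpcall( &s, sum3, 1, 2, 3 ) hands sum3 the three ints
through the va_list. As with any va_list consumer, the callee has to
know the number and types of the arguments by convention; nothing
checks them.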
Pushing in a different direction, it might be attractive to support a
non-protected version as well.
I could also see it being attractive to allow cpcall to return results
if it doesn't fail. The stack can then be restored with lua_settop.
Mark