On Thu, Aug 20, 2015 at 10:27:18AM -0700, Tim Hill wrote:
> 
> > On Aug 19, 2015, at 11:49 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> > 
> > I think I understand it now, thanks to the messages of the past
> > 16 hours. (In my timezone, people are just waking up.) The
> > OP's use case has been clouding the issue. What is being
> > proposed is much more versatile.
> 
> ….
> 
> Good summary :) .. But I’m not sure I’m on board with the idea...
> 
> Since userdata is opaque to Lua, any operation on that userdata must involve a call to a C function with the userdata as an argument. Once you do this, all the issues discussed here go away; it’s easy to provide the extra metadata about a userdata using any number of indirection/tagging techniques at the C level, including lifetime management if you want to (for example) refcount etc.
> 
> However, any such technique is of course private to each developer. In addition, they will typically involve a more complex low-level C model (struct with pointers to structs etc). So the question is: Would there be a benefit in having an “official” way for Lua to perform this in a generic manner that everyone could leverage?
> 
> My opinion is NO, for the following reasons:
> 
> — Lua already has metatables and uservalues that can be attached to
> userdata. Why isn’t a conventional use of these facilities adequate?

The issue is performance. Using just metatables or uservalues, you have to
instantiate a _new_ userdata object for every operation. For example, with

	obj.x.y.z

you have to allocate 3 new userdata values (one for x, one for y, then one
for z), at least two of which will immediately become garbage, and probably
the third. _Caching_ these values isn't feasible in this instance because
eventually you'll have instantiated a unique userdata for almost every node
in the data structure, which is too much memory in the OP's scenario, not to
mention that it generates a lot of garbage for the collector.
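
To make the cost concrete, here is a minimal sketch in plain C, not the
real Lua API. Node, Wrapper, wrap() and index_child() are hypothetical
names; Wrapper merely stands in for a userdata, and index_child() for one
__index step that has to hand back a fresh object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical document node: each node has at most one child here. */
typedef struct Node { struct Node *child; int value; } Node;

/* Stands in for a full userdata that wraps a pointer to a node. */
typedef struct Wrapper { Node *node; } Wrapper;

/* Counts how many wrapper objects have been allocated so far. */
static size_t wrappers_allocated = 0;

/* Allocates a new wrapper for a node; returns NULL if malloc fails. */
static Wrapper *wrap(Node *n) {
    assert(n != NULL);
    Wrapper *w = malloc(sizeof *w);
    if (w == NULL) return NULL;
    w->node = n;
    wrappers_allocated++;
    return w;
}

/* Models one field access: a fresh wrapper for the child node. */
static Wrapper *index_child(const Wrapper *w) {
    return wrap(w->node->child);
}
```

Walking obj.x.y.z with this model calls index_child() three times, so
wrappers_allocated grows by three, and the wrappers for x and y are dead
as soon as the expression has been evaluated.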

Using sub-typing (?) you only ever instantiate a single userdata, or at
least a much smaller number of userdata. The type information is stored on
the Lua stack slot and in table nodes. When you change a subtype you're just
changing the type bits stored in that particular stack slot. It will not
affect any other reference to that userdata, but will be inherited when
copying the values to and from the stack.

As somebody else pointed out, think of C pointers.

	struct foo *a = new_foo();
	const struct foo *b = a;

Doing this in Lua so that b remains qualified as constant requires creating
a whole new userdata value and setting some member field, whereas in C the
constness is derived from the type of the variable. The subtyping idea for
Lua is about being able to annotate not the userdata object itself, but the
variable (stack slot, table node) that holds the userdata object. Doing
that reduces the runtime cost to almost nothing--zero cost other than the
conditionals in your C code. (Unless we need to grow some internal data
structures.)

> — I think it would be difficult to come up with a design that was both
> sufficiently flexible and compact.

While I like the _idea_ of subtyping, it is definitely problematic. The OP's
original suggestion to use userdata handles is a non-starter. My suggestion
to use the free bits in the type field is more practical. However, it might
be insufficient for the OP's particular use. If we have 20 free bits, that
only permits about 1M usable subtypes to annotate any particular userdata.
That skirts dangerously close to the maximum possible number of nodes in the
OP's actual case, and is definitely insufficient for larger documents. When
the subtype overflows, you'd have to resort to cloning the metatable to a
new metaname (e.g. JSON1*, JSON2*, etc). And that's just super ugly.
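
For the arithmetic, a small sketch in plain C. The layout is an
assumption made up for illustration (a 32-bit tag word whose low 12 bits
keep the base type and whose high 20 bits hold the subtype); it is not
how Lua actually lays out its type field.

```c
#include <assert.h>
#include <stdint.h>

#define SUBTYPE_BITS  20u
#define SUBTYPE_SHIFT 12u                    /* assumed: low 12 bits = base type */
#define SUBTYPE_COUNT (1u << SUBTYPE_BITS)   /* 2^20 = 1048576 distinct subtypes */
#define SUBTYPE_MAX   (SUBTYPE_COUNT - 1u)
#define BASE_MASK     ((1u << SUBTYPE_SHIFT) - 1u)

/* Nonzero if a node index can be represented as a subtype. */
static int subtype_fits(uint64_t n) {
    return n <= SUBTYPE_MAX;
}

/* Packs a base type and a subtype into one 32-bit tag word. */
static uint32_t tag_pack(uint32_t base, uint32_t subtype) {
    assert(subtype <= SUBTYPE_MAX);
    return (subtype << SUBTYPE_SHIFT) | (base & BASE_MASK);
}

/* Extracts the subtype from a tag word. */
static uint32_t tag_subtype(uint32_t tag) {
    return tag >> SUBTYPE_SHIFT;
}

/* Extracts the base type from a tag word. */
static uint32_t tag_base(uint32_t tag) {
    return tag & BASE_MASK;
}
```

A document with more than SUBTYPE_COUNT nodes fails subtype_fits(), which
is the overflow case where the metatable cloning would have to kick in.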

A cleaner solution which still permits the subtyping to be abused for the
OP's scenario would be to support as many subtypes as there are possible
addressable objects, either in C or Lua, which really means as many bits as
in pointers (32 bits, 48 bits), maybe 1 or 2 bits shy.

That's not perfect because there are common situations where the subtype (or
view or window onto an object) needs to be expressed as a tuple. For
example, array slices. Or further afield, permutations. Subtyping comes up
short in this respect, especially relative to the simplicity of userdata. But
I'm coming around to the belief that it's a pretty nice feature on balance
(usability, simplicity, and especially performance) as long as the subtype
range is large enough.

> — As you have noted on many occasions,
> Roberto et al are very unlikely to consider such an addition unless there
> is a very strong case for it.

Plus 5.3 is less than a year old. Unless the OP doesn't mind waiting several
more years for 5.4 or 6.0, much of the energy in this discussion is being
wasted.