lua-users home
lua-l archive



On Mon, Jul 6, 2015 at 9:36 AM, Cosmin Apreutesei
<cosmin.apreutesei@gmail.com> wrote:
>> As for weak values, your implementation there in luapower doesn't have
>> a __gc metamethod on tuples, which makes the use of weak values not
>> even worth the overhead they add -- you aren't saving that much
>> memory since the index tables are still populated.
>
>
> I'm not sure I understand this point. The use of weak tables is not an
> optimization; it is necessary for a non-leaky implementation. What does
> __gc have to do with it?

The point is that you don't currently have a non-leaky implementation.
It still leaks; it just leaks more slowly than if you weren't using weak
keys. Meanwhile, it's not exactly "leaky" if you think of it as an
interning pool or as memoization, where it's EXPECTED that memory
usage increases monotonically as new unique values appear --
allowing the memory usage to decrease is an optimization, from that
perspective.
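To make the leak concrete, here's a minimal sketch of the kind of weak-valued interning index under discussion (the names `index`, `tuples`, `NIL`, and `tuple`, and the exact layout, are assumptions for illustration, not the luapower code):

```lua
-- Sketch of tuple interning with weak values. Elements navigate nested,
-- strongly-referenced index tables; only the final tuple reference is weak.
local index = {}                                 -- nested index tables (strong)
local tuples = setmetatable({}, {__mode = 'v'})  -- leaf node -> tuple (weak)
local NIL = {}                                   -- stand-in so nil can be an element

local function tuple(...)
  local node = index
  for i = 1, select('#', ...) do
    local k = select(i, ...)
    if k == nil then k = NIL end
    local next_node = node[k]
    if next_node == nil then
      next_node = {}
      node[k] = next_node
    end
    node = next_node
  end
  local tup = tuples[node]
  if tup == nil then
    tup = {n = select('#', ...), ...}
    tuples[node] = tup  -- weak: cleared when the tuple is collected
  end
  return tup
end

-- Interning makes equal tuples identical objects:
assert(tuple(1, 2, 3) == tuple(1, 2, 3))
assert(tuple(1, 2) ~= tuple(2, 1))
```

When a tuple is collected, only the weak `tuples` entry is cleared; the nested index nodes it created stay behind forever, which is the slow leak described above -- pruning them back out would take a __gc metamethod on the tuples.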

>> In other words, there's a lot of different ways to do it, each with
>> advantages and disadvantages, and implementing any of these in the Lua
>> core would have similar tradeoffs and I'm fairly certain that you're
>> not going to satisfy everyone by choosing ONE of these to canonize in
>> the core.
>
> All engineering is tradeoffs and you're never going to satisfy everyone;
> that's an impossible standard, which I hope Lua is not following or we'll
> never see anything new in Lua again.

The standard isn't "never make tradeoffs." The standard is "make sure
the tradeoffs are worth it" with a side of "avoid breaking/penalizing
existing code if you can". And one tradeoff that's ALMOST NEVER worth
it is making an extremely common operation more expensive. It's ALMOST
ALWAYS preferable to add a little bit more per-call cost to a
less-frequent scenario.

(Whether either of these is more or less preferable to using up more
memory isn't so easily generalized.)

>> In particular, the most obvious implementation in the core
>> is the non-interned hashed version, and I think people would complain
>> vehemently about making table reads more expensive.
>
> Why would they get more expensive? It's the same hashmap algorithm except
> now the hash function is a sum of the hashes of the elements, and I don't
> think anyone would object to that being O(#t), as it is for strings.
> What am I missing?

It's not O(#t) for short strings (that is, the kind you're most likely
to use as table keys); it's O(1), thanks to string interning. Likewise,
currently using a table as a key is O(1), but hashing on every lookup
would be O(#t) as you describe -- but if you interned tuples like you
do strings, it would still be O(1). You could try to find a balance by
memoizing the hash or computing it on construction, but that's still
not ideal for ephemeral objects like the one produced in an expression
like "tbl[tuple(x,y,z)]".

(Memoizing it lazily is preferable because then if you never need the
hash at all you don't have to compute it.)
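One way to memoize the hash lazily might look like the sketch below (the `elem_hash` stand-in, the 31-multiplier combining scheme, and the `_hash` field are all illustrative assumptions, not a proposed design):

```lua
-- Illustrative element hash: hash the value's string form.
-- A real implementation would hash by type; this is just a stand-in.
local function elem_hash(v)
  local s = tostring(v)
  local h = 0
  for i = 1, #s do
    h = (h * 31 + s:byte(i)) % 2^32
  end
  return h
end

local tuple_mt = {}
tuple_mt.__index = tuple_mt

-- Compute the tuple's hash on first use and memoize it, so tuples
-- whose hash is never needed pay nothing beyond the extra field.
function tuple_mt:hash()
  local h = rawget(self, '_hash')
  if h == nil then
    h = 0
    for i = 1, self.n do
      h = (h * 31 + elem_hash(self[i])) % 2^32
    end
    rawset(self, '_hash', h)
  end
  return h
end

local function tuple(...)
  return setmetatable({n = select('#', ...), ...}, tuple_mt)
end

-- Equal tuples hash equally; order matters with this combiner:
assert(tuple(1, 2, 3):hash() == tuple(1, 2, 3):hash())
assert(tuple(1, 2):hash() ~= tuple(2, 1):hash())
```

Note that the memoized `_hash` field is exactly the footprint cost mentioned later: every tuple grows by one slot whether or not its hash is ever used after the first lookup.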

>> These are, of course,
>> exactly the same tradeoffs you have to make doing it outside of the
>> core.
>
> I disagree. A built-in tuple would not create garbage (other than the tuple
> itself)

A built-in tuple would either have more garbage from having multiple
deep copies of non-unique tuples, or more long-term memory usage from
having a canonical copy of the tuple interned in a pool. Possibly
both, if there's a long/short distinction like there is for strings.

> would not slow down the gc for _other_ objects

It doesn't slow it down for INDIVIDUAL other objects, but cleaning up
is still extra work the gc has to do when it's time to reap, which
makes those gc sweeps take longer.

> and would not
> create hash strings and I never said anything about a __hash metamethod
> (which would be useless without a __collision_check metamethod btw or you're
> back to square one, but I don't want any of that anyway).

I didn't say YOU said anything about a __hash metamethod; that would
just be one possible implementation. (Also, you wouldn't need a
__collision_check metamethod anyway; it would just be __eq (which
tuples would want to have regardless).)
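For non-interned tuples, element-wise __eq is the "collision check" -- a sketch, with `tuple` and the `n` field as assumed names:

```lua
-- Sketch: for non-interned tuples, element-wise __eq doubles as the
-- collision check; interned tuples wouldn't need it (rawequal suffices).
local tuple_mt = {}

-- __eq only fires when both operands are tables carrying this metamethod,
-- which is exactly the case for two tuples.
function tuple_mt.__eq(a, b)
  if a.n ~= b.n then return false end
  for i = 1, a.n do
    if a[i] ~= b[i] then return false end
  end
  return true
end

local function tuple(...)
  return setmetatable({n = select('#', ...), ...}, tuple_mt)
end

assert(tuple(1, 2, 3) == tuple(1, 2, 3))  -- element-wise equal
assert(tuple(1, 2) ~= tuple(1, 3))
assert(tuple(1, 2) ~= tuple(1, 2, 3))     -- different lengths
```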

I also didn't say anything about hash STRINGS, just hashes in general
-- you either need to memoize them (increasing the memory footprint of
each tuple) or calculate them every time (making table accesses more
expensive). Whether they are stored as Lua strings, as char arrays,
or as ints doesn't change this fact.

Furthermore, I didn't even mention __hash in that last paragraph about
tradeoffs. Besides, in-core code or out-of-core code would have the
same choice to make (hard-coded hashing vs. metamethod).

All in all: to me it strongly feels like tuples would be better served
by library code (or maybe a power patch) than by core code. They aren't
broadly used enough to warrant the corresponding increase in Lua's
footprint, and the people who would benefit from their existence can use
a library or patch.

/s/ Adam