

Rici Lake <lua@ricilake.net> writes:

> David Kastrup wrote:
>
>> Instead of the string following the string record, we will have the
>> following:
>>
>> a) a pointer to the memory containing the string
>> b) a pointer to a function to call with the address in a) when Lua no
>>    longer needs the string
>
> Yes, that's reasonable. But note that it increases memory
> consumption significantly for short strings.

Not that much if their content is not copied from the caller's address
space.  In that case, the minimum allocation size after alignment might
be 16 bytes anyway.
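A rough sketch of the arithmetic behind this. It assumes an allocator that rounds every request up to 16 bytes, and it uses a hypothetical header layout (StrHeader, ExtStr, round_alloc, embedded_cost and external_cost are illustrative names, not Lua's real TString). The sketch compares the cost of embedding the characters after the header with the cost of a record that only points at caller-owned memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string header: GC link, hash, length (not Lua's TString). */
typedef struct StrHeader {
    void    *next;
    unsigned hash;
    size_t   len;
} StrHeader;

/* Hypothetical lightweight variant: the characters stay in the caller's
   memory, and a callback releases them when Lua no longer needs them. */
typedef struct ExtStr {
    StrHeader   hdr;
    const char *data;
    void      (*release)(const char *data);
} ExtStr;

/* Assumed allocator granularity. */
#define ALLOC_GRAIN 16u

static size_t round_alloc(size_t n) {
    return (n + ALLOC_GRAIN - 1) / ALLOC_GRAIN * ALLOC_GRAIN;
}

/* Bytes used when the characters (plus a NUL) follow the header. */
static size_t embedded_cost(size_t len) {
    return round_alloc(sizeof(StrHeader) + len + 1);
}

/* Bytes used by the record alone; the caller's buffer is not counted. */
static size_t external_cost(void) {
    assert(sizeof(ExtStr) > sizeof(StrHeader));
    return round_alloc(sizeof(ExtStr));
}
```

Because both costs are rounded to the same grain, the extra two pointers often cost only one grain more than an embedded short string. Once the string is long enough, they cost nothing extra.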

But the added consumption will be particularly noticeable for small
strings that are not interned (unified): an interned small string that
turns out to be identical to an already interned one requires no
additional memory at all.

So this argues, at the very least, for letting the caller choose
whether a string gets interned or not.
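A minimal sketch of a caller-selected intern/nointern constructor. Everything here is hypothetical: the names Str, str_new, STR_INTERN and STR_NOINTERN are invented for illustration, and the intern table is a toy linear array rather than a hash table. An interned request that matches an existing string costs no new memory. A non-interned request skips the lookup and borrows the caller's buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object: owns an interned copy or borrows a buffer. */
typedef struct Str {
    const char *data;
    size_t      len;
    int         interned;
} Str;

enum { STR_NOINTERN = 0, STR_INTERN = 1 };

#define POOL_MAX 64
static Str   *pool[POOL_MAX];   /* toy intern table, searched linearly */
static size_t pool_n;

/* Create a string; the caller decides whether it is interned.
   Freeing is omitted to keep the sketch short. */
static Str *str_new(const char *s, size_t len, int mode) {
    if (mode == STR_INTERN) {
        for (size_t i = 0; i < pool_n; i++)
            if (pool[i]->len == len && memcmp(pool[i]->data, s, len) == 0)
                return pool[i];          /* already interned: no new memory */
        assert(pool_n < POOL_MAX);
    }
    Str *str = malloc(sizeof *str);
    if (!str) return NULL;
    str->len = len;
    str->interned = (mode == STR_INTERN);
    if (str->interned) {
        char *copy = malloc(len + 1);
        if (!copy) { free(str); return NULL; }
        memcpy(copy, s, len);
        copy[len] = '\0';
        str->data = copy;
        pool[pool_n++] = str;
    } else {
        str->data = s;                   /* borrowed: caller keeps it alive */
    }
    return str;
}
```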

> There would then be two choices: either all strings would be stored
> in this fashion, or only non-interned strings would be stored this
> way.  In the first case, all strings would pay the memory and
> performance penalties (twice as many malloc()'s and free()'s, which
> is not cheap).  In the second case, at every point where the actual
> address of a string is needed, an additional test would need to be
> inserted, something like this:
>
>   static inline const char *strptr (TString *ts) {
>     return (ts->ttype == LUA_TSTRING) ? (const char *)(ts + 1)
>                                       : ((TNoninterString *)ts)->str;
>   }
>
> Perhaps that's not going to slow things down too much, I don't know.
> But unpredictable branches are generally bad.

When uninterned strings are not used, the branch will be predictable.
When they are used a significant number of times, the savings would
presumably outweigh the cost of the less predictable branch.
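To make the branch discussion concrete, here is a self-contained variant of the strptr test quoted above. It uses hypothetical names (StrRec, ExtRec, STR_EMBEDDED, STR_EXTERNAL) in place of Lua's real type tags. Embedded strings keep their characters directly after the header. External ones carry a data pointer. If no external strings are ever created, the single branch in strptr always goes the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record kinds; not Lua's real type tags. */
enum { STR_EMBEDDED, STR_EXTERNAL };

typedef struct StrRec {
    int    kind;
    size_t len;
} StrRec;

typedef struct ExtRec {
    StrRec      hdr;    /* first member, so a StrRec* may point here */
    const char *data;
} ExtRec;

/* One branch on the kind tag to find the characters. */
static inline const char *strptr(const StrRec *s) {
    return s->kind == STR_EMBEDDED ? (const char *)(s + 1)
                                   : ((const ExtRec *)s)->data;
}

/* Characters copied directly after the header. */
static StrRec *new_embedded(const char *src, size_t len) {
    StrRec *s = malloc(sizeof *s + len + 1);
    if (!s) return NULL;
    s->kind = STR_EMBEDDED;
    s->len = len;
    memcpy((char *)(s + 1), src, len);
    ((char *)(s + 1))[len] = '\0';
    return s;
}

/* Characters left where they are; only the address is recorded. */
static StrRec *new_external(const char *src, size_t len) {
    ExtRec *e = malloc(sizeof *e);
    if (!e) return NULL;
    e->hdr.kind = STR_EXTERNAL;
    e->hdr.len = len;
    e->data = src;
    return &e->hdr;
}
```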

> But note the difference: the key equality test is currently simply a
> pointer comparison, not a memcmp().

Yes.  I noted that.

>> Now if we walk through some hash chain and find a value comparing
>> equal, we can replace the pointer of this lightweight string with the
>> address where the other string is.  In that way, future comparisons
>> for equality can be resolved quickly, as strings sharing the same
>> data address and length can't be different.
>
> Yes, but that doesn't address the issue I outlined in my previous
> message. If you change the address of a string, then any function
> which happens to be holding the address of the string ends up with a
> dangling pointer. Since many useful C functions which hold addresses
> of the string also do callbacks of Lua functions, it becomes
> impossible to guarantee the contract that the string pointer will
> remain valid through the lifetime of the C function call.

It depends on whether we are talking about the address of the variable
value (the string record) or the address of the string data.  The
former will remain intact; the latter will get decoupled from the
former.  Since lightweight strings get created by a special API call,
the C routine should know that it needs to look up the actual data
pointer in the variable record (which will not change its address).
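A minimal sketch of the two addresses in play, assuming a hypothetical LStr record and lstr_equal_unify helper. The record's address never changes. Its data pointer may be redirected when an equal string is found. For simplicity this compares one pair of strings rather than walking a hash chain, and it leaves out the release callback that would free the old buffer. A C routine that holds the record pointer, not a cached char pointer, still sees valid data after unification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lightweight string: the record stays put, `data` may move. */
typedef struct LStr {
    const char *data;
    size_t      len;
} LStr;

/* Equality with unification: same address and length means equal;
   otherwise fall back to memcmp, and on a match point `a` at `b`'s data
   so later comparisons take the fast path. */
static int lstr_equal_unify(LStr *a, const LStr *b) {
    assert(a != NULL && b != NULL);
    if (a->len != b->len) return 0;
    if (a->data == b->data) return 1;
    if (memcmp(a->data, b->data, a->len) != 0) return 0;
    a->data = b->data;   /* a's old buffer could now be released */
    return 1;
}
```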

> Remember that Lua functions can be invoked via metamethods as well.
> So any use of lua_gettable, lua_settable, or functions which might
> use those APIs could also invalidate the address of a string.
> Effectively, the C function would need to revalidate the string
> pointer on practically every use. That would, in my opinion, make
> the Lua API much harder to use.

One has to use an idiom like var->ptr[i++] instead of *p++ across
calls.  Compared to the cost of the call itself, the performance price
should be negligible.  There is an impact on readability, but then
using pointers instead of indexing is arguably not always an
improvement in legibility, either.
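A sketch of the var->ptr[i++] idiom, again with a hypothetical LStr record. The callback_fn type stands in for any call back into Lua, such as a metamethod, that might move the data. The loop reads s->data[i] on every access instead of caching a char pointer before the loop, so a relocation inside the callback cannot leave it with a dangling pointer. The relocate_once callback is an invented example that moves the string once, to an identical copy.

```c
#include <assert.h>
#include <stddef.h>

typedef struct LStr {
    const char *data;
    size_t      len;
} LStr;

/* Hypothetical stand-in for a call into Lua that may move s->data. */
typedef void (*callback_fn)(LStr *s, void *ud);

/* Count occurrences of ch, calling cb between characters.  The data
   pointer is re-read through the record on every access. */
static size_t count_char(LStr *s, char ch, callback_fn cb, void *ud) {
    size_t n = 0;
    for (size_t i = 0; i < s->len; i++) {
        if (s->data[i] == ch) n++;
        cb(s, ud);                     /* may change s->data */
    }
    return n;
}

/* Example callback: redirect the string to an identical copy, once. */
static void relocate_once(LStr *s, void *ud) {
    const char **copy = ud;
    assert(copy != NULL);
    if (*copy) { s->data = *copy; *copy = NULL; }
}
```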

There is another option _if_ one decides to make every string (and not
just specific lightweight/uninterned strings) hold its content address
and does not guarantee that this address stays constant over the
string's lifetime: this could allow for compacting garbage collection
of string space.
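A toy sketch of what compacting string space could look like once every record holds its content address. It assumes a single fixed arena, records listed in allocation (and therefore address) order, and a live flag set by some mark phase that is not shown. The names Rec, arena, rec_init and compact are all hypothetical. Live strings slide down over dead ones, and each record's data pointer is updated to match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 256

/* Hypothetical string record holding the address of its content. */
typedef struct Rec {
    char  *data;
    size_t len;
    int    live;    /* would be set by a mark phase (not shown) */
} Rec;

static char   arena[ARENA_SIZE];
static size_t arena_top;

/* Bump-allocate a copy of s in the arena. */
static int rec_init(Rec *r, const char *s, size_t len) {
    if (arena_top + len > ARENA_SIZE) return 0;
    memcpy(arena + arena_top, s, len);
    r->data = arena + arena_top;
    r->len = len;
    r->live = 1;
    arena_top += len;
    return 1;
}

/* Slide live strings down over dead ones and fix up their data pointers.
   recs[] must be in address order, so dst never passes a source and
   memmove handles any overlap. */
static void compact(Rec *recs, size_t n) {
    size_t dst = 0;
    for (size_t i = 0; i < n; i++) {
        if (!recs[i].live) { recs[i].data = NULL; continue; }
        assert(recs[i].data >= arena + dst);
        memmove(arena + dst, recs[i].data, recs[i].len);
        recs[i].data = arena + dst;
        dst += recs[i].len;
    }
    arena_top = dst;
}
```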

So there are a number of possible models.  In any case

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum