David Kastrup wrote:

Instead of the string following the string record, we will have the
following:

a) a pointer to the memory containing the string
b) a pointer to a function to call with the address in a) when Lua no
   longer needs the string

Yes, that's reasonable. But note that it increases memory consumption
significantly for short strings: every string record grows by two
pointers.
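
To make the overhead concrete, here is a rough sketch comparing the two
layouts. The struct names and fields (InlineStr, ExternStr, release) are
made up for illustration and are not the actual Lua string record; the
only point is that items a) and b) add two pointers to every string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record with the string bytes stored right after it. */
typedef struct InlineStr {
  void *next;            /* link in the collector's object list */
  unsigned char tt;      /* type tag */
  unsigned char marked;  /* GC mark bits */
  unsigned int hash;
  size_t len;
} InlineStr;

/* The same record extended with items a) and b) from the proposal. */
typedef struct ExternStr {
  void *next;
  unsigned char tt;
  unsigned char marked;
  unsigned int hash;
  size_t len;
  const char *data;               /* a) where the string bytes live */
  void (*release)(const char *);  /* b) called when the string is dead */
} ExternStr;

/* Extra bytes every string pays, before counting the allocator's own
   bookkeeping for the second block. */
static size_t extern_overhead (void) {
  assert(sizeof(ExternStr) > sizeof(InlineStr));
  return sizeof(ExternStr) - sizeof(InlineStr);
}

/* Total bytes for a string of length len, including the trailing NUL. */
static size_t inline_cost (size_t len) { return sizeof(InlineStr) + len + 1; }
static size_t extern_cost (size_t len) { return sizeof(ExternStr) + len + 1; }
```

On a typical 64-bit build the two extra pointers should come to 16
bytes, which is more than the payload of most identifiers and table
keys.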

There would then be two choices: either all strings would be stored
in this fashion, or only non-interned strings would be stored this way.
In the first case, all strings would pay the memory and performance
penalties (twice as many malloc() and free() calls, which is not cheap).
In the second case, at every point where the actual address of a string
is needed, an additional test would need to be inserted, something like
this:

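  /* Interned strings keep their bytes right after the header;
     non-interned ones go through the extra pointer. */
  static inline const char *strptr (TString *ts) {
    return (ts->ttype == LUA_TSTRING) ? (const char *)(ts + 1)
                                      : ((TNoninterString *)ts)->str;
  }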

Perhaps that's not going to slow things down too much, I don't know.
But unpredictable branches are generally bad.

<snip>

Now what semantics seem useful for such lightweight strings?  If
hashing is supposed to be lazy, it means that identity of strings
can't be established when the variable is created (and we probably
should not move it afterwards, I guess judging from what I understand
from the design documents and discussions here).

"value" not "variable". Yes, it would be bad to move it. See below.

  When using strings
as indexes, we should for that reason use their string hash for
establishing the proper hash chain, not the variable address itself.

That's currently the case, anyway. But note the difference: the key
equality test is currently simply a pointer comparison, not a
memcmp().
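
As a sketch of that difference, assuming a simplified string record
(the Str type and both function names are hypothetical, not Lua
internals): with interning, two keys are equal exactly when they are
the same object; without it, the contents have to be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Str {   /* hypothetical string record */
  size_t len;
  const char *data;
} Str;

/* Interned strings: each distinct content exists only once, so
   comparing the record addresses is enough. */
static int eq_interned (const Str *a, const Str *b) {
  return a == b;
}

/* Non-interned strings: same address is still a fast path, but
   otherwise the lengths and then the bytes must be compared. */
static int eq_content (const Str *a, const Str *b) {
  assert(a != NULL && b != NULL);
  if (a == b) return 1;
  if (a->len != b->len) return 0;
  return memcmp(a->data, b->data, a->len) == 0;
}
```

Every table lookup with a string key would pay for eq_content instead
of eq_interned whenever the addresses differ.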


Now if we walk through some hash chain and find a value comparing
equal, we can replace the pointer of this lightweight string to the
address where the other string is.  In that way, future comparisons
for equality can be resolved quickly, as strings sharing the same data
address and length can't be different.

Yes, but that doesn't address the issue I outlined in my previous
message. If you change the address of a string, then any function
which happens to be holding the address of the string ends up with
a dangling pointer. Since many useful C functions which hold
addresses of the string also do callbacks of Lua functions, it
becomes impossible to guarantee the contract that the string
pointer will remain valid through the lifetime of the C function
call.

Remember that Lua functions can be invoked via metamethods as well.
So any use of lua_gettable, lua_settable, or functions which might
use those APIs could also invalidate the address of a string.
Effectively, the C function would need to revalidate the string
pointer on practically every use. That would, in my opinion, make
the Lua API much harder to use.
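
Here is a small standalone model of the problem, not real Lua code.
LightStr, lightstr_new and lightstr_unify are hypothetical names; the
unify step stands in for what a table lookup would do under the
proposal when it finds an equal string on a hash chain. Ownership of
the shared buffer is deliberately ignored to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct LightStr {   /* hypothetical lightweight string */
  size_t len;
  char *data;               /* may be swapped for an equal string's buffer */
} LightStr;

/* Allocates a record and a private copy of s; returns NULL on failure. */
static LightStr *lightstr_new (const char *s) {
  LightStr *ls = malloc(sizeof *ls);
  if (ls == NULL) return NULL;
  ls->len = strlen(s);
  ls->data = malloc(ls->len + 1);
  if (ls->data == NULL) { free(ls); return NULL; }
  memcpy(ls->data, s, ls->len + 1);
  return ls;
}

/* If dup has the same contents as canon, frees dup's buffer and points
   dup at canon's buffer. Returns 1 when dup's address changed, which
   is exactly when a pointer obtained earlier from dup goes stale. */
static int lightstr_unify (LightStr *dup, const LightStr *canon) {
  assert(dup != NULL && canon != NULL);
  if (dup->data == canon->data) return 0;
  if (dup->len != canon->len) return 0;
  if (memcmp(dup->data, canon->data, dup->len) != 0) return 0;
  free(dup->data);          /* anyone still holding the old address loses */
  dup->data = canon->data;
  return 1;
}
```

A C function that fetched the string's address before calling into
anything that might trigger lightstr_unify would have to fetch it
again afterwards, and it has no way of knowing which calls those are.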