- Subject: Re: Suggestion : Use unique string type instead of two (short and long)
- From: 云风 <cloudwu@...>
- Date: Tue, 18 Jun 2019 06:26:51 +0800
On June 18, 2019, at 03:42, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
>
>> And then, we can remap 92, 93, 95, 97, 98 to 01 04 06 07 08 during the sweep stage.
>
> I don't think we can do that. We remap one string from 92 to 01, but
> there may be other equal strings also numbered 92, which were not
> changed yet. (Remember that the collector is incremental.) Then, we
> compare this first string with one still with the original number, and it
> will go back to 92.
We should keep the lower id after a comparison, because preferring the older one is better: there may be many older, distinct string objects with the same value, and the newer string object should adopt the older id. Otherwise the ids of all the older objects would have to change.
So in this case the other 92 is changed to 01 as well.
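A minimal sketch of this comparison rule, under stated assumptions: the `Str` struct and `str_equal` are hypothetical names for illustration, not Lua's actual `TString` layout or API, and equal ids are assumed to imply equal contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string object (not Lua's TString). */
typedef struct {
    uint32_t id;       /* identity tag; equal ids are assumed to mean equal contents */
    size_t len;        /* length in bytes */
    const char *data;  /* string bytes, not necessarily NUL-terminated */
} Str;

/* Returns 1 if a and b hold the same bytes, 0 otherwise.
 * After a successful content comparison both objects keep the lower
 * (older) id, so later comparisons hit the fast path. */
static int str_equal(Str *a, Str *b) {
    assert(a != NULL && b != NULL);
    if (a->id == b->id)
        return 1;                       /* fast path: same id */
    if (a->len != b->len || memcmp(a->data, b->data, a->len) != 0)
        return 0;                       /* different contents: ids untouched */
    uint32_t low = a->id < b->id ? a->id : b->id;
    a->id = low;
    b->id = low;
    return 1;
}
```

With this rule, a string already remapped to 01 that meets an equal string still numbered 92 pulls that string down to 01 instead of going back to 92.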
And I have a new idea:
We can split the 32-bit id into two id spaces: one is 0 to 2^31-1 and the other is -2^31 to -1.
At first we use the positive space and keep the smaller id after a comparison. When the ids would exceed 2^31-1, we switch to the negative space.
At that point, we renumber the ids in the sweep stage of the GC, simply by allocating a new negative id for each live string.
After renumbering, the rule changes to keeping the bigger id after a comparison, until we need to switch id spaces the next time.
2^31 is a very large range, so we would seldom renumber.
>
>
>> For the length, we can still store 1 byte for small string length, if the string longer than 255 bytes, store 0xff and an additional size_t.
>
> But then we are back with two kinds of strings again. Several of the
> gains you suggested (e.g, a faster lua_getfield) go away.
>
There are not two kinds of strings; it is just a variable-length number for the string length. We need the length only in the first comparison.
And I think lua_getfield would be faster, because we don't need interning and can use the C-side string directly.
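The variable-length length field described above could look roughly like this. It is only a sketch: `len_encode`, `len_decode` and `LEN_ESCAPE` are hypothetical names, and it assumes lengths below 255 fit in the single byte while 0xff marks that a full size_t follows in native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEN_ESCAPE 0xffu  /* marker byte: a full size_t follows */

/* Writes the encoded length into buf, which must have room for
 * 1 + sizeof(size_t) bytes. Returns the number of bytes written. */
static size_t len_encode(unsigned char *buf, size_t len) {
    assert(buf != NULL);
    if (len < LEN_ESCAPE) {
        buf[0] = (unsigned char)len;     /* short form: one byte */
        return 1;
    }
    buf[0] = LEN_ESCAPE;                 /* long form: marker + size_t */
    memcpy(buf + 1, &len, sizeof len);
    return 1 + sizeof len;
}

/* Reads an encoded length from buf into *len.
 * Returns the number of bytes consumed. */
static size_t len_decode(const unsigned char *buf, size_t *len) {
    assert(buf != NULL && len != NULL);
    if (buf[0] != LEN_ESCAPE) {
        *len = buf[0];
        return 1;
    }
    memcpy(len, buf + 1, sizeof *len);
    return 1 + sizeof *len;
}
```

The decoder branches on a single byte, so every string goes through the same code path whatever its length.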