- Subject: Re: Suggestion : Use unique string type instead of two (short and long)
- From: 云风 <cloudwu@...>
- Date: Tue, 18 Jun 2019 00:57:38 +0800
On Jun 17, 2019, at 23:41, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
>> Maybe a 32bit id is enough, because we can rearrange the id during the gc process. It’s a little complicated, but I think it’s possible.
>
> Can you be more specific? It is easy for the GC to set a watermark
> (e.g., to keep the highest/lower id still in use), but that does
> not guarantee anything. We can also renumber all strings, paying
> the price for a little overhead in the first comparisons after each
> GC cycle.
My idea is to use a fixed-size hash cache. For example, say there are 10 slots, and each id falls into the slot given by its last digit.
During the mark stage, we record in each slot the highest and the lowest id that is still in use.
For example:
high: 80 81 92 93 74 95 86 97 98 89
low: 00 11 02 03 14 05 16 27 28 09
This means the highest ids still in use lie in the range 89-98, and that 01, 04, and 06-08 can be reused. We can always find the highest ids still in use, plus a few unused ids.
Then, during the sweep stage, we can remap 92, 93, 95, 97, 98 to 01, 04, 06, 07, 08.
That way, we can recycle a few ids in each GC cycle.
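
To make the scheme above concrete, here is a rough sketch in C. It is only an illustration under my own assumptions, not code from Lua: the names IdCache, idcache_init, idcache_mark and idcache_plan are hypothetical, the slot is taken as id % NSLOTS, and only the smallest id of each slot (the slot number itself) is considered for reuse.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 10            /* slot count from the example above (an assumption) */
#define NO_ID  UINT32_MAX    /* sentinel: no live id seen in this slot yet */

typedef struct IdCache {
  uint32_t high[NSLOTS];     /* highest live id seen in each slot */
  uint32_t low[NSLOTS];      /* lowest live id seen in each slot */
} IdCache;

/* Reset the cache at the start of a GC cycle. */
static void idcache_init(IdCache *c) {
  for (int s = 0; s < NSLOTS; s++)
    c->high[s] = c->low[s] = NO_ID;
}

/* Mark stage: call once for every string id found alive. */
static void idcache_mark(IdCache *c, uint32_t id) {
  uint32_t s = id % NSLOTS;
  if (c->high[s] == NO_ID || id > c->high[s]) c->high[s] = id;
  if (c->low[s] == NO_ID || id < c->low[s]) c->low[s] = id;
}

/* Before the sweep: plan which high ids move to which unused low ids.
   from[] and to[] must each hold NSLOTS entries.
   Returns the number of planned remaps. */
static size_t idcache_plan(const IdCache *c, uint32_t from[], uint32_t to[]) {
  uint32_t freeid[NSLOTS], cand[NSLOTS];
  size_t nfree = 0, ncand = 0, n = 0;
  for (uint32_t s = 0; s < NSLOTS; s++) {
    /* id s is unused if the slot is empty or its lowest live id is above s */
    if (c->low[s] == NO_ID || c->low[s] > s)
      freeid[nfree++] = s;
    if (c->high[s] != NO_ID) {
      /* insert high[s] into cand[], kept sorted in descending order */
      size_t i = ncand++;
      while (i > 0 && cand[i - 1] < c->high[s]) {
        cand[i] = cand[i - 1];
        i--;
      }
      cand[i] = c->high[s];
    }
  }
  /* use as many of the highest ids as there are smaller unused ids */
  while (n < nfree && n < ncand && cand[n] > freeid[n])
    n++;
  /* pair them in ascending order, as in the example */
  for (size_t i = 0; i < n; i++) {
    from[i] = cand[n - 1 - i];
    to[i] = freeid[i];
  }
  return n;
}
```

Fed with the ids from the example, idcache_plan produces five remaps: 92 to 01, 93 to 04, 95 to 06, 97 to 07, and 98 to 08. The sweep stage would then rewrite the id of each string it visits according to this small table.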
For the length, we can still store a small string's length in 1 byte; if the string is longer than 255 bytes, we store 0xff followed by an additional size_t.
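
A rough sketch of such a length header in C follows. The names LONG_MARK, len_encode and len_decode are hypothetical, and the layout (marker byte followed directly by a native size_t) is my assumption. Note that because 0xff is taken as the marker, this sketch stores only lengths 0 to 254 in the single byte and uses the extended form from 255 upward.

```c
#include <stddef.h>
#include <string.h>

#define LONG_MARK 0xff   /* marker byte: the real length follows as a size_t */

/* Write the length header into buf (at least 1 + sizeof(size_t) bytes).
   Returns the number of header bytes written. */
static size_t len_encode(unsigned char *buf, size_t len) {
  if (len < LONG_MARK) {            /* 0..254 fits in the single byte */
    buf[0] = (unsigned char)len;
    return 1;
  }
  buf[0] = LONG_MARK;
  memcpy(buf + 1, &len, sizeof len);  /* memcpy avoids unaligned access */
  return 1 + sizeof len;
}

/* Read the length header from buf into *len.
   Returns the number of header bytes consumed. */
static size_t len_decode(const unsigned char *buf, size_t *len) {
  if (buf[0] != LONG_MARK) {
    *len = buf[0];
    return 1;
  }
  memcpy(len, buf + 1, sizeof *len);
  return 1 + sizeof *len;
}
```

With this layout a short string pays 1 byte for its length, and a long string pays 1 + sizeof(size_t) bytes.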