[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: table.concat is inefficient for very long strings
- From: Sean Conner <sean@...>
- Date: Thu, 21 Jul 2022 22:14:45 -0400
It was thus said that the Great Scott Morgan once stated:
> On 21/07/2022 22:52, Gé Weijers wrote:
> >With the doubling strategy you only copy twice the size of the
> >maximum string length, or 8GB for a 4GB string.
>
>
> Would memory fragmentation also be an issue with multi-GB allocations,
> even with efficient buffer managment?
It really depends upon the memory allocation method on a given operating
system. For instance, on Linux with the GNU libc it overcommits
memory---that is, malloc() will probably never return NULL as the underlying
system allocates address space, not actual memory. Memory is only actually
"allocated" when you write to memory (it will find and map an actual memory
page). That's one issue.
The other one is resizing an existing block of memory. The C standard
states this:
The realloc function deallocates the old object pointed to by ptr
and returns a pointer to a new object that has the size specified by
size. The contents of the new object shall be the same as that of
the old object prior to deallocation, up to the lesser of the new
and old sizes. Any bytes in the new object beyond the size of the
old object have indeterminate values.
Basically, one should treat the pointer returned from realloc() as a
different one from the one passed in. In reality, an "optimization" is to
attempt to grow the block in place and return back the original pointer, and
only allocting new memory when the old one can't be grown (NOTE: one
shouldn't assume the new pointer and the old are the same). Also note that
only the existing contents of the old memory are copied---any unused space
is not initialized so new memory (assuming the first paragraph of my reply
holds) isn't mapped until used.
I wrote a quick program for 64b Linux that continuously grows a block of
memory with twice the size until 1/2G. I don't write into the memory, just
allocate it. I also check the returned pointer to see if it's the same:
1 old
2 new
4 new
8 new
16 new
32 old
64 new
128 new
256 new
512 new
1024 new
2048 new
4096 new
8192 new
16384 new
32768 new
65536 new
131072 new
262144 old
524288 old
1048576 old
2097152 old
4194304 old
8388608 old
16777216 old
33554432 old
67108864 old
134217728 old
268435456 old
536870912 old
At this point, the allocated address space has 527044 pages, but is only
using 1668. And as can be seen, once the size got past a certain point, the
block was able to be grown in place. Once I write into all the allocated
memory, the resident size increases to 525624.
And to show differences in memory allocation schemes, here's one from a
slightly older version of Mac OS-X:
1 old
2 new
4 new
8 new
16 new
32 old
64 new
128 new
256 new
512 old
1024 old
2048 new
4096 old
8192 old
16384 old
32768 old
65536 old
131072 old
262144 old
524288 new
1048576 new
2097152 new
4194304 new
8388608 new
16777216 new
33554432 new
67108864 new
134217728 new
268435456 new
536870912 new
And again, the total address space allocated is large (2957168---I'm
unsure of the units here) but the actual space is tiny (748). And here, we
seem to always allocate a new block past a certain size.
-spc