- Subject: Re: table.concat is inefficient for very long strings
- From: Sean Conner <sean@...>
- Date: Thu, 21 Jul 2022 22:14:45 -0400
It was thus said that the Great Scott Morgan once stated:
> On 21/07/2022 22:52, Gé Weijers wrote:
> >With the doubling strategy you only copy twice the size of the
> >maximum string length, or 8GB for a 4GB string.
> Would memory fragmentation also be an issue with multi-GB allocations,
> even with efficient buffer management?
It really depends upon the memory allocation method on a given operating
system. For instance, Linux with the GNU libc overcommits memory by
default---that is, malloc() will probably never return NULL, as the
underlying system allocates address space, not actual memory. Memory is only
actually "allocated" when you write to it (the kernel then finds and maps an
actual memory page). That's one issue.
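A rough way to see this effect on Linux is to compare the resident page count before and after touching a freshly allocated block. The sketch below assumes /proc/self/statm is available (Linux only); the helper names resident_pages() and touch_demo() are made up for illustration, and other systems will simply report -1.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident page count of this process, read from /proc/self/statm
   (second field). Returns -1 where that file is not available. */
static long resident_pages(void)
{
  FILE *fp = fopen("/proc/self/statm", "r");
  long size, resident;

  if (fp == NULL)
    return -1;
  if (fscanf(fp, "%ld %ld", &size, &resident) != 2)
    resident = -1;
  fclose(fp);
  return resident;
}

/* Allocate `bytes`, record the resident page count, write one byte
   into every page, and record the count again.
   Returns 0 on success, -1 if malloc() fails. */
static int touch_demo(size_t bytes, long *before, long *after)
{
  long page = sysconf(_SC_PAGESIZE);
  volatile char *p;  /* volatile so the writes are not optimized away */
  size_t i;

  assert(before != NULL && after != NULL);
  if (page <= 0)
    page = 4096;     /* assumption: fall back to a common page size */

  p = malloc(bytes);
  if (p == NULL)
    return -1;

  *before = resident_pages();
  for (i = 0; i < bytes; i += (size_t)page)
    p[i] = 1;        /* first write to a page forces it to be mapped */
  *after = resident_pages();

  free((void *)p);
  return 0;
}
```

Calling touch_demo() with a block of a few megabytes and printing the two counts should show the resident size rising only once the pages are written, assuming the allocator hands out untouched pages for that request.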
The other one is resizing an existing block of memory. The C standard
says this about realloc():

	The realloc function deallocates the old object pointed to by ptr
	and returns a pointer to a new object that has the size specified by
	size. The contents of the new object shall be the same as that of
	the old object prior to deallocation, up to the lesser of the new
	and old sizes. Any bytes in the new object beyond the size of the
	old object have indeterminate values.

Basically, one should treat the pointer returned from realloc() as a
different one from the one passed in. In reality, a common "optimization" is
to attempt to grow the block in place and return the original pointer,
allocating new memory only when the old block can't be grown (NOTE: one
shouldn't assume the new pointer and the old are the same). Also note that
only the existing contents of the old memory are copied---any unused space
is not initialized, so new memory (assuming the first paragraph of my reply
holds) isn't mapped until used.
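In code, treating the returned pointer as a new one usually looks something like the sketch below. The struct buffer type and buffer_reserve() are hypothetical names used only for illustration; the doubling policy is the one discussed earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer. */
struct buffer
{
  char   *data;
  size_t  len;   /* bytes in use */
  size_t  cap;   /* bytes allocated */
};

/* Make sure the buffer can hold at least `need` bytes, doubling the
   capacity as required. Returns 0 on success, -1 on failure; on
   failure the buffer is left untouched and still valid. */
static int buffer_reserve(struct buffer *b, size_t need)
{
  size_t  cap;
  char   *p;

  assert(b != NULL);
  if (need <= b->cap)
    return 0;

  cap = b->cap ? b->cap : 16;
  while (cap < need)
  {
    if (cap > SIZE_MAX / 2)  /* doubling would overflow */
    {
      cap = need;
      break;
    }
    cap *= 2;
  }

  /* Keep the result in a temporary: if realloc() fails it returns
     NULL and the old block is still allocated. */
  p = realloc(b->data, cap);
  if (p == NULL)
    return -1;

  b->data = p;   /* always adopt the returned pointer, moved or not */
  b->cap  = cap;
  return 0;
}
```

Any other pointers into the old block (say, a cursor into b->data) have to be recomputed from the new base after a successful call, since the block may have moved.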
I wrote a quick program for 64-bit Linux that repeatedly doubles the size of
a block of memory until it reaches 1/2G. I don't write into the memory, just
allocate it. I also check the returned pointer to see if it's the same:
At this point, the allocated address space has 527044 pages, but is only
using 1668. And as can be seen, once the size got past a certain point, the
block was able to be grown in place. Once I write into all the allocated
memory, the resident size increases to 525624.
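An experiment of this kind could be sketched roughly as follows. This is not the original program, only a minimal illustration; the function name grow_by_doubling() and the output format are assumptions, and the results will vary with the allocator and operating system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Start with a block of `start` bytes and keep doubling it with
   realloc() while the size stays within `limit`, never writing into
   it. Reports on `out` whether each step moved the block.
   Returns the number of moves, or -1 on allocation failure. */
static int grow_by_doubling(size_t start, size_t limit, FILE *out)
{
  void   *p;
  size_t  size  = start;
  int     moves = 0;

  assert(out != NULL);
  if (start == 0)
    return -1;

  p = malloc(start);
  if (p == NULL)
    return -1;

  while (size <= limit / 2)
  {
    /* Save the old address as an integer first; the old pointer
       value should not be used once realloc() has succeeded. */
    uintptr_t  before = (uintptr_t)p;
    void      *q;
    int        moved;

    size *= 2;
    q = realloc(p, size);
    if (q == NULL)
    {
      free(p);   /* the old block is still valid on failure */
      return -1;
    }

    moved  = ((uintptr_t)q != before);
    moves += moved;
    fprintf(out, "%12zu bytes: %s\n", size, moved ? "moved" : "grown in place");
    p = q;
  }

  free(p);
  return moves;
}
```

Calling grow_by_doubling(1024, (size_t)512 << 20, stdout) from main() runs the doubling up to 1/2G; the mix of "moved" and "grown in place" lines depends entirely on the C library in use.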
And to show differences in memory allocation schemes, here's one from a
slightly older version of Mac OS-X:
And again, the total address space allocated is large (2957168---I'm
unsure of the units here) but the actual space is tiny (748). And here, we
seem to always allocate a new block past a certain size.