- Subject: Re: packed structures
- From: Flyer31 Test <flyer31@...>
- Date: Tue, 12 Oct 2021 11:48:59 +0200
Hi, sorry for the late answer; I only discovered your post just now.
On my Cortex-M4 controller (STM32G473, Nucleo board) with the Keil ARMCC
compiler V5.05 in Lua32 mode, this changes the following:
Original code (without packed):
- Program Size: Code=98340 RO-data=14208 RW-data=240 ZI-data=73832
- While running a typical 100-line Lua program (using only baselib),
the allocator shows a maximum of 14872 bytes (291 allocation
events)
After your proposed change to packed (Keil also asks me to add the
packed attribute to union Value, which of course should be no
problem):
Everything runs nicely so far, but there is not much storage improvement:
- Program Size: Code=98620 RO-data=14208 RW-data=240 ZI-data=73832
- While running the same typical 100-line Lua program (using only
baselib), the allocator shows a maximum of 14528 bytes (291
allocation events)
So the allocation benefit is only 344 bytes, or about 2.3% ...
... but I assume this is mainly due to Lua32? Of course for Lua64 the
effect will be much more pronounced...
But thank you for thinking about this. I just assume that if it does
not have much effect on Lua32, then the question does not matter TOO
much, as the people who have RAM problems will usually be the
controller programmers, who use Lua32 anyway.
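A minimal sketch of the per-value arithmetic behind this, assuming GCC or
Clang (for __attribute__((packed))) and a typical ABI. The Slot32/Slot64
structs are made-up stand-ins for "a payload plus a one-byte type tag";
they are not Lua's real TValue definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a value payload followed by a 1-byte type tag. */
typedef struct { uint32_t value; uint8_t tag; } Slot32;   /* Lua32-like */
typedef struct __attribute__((packed)) { uint32_t value; uint8_t tag; } Slot32Packed;

typedef struct { uint64_t value; uint8_t tag; } Slot64;   /* Lua64-like */
typedef struct __attribute__((packed)) { uint64_t value; uint8_t tag; } Slot64Packed;

/* Bytes saved per slot when the padding after the tag is removed. */
size_t slot32_saving(void) { return sizeof(Slot32) - sizeof(Slot32Packed); }
size_t slot64_saving(void) { return sizeof(Slot64) - sizeof(Slot64Packed); }
```

With 4-byte payloads a slot shrinks from 8 to 5 bytes; with 8-byte
payloads on a typical 64-bit ABI it shrinks from 16 to 9 bytes, which is
the 9/16 figure quoted below. How much of that shows up in the
allocator total depends on how much of the heap consists of such slots,
which this sketch does not measure.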
What really hurt me was the string usage of Lua. I had to switch to my
own fixed-size char buffers to get stable operation with less than
20-30 kB of total RAM for allocation. If I use the typical string
library style, with the Lua ".." concat operator and the string.sub
function for extended string cutting/shifting during communication /
programming events, then 20-30 kB of RAM is just too little (and so is
60 kB). On a Windows PC this "of course" works without any problem (I
even checked this, because at first I thought I might have some
general Lua programming problem here), but my small controller system
runs into memory problems too soon. So I have now taken out stringlib,
and for any "heavy" string shifting in my communication module I use
self-programmed fixed-size buffers, with a small fixed-size buffer
library of my own. I very much hope this will work fine now; I will
report later. If it works, it would be nice if the Lua "government
people" :) could think about building something like such fixed string
buffers more fundamentally into Lua, so that this becomes more
"standardized". Hopefully this would not be too big a change.
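To illustrate the idea of a fixed-size buffer, here is a minimal sketch
in C. It is not the buffer library mentioned above; the FixBuf type, the
FIXBUF_CAP capacity and the fixbuf_* function names are all hypothetical.
The point is only that appending and cutting happen in place, whereas in
Lua every ".." or string.sub result is a new string object.

```c
#include <stddef.h>
#include <string.h>

#define FIXBUF_CAP 64  /* hypothetical capacity, chosen for illustration */

typedef struct {
    size_t len;             /* number of bytes currently stored */
    char data[FIXBUF_CAP];  /* storage lives inside the struct: no heap use */
} FixBuf;

void fixbuf_clear(FixBuf *b) { b->len = 0; }

/* Append n bytes. Returns 1 on success, or 0 (buffer unchanged)
   if the bytes would not fit. */
int fixbuf_append(FixBuf *b, const char *src, size_t n) {
    if (n > FIXBUF_CAP - b->len) return 0;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}

/* Drop the first n bytes and shift the rest to the front, in place. */
void fixbuf_consume(FixBuf *b, size_t n) {
    if (n > b->len) n = b->len;
    memmove(b->data, b->data + n, b->len - n);
    b->len -= n;
}
```

A receive loop could call fixbuf_append for incoming bytes and
fixbuf_consume once a message has been handled, so the memory footprint
stays constant regardless of traffic.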
(It would also be nice to handle strings of 1-4, or at least 1-3,
chars in a "number-like style", i.e. without ANY allocation in Lua32.
This would already be a tremendous RAM gain in my controller
software, as I often use such short strings for checks / option
definitions. If you need 30-50 bytes of allocation to create a string
of 1-2 chars, this is really quite annoying for a small controller
application.)
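One way to picture such "number-like" short strings is to pack up to
four chars into a single 32-bit integer. The sketch below only
illustrates the idea; it is not something Lua provides, and the tag4
name and its return-0 convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a string of 1-4 chars into a uint32_t, first char in the lowest
   byte. Returns 0 for the empty string and for strings longer than
   4 chars (hypothetical convention for "does not fit"). */
uint32_t tag4(const char *s) {
    uint32_t v = 0;
    size_t i = 0;
    for (; s[i] != '\0'; i++) {
        if (i == 4) return 0;  /* a fifth char exists: too long */
        v |= (uint32_t)(unsigned char)s[i] << (8 * i);
    }
    return v;
}
```

Comparing two such values is a plain integer comparison, for example
tag4(opt) == tag4("on"), and needs no allocation at all.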
On Mon, Oct 11, 2021 at 6:40 PM Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>
> Hugo Gualandi came up with the idea of using a packed structure to
> store Lua values. Intel CPUs (and it seems ARMs too) can work with
> unaligned data (or data aligned to weaker boundaries) and, at least
> for some architectures, with very small (or even no) performance
> penalty.
>
> As a very fast check, I simply changed the following line in lobject.h:
>
> -typedef struct TValue {
> +typedef struct __attribute__((packed)) TValue {
>    TValuefields;
>  } TValue;
>
> This is valid in gcc and clang. (It gives one warning in ltable.c which
> for now I am ignoring. It is a trivial change to correct that: pass the
> second parameter of 'mainposition' by value instead of by reference.)
>
> I quickly tested that on two Intel i7 machines. As expected, memory
> use by arrays is cut by almost half (9/16). Perhaps unexpectedly, I
> did not see any relevant performance penalties at all. (In a few
> benchmarks, performance even improved, probably because there
> is less memory traffic.)
>
> It would be good to know how this change works in other architectures.
>
> -- Roberto