lua-l archive



I have an (older) low end ARM processor that used to generate an alignment exception when reading unaligned data.
I spent a day trying to find the exact model, but I could not reach the person in hardware development; I can provide this information later.
It was possible to implement the unaligned access in software in the interrupt handler, at some performance cost for each such access.
Since the unaligned access was a rare event, the overall performance cost was not too high.

As far as I know, 64-bit PowerPC also generates alignment exceptions that need to be handled in software.

My usual way to get tight data packing together with proper alignment is to sort the members of every structure by size, from largest to smallest.
This gives perfect packing (or nearly perfect packing for nested structures), with the added performance benefit that a structure is more likely to fit into one cache line.
It also avoids unaligned access completely.


On Tue, Oct 12, 2021 at 11:49 AM Flyer31 Test <flyer31@googlemail.com> wrote:
Hi, sorry for the late answer; I only just discovered your post.

On my Cortex-M4 controller (STM32G473, Nucleo board), with the Keil
ARMCC compiler V5.05 in Lua32 mode, this change gives the following:
Original code (without packed):
- Program Size: Code=98340 RO-data=… RW-data=… ZI-data=…
- During execution of a typical 100-line Lua program (using
only baselib), the allocator shows a maximum of 14872 bytes (291
allocation events).

After your proposed change to packed (Keil also asks me to use the
packed attribute for union Value, which of course should be no
problem), everything runs fine so far, but the storage improvement is
small:
- Program Size: Code=98620 RO-data=… RW-data=… ZI-data=…
- During execution of the same typical 100-line Lua program (using
only baselib), the allocator shows a maximum of 14528 bytes (291
allocation events).

So this saves only 344 bytes of allocation, about 2.3%...

... but I assume this is mainly because of Lua32? For Lua64 the
effect should of course be much more pronounced...

But thank you for thinking about this. I just assume that if it does
not have much effect for Lua32, the question does not matter TOO
much, since the people who have RAM problems will usually be
controller programmers, who use Lua32 anyway.

What really hurt me was Lua's string usage. I now have to use my own
fixed-size char buffers to run stably with less than 20-30 kB of total
RAM for allocation. If I use the typical string-library style, with the
".." concatenation operator and the "string.sub" function for more
extensive string cutting/shifting during communication or programming
events, then 20-30 kB of RAM is just too little (even 60 kB is...). On
a Windows PC this of course works without any problem (I even checked;
at first I thought I might have some general Lua programming problem
here), but my small controller system runs into memory problems too
soon. So I have now taken out stringlib, and for any "heavy" string
shifting in my communication module I use my own fixed-size buffers, a
sort of self-written "fixed-size buffer library". I very much hope this
will work fine now; I will report later. If it works, it might be nice
if the Lua "government people" :) could consider building something
like such fixed string buffers into Lua at a more basic level, so that
this becomes more "standardized". Hopefully this would not require too
many changes...

(It would also be nice to handle strings of 1-4, or at least 1-3,
characters in "number style", i.e. without ANY allocation in Lua32.
That alone would be a tremendous RAM gain in my controller software,
as I often use such short strings for checks and option definitions.
Needing a 30-50 byte allocation to create a string of 1-2 characters
is really quite annoying for a small controller application.)


On Mon, Oct 11, 2021 at 6:40 PM Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>
> Hugo Gualandi came with the idea of using a packed structure to store
> Lua values. Intel CPUs (and it seems ARMs too) can work with unaligned
> data (or aligned with weaker boundaries) and, at least for some
> architectures, with very small (or even no) performance penalties.
>
> As a very fast check, I simply changed the following line in lobject.h:
>
> -typedef struct TValue {
> +typedef struct __attribute__((packed)) TValue {
>    TValuefields;
>  } TValue;
>
> This is valid in gcc and clang.  (It gives one warning in ltable.c which
> for now I am ignoring. It is a trivial change to correct that: pass the
> second parameter of 'mainposition' by value instead of by reference.)
>
> I quickly tested that on two Intel i7 machines. As expected, memory use
> by arrays is cut almost in half (to 9/16). Perhaps unexpectedly, I did
> not see any relevant performance penalties at all. (In a few
> benchmarks, performance even improved, probably because there
> is less memory traffic.)
>
> It would be good to know how this change works in other architectures.
>
> -- Roberto