- Subject: Re: packed structures
- From: "Pierre Chapuis" <catwell@...>
- Date: Mon, 11 Oct 2021 19:36:34 +0200
I would need to run more tests, including on hardware I no longer have
access to, but from my experience with packed structs and unaligned
accesses, in general you do not want them unless you have a very good reason.
- Not all ARM CPUs support them. On some 32-bit ARM CPUs that do not,
you will get a SIGBUS on Android.
- With several CPU / libc combinations you will get a significant
performance penalty. Basically there are architectures where the
CPU handles the unaligned access and it causes a small
performance hit that can be compensated by cache locality, but
on other architectures it raises an exception that is caught by
a handler in the libc, and that can be *super slow*.
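
To illustrate the two points above, here is a minimal sketch of one common
way to touch possibly unaligned data without relying on the CPU (or a trap
handler) to fix up the access: copy the bytes with memcpy instead of
dereferencing a misaligned pointer. The helper names are hypothetical and
not part of Lua. On CPUs that handle unaligned loads in hardware, gcc and
clang usually turn the memcpy into a single load, but that is worth
checking per target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from Lua): read a 64-bit value from an
   address that may be unaligned. memcpy has no alignment requirement,
   so this never performs a misaligned typed load in C. */
static uint64_t load_u64_unaligned(const void *p) {
  uint64_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Hypothetical store counterpart of the helper above. */
static void store_u64_unaligned(void *p, uint64_t v) {
  memcpy(p, &v, sizeof v);
}

/* Round-trip a value through an odd offset in a byte buffer. */
static void check_roundtrip(void) {
  unsigned char buf[16] = {0};
  uint64_t x = UINT64_C(0x1122334455667788);
  store_u64_unaligned(buf + 1, x);           /* buf + 1 is misaligned for uint64_t */
  assert(load_u64_unaligned(buf + 1) == x);
  assert(buf[0] == 0);                       /* the byte before the value is untouched */
}
```

Whether this is as fast as a plain aligned access still depends on the
architecture, so it does not replace measuring on the real hardware.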
--
Pierre Chapuis
On Mon, Oct 11, 2021, at 18:39, Roberto Ierusalimschy wrote:
> Hugo Gualandi came with the idea of using a packed structure to store
> Lua values. Intel CPUs (and it seems ARMs too) can work with unaligned
> data (or aligned with weaker boundaries) and, at least for some
> architectures, with very small (or even no) performance penalties.
>
> As a very fast check, I simply changed the following line in lobject.h:
>
> -typedef struct TValue {
> +typedef struct __attribute__((packed)) TValue {
> TValuefields;
> } TValue;
>
> This is valid in gcc and clang. (It gives one warning in ltable.c which
> for now I am ignoring. It is a trivial change to correct that: pass the
> second parameter of 'mainposition' by value instead of by reference.)
>
> I quickly tested that on two Intel i7s. As expected, memory use
> by arrays is cut by almost half (9/16). Maybe unexpected, I did
> not see any relevant performance penalties at all. (In a few
> benchmarks, performance even improved, probably because there
> is less memory traffic.)
>
> It would be good to know how this change works in other architectures.
>
> -- Roberto
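
The 9/16 figure quoted above can be checked with a small sketch. The types
below are hypothetical stand-ins, not Lua's real definitions: an 8-byte
value union followed by a 1-byte type tag, once with default layout and
once with the gcc/clang packed attribute. On a typical 64-bit target the
default struct is padded to 16 bytes while the packed one takes 9.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Lua's value union (8 bytes on common targets). */
typedef union DemoValue {
  void *p;
  long long i;
  double n;
} DemoValue;

/* Default layout: the compiler pads the struct to the union's alignment. */
typedef struct PlainTValue {
  DemoValue value_;
  unsigned char tt_;
} PlainTValue;

/* Packed layout (gcc/clang extension): no padding after the tag byte. */
typedef struct __attribute__((packed)) PackedTValue {
  DemoValue value_;
  unsigned char tt_;
} PackedTValue;

/* Bytes used by an array of n elements in each layout. */
static size_t plain_array_bytes(size_t n) { return n * sizeof(PlainTValue); }
static size_t packed_array_bytes(size_t n) { return n * sizeof(PackedTValue); }

/* The packed struct is exactly value + tag; the plain one is larger. */
static void check_sizes(void) {
  assert(sizeof(PackedTValue) == sizeof(DemoValue) + 1);
  assert(sizeof(PlainTValue) > sizeof(PackedTValue));
}
```

In an array of the packed type, every element after the first starts at an
address that is not a multiple of 8, which is where the unaligned accesses
discussed in this thread come from.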