Re: Some questions on Lua's design

On Mon, Apr 1, 2013 at 6:33 PM, Sean Conner <sean@conman.org> wrote:

It was thus said that the Great Marco Salamone once stated:
>

> Sean,

> > You might think that a ton of bits are wasted, but that's the nature of
> > C. C guarentees that the fields will appear in the order given. A lot
> > of CPUs can't reference 32-bit quantities on odd addresses (or even on
> > addresses not divisible by 4). So padding is a given.
>
> In regards to implementation, can you not wrap structs in unions? Not sure
> about the following semantics,
> struct foo { bool b0 : 1; bool b1:1; bool b2:1; /*etc...*/ };
> typedef union { struct foo bar; } LByte8;
> Obviously you still need to align it to bytes, but you could concievably
> make your own tables of bits and manually manage pointers/data within it.

In C, there is the concept of "defined," which means it applies to all
compilers regardless of architecture, "undefined," which means the behavior
is not described and can be different across compilers (even different
versions of the same compiler on the same hardware) and "implementation
dependent" which means a compiler is free to pick among different documented
behaviors depending for whatever reason (mostly ease of implementation and
performance).

For example, that an unsigned int will wrap around to 0 when you increment
the largest value is "defined"; when you increment a signed int past its
largest value is "undefinded" and anything can happen when the condition
happens (the program can crash, monkeys fly out of an orifice on the
computer, dogs and cats will start living together, whatever) and the size
of an integer (both signed and unsigned) is "implementation dependent"---it
can be of any size as long as it's at least 16 bits, and is no longer than a
long int.

So, with that out of the way, the following is legal C:

union foo
{
struct
{
int field1 : 5;
unsigned int field2 : 2;
unsigned int field3 : 1;
int field4 : 4;
int field5 : 4;
} fields;
int all;
};

union foo x;

Now, the restrictions (some of which I neglected to mention before):

1. You can only declare bit fields as "int", however, they can be
signed (default, like 'int') or unsigned.

2. You cannot take the address of a bit field. You can, however,
take the address of the structure that has bit fields.

3. The amount of memory a bitfield takes is implementation
dependent.

4. "int x:1" is a minefield---it borders not only on "implementation
dependent" but also on "undefined" behavior. The best practice
for 1 bit fields is to explicitely mark them as "unsigned".

5. The ordering of the bits in a struct with bitfields is
"implementation dependent." In a raw memory dump of x, field1 may
appear first, or field5. Setting field5 to -1 might mean all is
a large negative number, or 15 or some intermediate positive
value. It all depends on how the bitfields are actually laid
out.

As for code generated, of course this is going to be CPU dependent, but
something like:

x.fields.field4 = 8;

could generate something like (in C, to avoid bringing assembly into
this):

x.all = (x.all & 0xFFFFF0FF) | 0x0800;

(because it's a constant, the compiler can avoid doing shifts), but this:

x.fields.field4 = y;

now becomes:

x.all = (x.all & 0xFFFFF0FF) | (y & 0x0F) << 8;

Don't hold me to these actual values because the actual
implementation may be different than what I'm assuming. This is for
instruction purposes only.

But yes, bit fields require a bunch of bit fiddling to use. And even
when it might make sense to use (such as a binary network protocol) I don't
bother with them as they are highly implementation dependent.

> My understanding of unions/structs is that the given example doesn't
> produce any overhead and works at compile-time. Though ofc the extra memory
> for pointers and cycles for bit-masking might contradict any performance
> gains, but you could probably even make custom pointer tables that consume
> less memory (relative to the region of bits that you're dealing with,
> specifying length and position) and access the table at constant time-- the
> only issue is the cycle-cost of bit-masking, which could be mitigated with
> some lazy compression. Or is the way C already does it just infinitely
> superior?

My suggestion to you---learn a CPU architecture. If you want a simple
one, I can recommend the 6502, 8080, Z80 or 6809. These are all 8-bit CPUs
that are easy to understand (compared to a modern CPU like the Pentium which
has a ton of baggage that can safely be ignored but until you know the
Pentium its hard to know which bits can be ignored) and there exist plenty
of information and emulators for these.

> > Can you address individual bits? Yes ... but at a cost. A cost that
> > most people aren't willing to pay actually.
>
> I'm pretty clueless on this-- are the costs any different from what I
> described above? Are these costs always stupid? I get the sensation that
> what I'm thinking about here is just moronic... or rather, not applicable.
> It's just a scheme for data compression without compromising access time...
> though the overhead to make it work, if there are a lot of new mask-checks,
> wouldn't be acceptable at runtime, no? Ehh- it's just too domain-specific I
> guess.

Okay, let's use a 32-bit system for this. A 32-bit CPU will have a number
of registers (think of these as named memory locations inside the CPU) to
work with, and these registers will be 32 bits in size. Also, the address
space will also be 32-bits in size, meaning you can address 4,294,967,295
eight bit quantities (bytes). If you want to address down to an individual
bit, you'll need three more bits per address, or 35 bits. But registers are
limited to 32 bits. So, you'll need at least two registers to address a
single bit in the computer.

And for storing that 35 bit address? Well, given that memory is addressed
in 8 bit quantities, you are now looking at at least 5 bytes per address
instead of the normal 4. Another wrinkle though---some CPUs are unable to
load 32 bit quanties at odd byte addresses (and come CPUs require a 32 bit
quantity to be aligned at an address evenly divisible by 4) so a "portable"
version you are looking at 8 bytes to store a 35-bit address. The more you
try to efficiently use bits, the more overhead you'll have.

-spc