On Mon, Dec 20, 2010 at 05:20:22PM +0100, Axel Kittenberger wrote:
> You are right that classic C theoretically does not define the width
> of char, other than that it's fixed on a system.  However, so much
> code assumes it to be an octet that no sane compiler will change that.

I have read that there are modern embedded and DSP compilers that use
a 16-bit char type, but I don't have any direct experience with them,
so I can't name any actual examples.

> I don't follow the C standards, but I recall that some recent one
> gave in on the de facto unchangeable octetness of char and made it
> standard, but don't quote me on it.

C89 provides CHAR_BIT in <limits.h> to tell you how many bits are in a
char.  It also defines sizeof (char) to be 1, regardless of CHAR_BIT.
CHAR_BIT is required to be _at least_ 8, but there is no upper bound.
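
For instance, here is a minimal C89 program (an untested sketch) that
prints both values for whatever platform it's compiled on:

  #include <stdio.h>
  #include <limits.h>

  int main(void)
  {
      /* sizeof (char) is 1 by definition, even when CHAR_BIT > 8 */
      printf("CHAR_BIT      = %d\n", CHAR_BIT);
      printf("sizeof (char) = %lu\n", (unsigned long)sizeof (char));
      return 0;
  }

On a typical desktop system this prints 8 and 1; on one of those
16-bit-char DSP compilers it would presumably print 16 and 1.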

C99 adds the concept of padding bits (such as parity bits) within
integer types, though it does not permit unsigned char to have any.
It still allows CHAR_BIT to be greater than 8.  For cases where you
need a type that is exactly 8 bits in size, it adds the <stdint.h>
typedefs uint8_t and int8_t, _but_ they are optional and do not have
to be defined if the architecture has no native 8-bit type.
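
As I read C99, the companion limit macros (UINT8_MAX, INT8_MAX, and
so on) are required to be defined exactly when the corresponding types
exist, so you can test for an exact 8-bit type at compile time.  A
sketch (the typedef name "octet" is just for illustration):

  #include <stdint.h>

  #ifdef UINT8_MAX
  typedef uint8_t octet;   /* an exact 8-bit type is available */
  #else
  #error "this implementation has no exact 8-bit type"
  #endif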

As of the 2010-12-02 draft C Standard (n1547 over at
www.open-std.org), the C99 rules appear to still be in place:
CHAR_BIT is still defined as "greater than or equal to 8", and uint8_t
and int8_t are still optional.  There is a bunch of new material on
Unicode strings, including new char16_t and char32_t typedefs, but
note that they are defined as being at _least_ 16 and 32 bits rather
than _exactly_ those sizes.  UTF-8 string literals are also supported
and are stored as char arrays, presumably with each char holding a
single octet of the encoding, but I haven't looked at it closely
enough to be certain.
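
Going by the draft (I haven't actually tried this with a compiler
that implements it), the new literals would look something like this:

  #include <uchar.h>   /* char16_t and char32_t, per the draft */

  /* elements are at least 16/32 bits, not necessarily exactly */
  const char16_t *s16 = u"hello";
  const char32_t *s32 = U"hello";

  /* UTF-8 literal; element type is plain char, presumably one
     octet of the encoding per element */
  const char     *s8  = u8"hello";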

                                                  -Dave Dodge