On 05/09/2011 8.25, Josh Simmons wrote:
> On Mon, Sep 5, 2011 at 3:34 PM, Axel Kittenberger <axkibe@gmail.com> wrote:
>> On Mon, Sep 5, 2011 at 12:25 AM, Josh Simmons <simmons.44@gmail.com> wrote:
>>> and there's no concept of a byte in C.
>>
>> No. There is. Citing from the C89 Draft:
>>
>>   * Byte --- the unit of data storage in the execution environment
>>     large enough to hold any member of the basic character set of the
>>     execution environment.  It shall be possible to express the address of
>>     each individual byte of an object uniquely.  A byte is composed of a
>>     contiguous sequence of bits, the number of which is
>>     implementation-defined.  The least significant bit is called the
>>     low-order bit; the most significant bit is called the high-order bit.
>>
>> C99:
>> byte
>> addressable unit of data storage large enough to hold any member of
>> the basic character set of the execution environment
>>
>> Both then distinguish between single-byte and multi-byte characters.
>
> I knew that posting without checking my facts would come back to bite me. :)
>
> So to backflip, I agree, I don't like the use of character at all
> since it's so heavily loaded with the idea of text and unicode
> especially. However maybe octet is a better terminology than byte.

Well, after Gregory rightfully corrected me, I rechecked my facts :-) "Octet" unambiguously means 8 bits (according to Wikipedia, the term was coined precisely to avoid the ambiguity of the term "byte").

Therefore I fear that Lua relies on C chars and, indirectly, on the C definition of a byte. So there is no guarantee that string characters are actually octets, even though the docs state that Lua strings are 8-bit clean.

I have only skimmed the Lua source, so I may be utterly wrong, but I got the impression that it never assumes a char is exactly 8 bits wide (it could well be 16 bits wide and everything would still be ok). The only assumption is that a char is *at least* 8 bits wide.

BTW, going by the snippets of the C standards that Axel quoted, even this assumption might prove wrong, since those definitions alone don't guarantee that a byte or a char can hold at least 8 bits. But this is probably purely theoretical (how many systems nowadays have bytes of 7 bits or fewer? Are there any?)
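
One way to check this on a given platform is to ask `<limits.h>`, which defines `CHAR_BIT`, the number of bits in a C byte; as far as I can tell the standard requires it to be at least 8. Here is a minimal sketch in C. The function names are made up for illustration and do not come from the Lua source:

```c
#include <limits.h>

/* Hypothetical helpers for illustration only; not part of Lua. */

/* Number of bits in a C byte (and therefore in a char) on this platform. */
int char_bits(void) {
    return CHAR_BIT;
}

/* Nonzero if a C byte is exactly an octet on this platform. */
int byte_is_octet(void) {
    return CHAR_BIT == 8;
}

/* Nonzero if every octet value 0..255 survives a round trip
   through unsigned char, i.e. a char is at least 8 bits wide. */
int char_holds_octet(void) {
    int v;
    for (v = 0; v <= 255; v++) {
        unsigned char c = (unsigned char)v;
        if ((int)c != v) {
            return 0;
        }
    }
    return 1;
}
```

Calling `char_bits()` from a small `main` and printing the result shows what a particular compiler uses. If `byte_is_octet()` returns zero, a char can still hold every octet value, but it is wider than an octet.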

-- Lorenzo