On Mon, Mar 8, 2010 at 5:57 PM, David Manura <dm.lua@math2.org> wrote:
> On Fri, Mar 5, 2010 at 2:37 PM, Mark Szpakowski wrote:
>> I notice that the C structs library, downloadable from and described at
>> http://www.inf.puc-rio.br/~roberto/struct/, lists 3 functions in its API
>> (struct.pack, struct.unpack, and struct.size). However, struct.size is
>> missing:
>
> Appears so.  It was also asked here:
>
>  http://lua-users.org/lists/lua-l/2009-10/msg00472.html .
>  http://lua-users.org/lists/lua-l/2009-03/msg00382.html
>
> BTW, new page: http://lua-users.org/wiki/StructurePacking .

Maybe this is useful to someone. Unable to find a succinct summary of
the differences between struct and pack, and frustrated by the
mostly-but-not-quite overlapping feature sets, I wrote up this
description of the choices.

YMMV, but for background, we currently are using pack, chosen at
random a few years ago, and I am absolutely not looking for some
heavyweight object-to-binary mapping library.


# Comparison of features for various pack/unpack libraries.


## pack:

+ has a repetition operator
+ has support for lua_Number (though sensible people leave this as double,
  so this isn't normally useful)
+ has "=" to reset to native endianness

- doesn't support platform independent sized integers
- doesn't support size_t
- doesn't support pascal strings with 4-byte sizes on 64-bit architectures
- doesn't support padding
- doesn't support alignment
- documentation is poor (split across the readme and source file, doesn't
  mention important details such as what the size of a "word" is, or which
  characters are ignored in the format string)
- inconsistent and hard to remember format characters (upper case is
  unsigned, except lower case "b"; the pascal string problems)

## struct:

+ supports platform independent sized integers
+ elegant support for all pascal strings

- doesn't support a repetition operator
- doesn't support size_t
- doesn't support "," as an ignored character in the format string
- documentation is non-existent

## lunary:

Lots of code to make sockets, files, and strings look similar, which
we don't need.

Complex data structure support.

Mid-size code base, but not trivial to read (unlike struct and pack).

Nothing about it really caught my eye as being enough of an improvement.

## vstruct:

Lots of code, including a lexer/parser, compiled patterns, etc. Definitely not
trivial to read.

However, looked like it had good support for complex structures, and I
particularly liked its syntax for packing/unpacking directly into tables,
including tables with named fields. Also supports bit fields, including named
bitfields.

It's a big step to move to this, but it might be worth considering.

It's almost completely native Lua, though whether that makes a difference to us
is not clear; we usually run too fast, not too slow.


# What we could do:

Patch pack to use unsigned int instead of size_t for "a".

Start using vstruct instead of pack.

Patch vstruct with the changes from the mailing list.

Add _ to vstruct to mean "underlying endianness"

Fix the struct unit tests; they assume long is 32-bit.


# Summary of pack and struct format support

"*" is "both are the same"

*: > - big endian
*: < - little endian

pack:   = - native

struct: ![num] - alignment
struct: x - padding

struct: b/B - signed/unsigned byte
pack:   c/b - ditto

*: h/H - signed/unsigned short
*: l/L - signed/unsigned long

pack:   i/I - signed/unsigned int
struct: in/In - signed/unsigned integer with size `n' (default is size of int)

*: f - float
*: d - double

pack:  n - a lua_Number (defaults to double, but theoretically can be different)

*: ' ' - ignored

pack: ',' - ignored

struct: s - zero-terminated string
pack:   z - ditto


pack:   An - on write, n is repetition
pack:   An - on read, n is width
pack:   p - string preceded by length byte
pack:   P - string preceded by length short
pack:   a - string preceded by length size_t

struct: cn - sequence of `n' chars (from/to a string); when packing, n==0 means
        the whole string; when unpacking, n==0 means use the previous
        read number as the string length


    Pack's use of size_t is not good; we will change it to int, and maybe use
    'Z' for size_t pascal strings. struct's approach is better: it can do any
    size.
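
To make the "any size" point concrete, here is a minimal C sketch of writing a
string preceded by a length of a chosen width. put_lstring is a hypothetical
helper written for this note; it is not taken from pack or struct, and it only
illustrates the idea of a length prefix whose size does not depend on size_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of pack or struct: writes `len` as an
 * unsigned integer of `width` bytes (1..8), followed by the string bytes.
 * Returns the number of bytes written, or 0 on any error. */
size_t put_lstring(unsigned char *out, size_t cap, unsigned width,
                   int big_endian, const char *s, size_t len)
{
    if (width < 1 || width > 8)
        return 0;
    /* The length must be representable in `width` bytes. */
    if (width < 8 && ((uint64_t)len >> (8 * width)) != 0)
        return 0;
    /* The prefix plus the string must fit in the output buffer. */
    if (cap < width || cap - width < len)
        return 0;
    for (unsigned i = 0; i < width; i++) {
        /* Most significant byte first for big endian, last otherwise. */
        unsigned shift = 8 * (big_endian ? width - 1 - i : i);
        out[i] = (unsigned char)(((uint64_t)len >> shift) & 0xff);
    }
    memcpy(out + width, s, len);
    return width + len;
}
```

With width 4 the encoded bytes are the same on 32-bit and 64-bit builds, which
is the property the size_t-based "a" format lacks.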


pack: <fmt>n - same as <fmt> repeated n times, except "A"
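
For the alignment and padding entries in the summary above, this is roughly
what aligning a field involves. pad_for is a hypothetical name, and the sketch
assumes power-of-two alignments; exactly how struct applies "!" to each item
type is not covered here.

```c
#include <stddef.h>

/* Hypothetical helper: number of zero bytes to emit so that `offset`
 * becomes a multiple of `align`. Assumes `align` is a power of two. */
size_t pad_for(size_t offset, size_t align)
{
    return (align - (offset & (align - 1))) & (align - 1);
}
```

For example, a 4-byte field starting at offset 5 needs 3 padding bytes first,
while one starting at offset 8 needs none.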


# Actual sizes

Note that long and size_t tend to follow the platform's word size, 32 or 64
bits depending on the system, so using them for sizes in networking code is
not portable.
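
A portable alternative is to pick a fixed width and byte order for anything
that goes on the wire. The sketch below assumes a 32-bit big-endian length
field; put_u32_be and get_u32_be are hypothetical names, not functions from
any of the libraries discussed.

```c
#include <stdint.h>

/* Hypothetical helpers: store and load a 32-bit value in big-endian
 * (network) byte order, independent of sizeof(long) or sizeof(size_t). */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the bytes are written one at a time, the result is the same on
little-endian and big-endian hosts and on 32-bit and 64-bit builds.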


64-bit:
    % ./bin/sz
     8 bytes void* (unsigned)
     8 bytes function* (unsigned)
     1 bytes char
     2 bytes short
     4 bytes int
     8 bytes long
     8 bytes long long
     4 bytes float
     4 bytes float
     8 bytes double
    16 bytes long double
     8 bytes time_t
     8 bytes suseconds_t
     4 bytes pid_t
     4 bytes wchar_t
     8 bytes size_t (unsigned)
     8 bytes ptrdiff_t
     8 bytes ssize_t
     8 bytes intmax_t
     8 bytes uintmax_t (unsigned)

32-bit:

     4 bytes void* (unsigned)
     4 bytes function* (unsigned)
     1 bytes char
     2 bytes short
     4 bytes int
     4 bytes long
     8 bytes long long
     4 bytes float
     4 bytes float
     8 bytes double
    12 bytes long double
     4 bytes time_t
     4 bytes suseconds_t
     4 bytes pid_t
     4 bytes wchar_t
     4 bytes size_t (unsigned)
     4 bytes ptrdiff_t
     4 bytes ssize_t
     8 bytes intmax_t
     8 bytes uintmax_t (unsigned)


Above is from a utility I keep around:

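#include <inttypes.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>  /* pid_t, ssize_t, suseconds_t (POSIX) */
#include <time.h>       /* time_t */

/* Print sizeof(x); a type is reported as unsigned when (x)-1 is not negative. */
#define SZ(x) printf("%2zu bytes " #x "%s\n", sizeof(x), \
                     (0 > (x) -1) ? "" : " (unsigned)")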

typedef void function(void);

int main()
{
    SZ(void*);
    SZ(function*);
    SZ(char);
    SZ(short);
    SZ(int);
    SZ(long);
    SZ(long long);
    SZ(float);
    SZ(float);
    SZ(double);
    SZ(long double);
    SZ(time_t);
    SZ(suseconds_t);
    SZ(pid_t);
    SZ(wchar_t);
    SZ(size_t);
    SZ(ptrdiff_t);
    SZ(ssize_t);
    SZ(intmax_t);
    SZ(uintmax_t);

    return 0;
}