Re: string.pack with bit resolution

Subject: Re: string.pack with bit resolution
From: Sean Conner <sean@...>
Date: Thu, 17 Oct 2019 03:36:55 -0400
It was thus said that the Great bil til once stated:
> One further offer to make:

  That you will make the necessary changes to string.pack()/string.unpack()
(or an entirely new module) for people to use and comment on?  If the demand
is that high and you know what is wanted, I would think you would be the
best person to implement this.

  Or is that just wishful thinking on my part?

> I would skip the signed bit numbers. Signed integers always have the
> slightly awkward property, that there is one negative number more than the
> positives,

  Lua is based upon C.  And C allows the representation of negative integers
to be sign magnitude, 1s complement, or 2s complement.  The range of an
8-bit value for each of these is:

	sign magnitude:  -127 .. 127
	1s complement:   -127 .. 127
	2s complement:   -128 .. 127

  The C standard (and I'm using the C89 standard here, which is the one Lua
adheres to most closely) only guarantees symmetrical ranges for each integer
type.  Section 5.2.4.2.1 states:

	Their implementation-defined values shall be equal or greater in
	magnitude (absolute value) to those shown, with the same sign.

	... 

	-- minimum value for an object of type int
	   INT_MIN	-32767

	-- maximum value for an object of type int
	   INT_MAX	+32767

	...

  They can, of course, be bigger.

  Granted, most systems today are 2s complement, but I have recently come
across a C compiler, *still commercially available*, for a sign magnitude
system.

> and this "negative surplus" is this "wicked 0x80", which can even
> lead to crazy nightmares for experienced programmers. 

  I'm not aware of any crazy nightmares for experienced programmers.  Novice
ones, yes.  But perhaps I was fortunate enough to learn assembly first, and
every 2s complement system I've learned assembly for (and that's pretty much
all the systems I've ever come across) documents this:

	NEG	set overflow flag if input is $80 (or $8000 or $80000000
		depending upon size of operand).

  But in practice I've never had a real issue with this.

> In case of chars, this
> is only a 1% defect, but in case of a2, this is a 25% defect, which is
> really hard to explain to any user.

  And a 50% defect in the case of "a1".  But even there, on page 150 of my
copy of K&R C (the C Bible, and remember, Lua is based upon C):

	struct {
		unsigned int is_keyword : 1;
		unsigned int is_extern  : 1;
		unsigned int is_static  : 1;
	} flags;

	This defines a variable called flags that contains three 1-bit
	fields.  The number following the colon represents the field width
	in bits.  The fields are declared unsigned int to ensure that they
	are unsigned quantities.

> So let's concentrate on the positive world and on the unsigned bit numbers
> A, A2 and A4.
> 
> Labeling bits with large letter is of course a half nightmare again. So if
> you are flexible enough for this, I would use the sign "." to mark a bit.
> And if I have convinced you already enough concerning the importance of bits
> in such packings, you could please also allow the short cuts : and | for
> 2-bit and 4-bit unsigned.
> 
> So then 3 more lines in your format list:
> .     a bit (value 0/nil/false, 1/true)
> .[n]  an (unsigned) bit number with n bits (value 0/nil/false, 1/true ...
> 2^n-1)
> :     an unsigned number with 2 bits (value 0/nil/false, 1/true, 2, 3)  
> |     an unsigned number with 4 bits (value 0/nil/false, 1/true, 2 ... 15)
> 
> If possible also the 2 following additional float types:
> r     short float
> D     long double
> 
> (remark: long double is an ansi C standard type - this in ANY case needs to
> be somehow in the list ... the 64bit fans otherwise will kill you...)
> 
> As separators you have already allowed spaces, some people for sure want
> commas, I would propose also single quotes / apostrophes '. So then the last
> line of format list should read: 
> " ", ",", "'": These 3 characters (space, comma, single quote) are ignored
> 
> Then you could write nice formats to pack/unpack a string into its bits like
> this (e.g. 1 byte, 1 short, 1 int):
> "....'...." or "||"
> "....'....'....'...." or "||'||"
> "||||'||||"
> (of course you should also allow "8." or "16." or "32." ... but the above
> notations really look nice, even designers would like this, I hope)

  At this point, the Erlang bit syntax is looking better and better.  An
example:

	<<IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
	  ID:16, Flgs:3, FragOff:13,
	  TTL:8, Proto:8, HdrChksum:16,
	  SrcIP:32,
	  DestIP:32>>

  It will even allow you to specify things like signedness and endianness:

	X:6/little-signed-integer

(more about this here: http://erlang.org/doc/programming_examples/bit_syntax.html)

  Put that into a string, and your packing/unpacking module could even
return a table with the fields prenamed and everything.

> A further VERY nice application would appear, if you would allow to specify
> the n in the format list. You use the parameter n already for lua_Number
> (SIDE REMARK1: which makes sense - just please specify how you do this - I

 ... [ snip ] ...


  Or one could use a pre-existing module to serialize data.  I wrote a very
extensive CBOR (Concise Binary Object Representation, RFC-7049)
implementation for Lua:

	https://github.com/spc476/CBOR

and I hear CBOR is very popular among the IoT crowd for its compactness of
representation.  And it's not like it's hard to use.

> assume you need _tt and then the native byte number (so in LUA_32BITS this
> would be 4 byte for int or 4 byte for float) 

  Assuming the platform in question uses 4-byte ints and IEEE-754 floating
point.  Again, you are making assumptions that Lua does not.

> - you have to specify how long
> the _tt is - I assume 1 byte is fine for this, and maybe zero for float and
> 1 for integer and 2 for boolean or s - this of course really MUST be
> specified exactly in the description of pack / unpack. You could e. g. also
> use _tt marking 0 for boolean with 1 byte, 4 for int32, 5 for float32, 8 for
> int64, 9 for double64, maybe also FE for pointer32 and FF for pointer64").
> ... so n is given away already... then maybe use #).
> (SIDE REMARK2: The specifier j and J is stupid - this makes no sense ... if
> somebody wants an integer in this list, please i should be used)

  j and J give you a lua_Integer, which is at least 64 bits in a default
build.  i and I only give you the native int size, which can be 16 bits or
larger.

> (SIDE REMARK3: In the format list for h and l you write "native size" - this
> is stupid in my eyes, please change to "2 bytes" for h, and "4 bytes" for l,
> or do you know some other native size for short and long??? - only
> lua_Integer and lua_Number has native size, as I see it)

  I have actually used the "native size" for a project recently, to read
native binary integers written by a C program.  You might not find a use for
the "native sizes" but that doesn't mean others won't.
  
> So for the extension I want to describe in the following, please two further
> line in your format list:

  Have you thought of maybe implementing this yourself?

  ... [ snip ] ...

> (but please do not come with the argument, that these things bloat up the c
> code for string.pack / string.unpack very much - this I do not believe you
> ... these are just some minor additions in c code, which make these functions
> MUCH more flexible in use)

  Which means this should be easy to implement, right?

  -spc (Right?)