- Subject: Re: string.pack with bit resolution
- From: Sean Conner <sean@...>
- Date: Thu, 17 Oct 2019 03:36:55 -0400
It was thus said that the Great bil til once stated:
> One further offer to make:
That you will make the necessary changes to string.pack()/string.unpack()
(or an entirely new module) for people to use and comment on? If the demand
is that high and you know what is wanted, I would think you would be the
best person to implement this.
Or is that just wishful thinking on my part?
> I would skip the signed bit numbers. Signed integers always have the
> slightly awkward property, that there is one negative number more than the
> positives,
Lua is based upon C. And C allows the representation of negative integers
to be of sign magnitude, 1s complement, or 2s complement. The range of an
8-bit value for each of these is:
sign magnitude: -127 .. 127
1s complement: -127 .. 127
2s complement: -128 .. 127
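The same pattern holds for any field width, not just 8 bits. A minimal sketch in C, assuming widths of 1 to 31 bits; the names `enum repr`, `struct range` and `signed_range` are made up for illustration:

```c
enum repr { SIGN_MAGNITUDE, ONES_COMPLEMENT, TWOS_COMPLEMENT };

struct range { long min; long max; };

/* Range of a signed field `bits` wide (1..31) in the given representation. */
struct range signed_range(enum repr r, int bits)
{
  struct range out;
  long max = (1L << (bits - 1)) - 1;  /* e.g. 127 for 8 bits */

  out.max = max;
  /* Only 2s complement has the one extra negative value. */
  out.min = (r == TWOS_COMPLEMENT) ? -max - 1 : -max;
  return out;
}
```

With this, `signed_range(TWOS_COMPLEMENT, 2)` gives -2 .. 1 and `signed_range(TWOS_COMPLEMENT, 1)` gives -1 .. 0, which are the narrow cases under discussion.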
The C standard (and I'm using the C89 standard here, which Lua adheres to
most) only guarantees symmetrical ranges for each integer type. Section 5.2.4.2.1
states:
Their implementation-defined values shall be equal or greater in
magnitude (absolute value) to those shown, with the same sign.
...
-- minimum value for an object of type int
INT_MIN -32767
-- maximum value for an object of type int
INT_MAX +32767
...
They can, of course, be bigger.
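Whether a given implementation has the symmetric range or the extra negative value can be checked from <limits.h>. A small sketch; the function names are hypothetical:

```c
#include <limits.h>

/* 1 if int has exactly the symmetric range, 0 if there is an extra
   negative value (as on a 2s complement system). */
int int_range_is_symmetric(void)
{
  return INT_MIN == -INT_MAX;
}

/* 1 if the implementation meets the C89 minimum magnitudes quoted above. */
int int_meets_c89_minimum(void)
{
  return INT_MIN <= -32767 && INT_MAX >= 32767;
}
```

On a typical 2s complement system the first function returns 0 and the second returns 1.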
Granted, most systems today are 2s complement, but I have recently come
across a C compiler, *still commercially available*, for a sign magnitude
system.
> and this "negative surplus" is this "wicked 0x80", which can even
> lead to crazy nightmares for experienced programmers.
I'm not aware of any crazy nightmares for experienced programmers. Novice
ones, yes. But perhaps I was fortunate enough to learn assembly first, and
the documentation for every 2s complement system I've learned assembly for
(and that's pretty much all the systems I've ever come across) states:
NEG set overflow flag if input is $80 (or $8000 or $80000000
depending upon size of operand).
But in practice I've never had a real issue with this.
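C has no overflow flag to test, and negating the most negative int on a 2s complement system is undefined behavior, so the check has to happen before the negation. A minimal sketch (function names are hypothetical, and saturating is only one possible policy):

```c
#include <limits.h>

/* 1 if -x is representable as an int, 0 if negating would overflow. */
int can_negate(int x)
{
  return x >= -INT_MAX;
}

/* Negate x, saturating at INT_MAX instead of overflowing. */
int negate_saturating(int x)
{
  return can_negate(x) ? -x : INT_MAX;
}
```

On a 2s complement system `can_negate(INT_MIN)` is the single case that returns 0, the C counterpart of the $80 / $8000 / $80000000 input above.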
> In case of chars, this
> is only a 1% defect, but in case of a2, this is a 25% defect, which is
> really hard to explain to any user.
And a 50% defect in the case of "a1". But even there, on page 150 of my
copy of K&R C (the C Bible, and remember, Lua is based upon C):
struct {
unsigned int is_keyword : 1;
unsigned int is_extern : 1;
unsigned int is_static : 1;
} flags;
This defines a variable called flags that contains three 1-bit
fields. The number following the colon represents the field width
in bits. The fields are declared unsigned int to ensure that they
are unsigned quantities.
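To see what the unsigned declaration buys, here is a sketch with 1-, 2- and 4-bit unsigned fields (the struct and function names are made up for illustration). Every bit pattern is a distinct non-negative value, and a value that does not fit is reduced modulo 2^n:

```c
/* Unsigned bit-fields in the style of the K&R example. */
struct small_fields {
  unsigned int one_bit   : 1;  /* holds 0 .. 1  */
  unsigned int two_bits  : 2;  /* holds 0 .. 3  */
  unsigned int four_bits : 4;  /* holds 0 .. 15 */
};

/* Each function stores v in the field and returns what the field holds. */
unsigned store_one_bit(unsigned v)
{
  struct small_fields s = {0, 0, 0};
  s.one_bit = v;
  return s.one_bit;
}

unsigned store_two_bits(unsigned v)
{
  struct small_fields s = {0, 0, 0};
  s.two_bits = v;
  return s.two_bits;
}

unsigned store_four_bits(unsigned v)
{
  struct small_fields s = {0, 0, 0};
  s.four_bits = v;
  return s.four_bits;
}
```

For example, `store_two_bits(3)` returns 3 and `store_two_bits(5)` returns 1.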
> So let's concentrate on the positive world and on the unsigned bit numbers
> A, A2 and A4.
>
> Labeling bits with large letter is of course a half nightmare again. So if
> you are flexible enough for this, I would use the sign "." to mark a bit.
> And if I have convinced you already enough concerning the importance of bits
> in such packings, you could please also allow the short cuts : and | for
> 2-bit and 4-bit unsigned.
>
> So then 3 more lines in your format list:
> . a bit (value 0/nil/false, 1/true)
> .[n] an (unsigned) bit number with n bits (value 0/nil/false, 1/true ...
> 2^n-1)
> : an unsigned number with 2 bits (value 0/nil/false, 1/true, 2, 3)
> | an unsigned number with 4 bits (value 0/nil/false, 1/true, 2 ... 15)
>
> If possible also the 2 following additional float types:
> r short float
> D long double
>
> (remark: long double is an ansi C standard type - this in ANY case needs to
> be somehow in the list ... the 64bit fans otherwise will kill you...)
>
> As separators you have already allowed spaces, some people for sure want
> commas, I would propose also single quotes / hyphens '. So then the last
> line of format list should read:
> " ", ",", "'": These 3 characters (space, comma, single quote) are ignored
>
> Then you could write nice formats to pack/unpack String into its bits like
> this (e.g. 1 byte, 1 short, 1 int):
> "....'...." or "||"
> "....'....'....'...." or "||'||"
> "||||'||||"
> (of course you should also allow "8." or "16." or "32." ... but the above
> notations really look nice, even designers would like this, I hope)
At this point, the Erlang bit syntax is looking better and better. An
example:
<<IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
ID:16, Flgs:3, FragOff:13,
TTL:8, Proto:8, HdrChksum:16,
SrcIP:32,
DestIP:32>>
It will even allow you to specify things like signedness and endianness:
X:6/little-signed-integer
(more about this here: http://erlang.org/doc/programming_examples/bit_syntax.html)
Put that into a string, and your packing/unpacking module could even
return a table with the fields prenamed and everything.
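What such a module would do underneath can be sketched in C with a generic bit reader plus a struct of named fields. This is only an illustration, not an existing API: `get_bits`, `struct ipv4_fields` and `parse_ipv4` are hypothetical names, the buffer is assumed to hold at least 20 bytes in network byte order, and bits are numbered most significant first.

```c
#include <stddef.h>

/* Read `width` bits (1..32), most significant bit first, starting
   `bit_offset` bits into buf. */
unsigned long get_bits(const unsigned char *buf, size_t bit_offset,
                       unsigned width)
{
  unsigned long value = 0;
  unsigned i;

  for (i = 0; i < width; i++) {
    size_t pos = bit_offset + i;
    unsigned bit = (buf[pos / 8] >> (7 - pos % 8)) & 1u;
    value = (value << 1) | bit;
  }
  return value;
}

/* Named fields mirroring the Erlang pattern above. */
struct ipv4_fields {
  unsigned long version, hlen, srvc_type, tot_len;
  unsigned long id, flags, frag_off;
  unsigned long ttl, proto, hdr_chksum;
  unsigned long src_ip, dest_ip;
};

/* Split the fixed 20-byte part of an IPv4 header into its fields. */
struct ipv4_fields parse_ipv4(const unsigned char *p)
{
  struct ipv4_fields f;
  size_t at = 0;

  f.version    = get_bits(p, at, 4);  at += 4;
  f.hlen       = get_bits(p, at, 4);  at += 4;
  f.srvc_type  = get_bits(p, at, 8);  at += 8;
  f.tot_len    = get_bits(p, at, 16); at += 16;
  f.id         = get_bits(p, at, 16); at += 16;
  f.flags      = get_bits(p, at, 3);  at += 3;
  f.frag_off   = get_bits(p, at, 13); at += 13;
  f.ttl        = get_bits(p, at, 8);  at += 8;
  f.proto      = get_bits(p, at, 8);  at += 8;
  f.hdr_chksum = get_bits(p, at, 16); at += 16;
  f.src_ip     = get_bits(p, at, 32); at += 32;
  f.dest_ip    = get_bits(p, at, 32);
  return f;
}
```

A format string parser would only need to turn each `Name:Width` pair into one `get_bits` call and one table entry.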
> A further VERY nice application would appear, if you would allow to specify
> the n in the format list. You use the parameter n already for lua_Number
> (SIDE REMARK1: which makes sense - just please specify how you do this - I
... [ snip ] ...
Or one could use a pre-existing module to serialize data. I wrote a very
extensive CBOR (Concise Binary Object Representation, RFC-7049)
implementation for Lua:
https://github.com/spc476/CBOR
and I hear CBOR is very popular among the IoT crowd for its compactness of
representation. And it's not like it's hard to use.
> assume you need _tt and then the native byte number (so in LUA_32BITS this
> would be 4 byte for int or 4 byte for float)
Assuming the platform in question uses 4-byte ints and IEEE-754 floating
point. Again, you are making assumptions that Lua does not.
> - you have to specify how long
> the _tt is - I assume 1 byte is fine for this, and maybe zero for float and
> 1 for integer and 2 for boolean or s - this of course really MUST be
> specified exactly in the description of pack / unpack. You could e. g. also
> use _tt marking 0 for boolean with 1 byte, 4 for int32, 5 for float32, 8 for
> int64, 9 for double64, maybe also FE for pointer32 and FF for pointer64").
> ... so n is given away already... then maybe use #).
> (SIDE REMARK2: The specifier j and J is stupid - this makes no sense ... if
> somebody wants an integer in this list, please i should be used)
j and J give you a lua_Integer and a lua_Unsigned, which are 64 bits in a
default build. i and I only give you the native int size, which can be
16 bits or larger.
> (SIDE REMARK3: In the format list for h and l you write "native size" - this
> is stupid in my eyes, please change to "2 bytes" for h, and "4 bytes" for l,
> or do you know some other native size for short and long??? - only
> lua_Integer and lua_Number has native size, as I see it)
I have actually used the "native size" for a project recently, to read
native binary integers written by a C program. You might not find a use for
the "native sizes" but that doesn't mean others won't.
> So for the extension I want to describe in the following, please two further
> line in your format list:
Have you thought of maybe implementing this yourself?
... [ snip ] ...
> (but please do not come with the argument, that these things bloat up the c
> code for string.pack / string.unpack very much - this I do not believe you
> ... these are just some minor additions in c code, which make these functions
> MUCH more flexible in use)
Which means this should be easy to implement, right?
-spc (Right?)