Re: string.pack with bit resolution


Subject: Re: string.pack with bit resolution
From: bil til <flyer31@...>
Date: Fri, 18 Oct 2019 01:34:54 -0700 (MST)

Hi Sean, thanks for further info.

So you propose the letter q for the bits? Also fine for me. Helpful idea,
thank you.

Concerning your format strings "q", "q17", "q17q19q23": the byte sizes
according to my recommendation would be 1 byte, 3 bytes, and 8 bytes
(trailing bits are filled with zeroes to complete the last byte).
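
A minimal sketch of this size rule in Lua 5.3 or later. The helper name
packed_bytes is hypothetical, and the q/Q specifiers exist only in the
proposal discussed here, not in Lua's string.pack:

```lua
-- Hypothetical helper: byte length of a run of proposed bit fields.
-- "q" without a number is assumed to mean a single bit.
local function packed_bytes(fmt)
  local bits = 0
  for n in fmt:gmatch("[qQ](%d*)") do
    bits = bits + (n == "" and 1 or tonumber(n))
  end
  -- Round up to whole bytes; the unused trailing bits would be zero.
  return (bits + 7) // 8
end

print(packed_bytes("q"), packed_bytes("q17"), packed_bytes("q17q19q23"))
```

The three format strings above come out as 1, 3, and 8 bytes (1, 17, and
59 bits respectively).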

The format string "!1<c4 I2 I2 I2 I2 I4 I4 s2" looks nice from my side -
with this I have no problem.

On a LUA_32BIT system you could equally use "!1<c4 I2 I2 I2 I2 J J s2"
for this. This is a VERY dangerous ambiguity if you want to exchange data
with a LUA_64BIT device. Therefore please strictly FORBID this "J"
possibility. If you program on one machine, then flexibility and syntactic
sugar are always nice. But if you want to transfer data between different
machines, then the format specification MUST be 100% watertight and strict,
otherwise it is useless/dangerous.
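
To illustrate the ambiguity with stock Lua 5.3/5.4 (a sketch; the sample
values are arbitrary and the variable names are mine): the fixed-width
format always produces the same number of bytes, while the length of the
"J" variant follows the size of lua_Unsigned in the build that runs it.

```lua
-- Same record packed with fixed-width and with native-width integers.
local fixed  = string.pack("!1<c4 I2 I2 I2 I2 I4 I4 s2",
                           "ABCD", 1, 2, 3, 4, 5, 6, "hi")
local native = string.pack("!1<c4 I2 I2 I2 I2 J J s2",
                           "ABCD", 1, 2, 3, 4, 5, 6, "hi")

-- c4 (4) + 4 * I2 (8) + 2 * I4 (8) + s2 length prefix (2) + "hi" (2)
print(#fixed)
-- 16 bytes for everything except the two J fields, which depend on the build
print(#native, string.packsize("J"))
```

On a build with 4-byte integers both strings have the same length; on a
build with 8-byte integers the second one is 8 bytes longer, so the two
sides would disagree about the layout.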

Thank you for confirming that the transfer between LUA_32BIT and
LUA_64BIT must be uniquely defined. If we agree on this, then I can sleep
well :).

But I found one further ambiguous code in the list: T (size_t, native
size), which is also NOT clear in data exchange LUA_32 <> LUA_64. So seen
from my side the codes j, J, T need killing in this table, or at least they
should carry some (*) comment: "Do not use specifiers marked 'native' for
data exchange LUA_32 <> LUA_64". And the comment "(native size)" at h, H, l,
L, f, d should please be killed, as h, H, l, L, f, d have STRICTLY defined
sizes of 2 / 4 / 8 bytes. And the comment "(default is native alignment)"
for the ! parameter should clearly say "default is number alignment of the
following element".
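
For context, a small sketch of how the "!" option behaves in stock Lua
5.3/5.4 as I understand the reference manual (this shows the existing
behaviour, not the proposed change; the variable names are mine):

```lua
-- Without "!", the maximum alignment is 1, so nothing is padded.
local unaligned = string.pack("<i1 i4", 1, 2)

-- With "!4", the i4 is aligned to a 4-byte boundary: 3 padding bytes
-- are inserted after the i1.
local aligned = string.pack("!4 <i1 i4", 1, 2)

print(#unaligned, #aligned)
```

The padding depends on the size of the element that follows, capped by the
maximum given after "!".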

The "i" alone ("native size") without number I would also recommend to kill.
But as long as you have the i[n] possibility, I am fine with i.

The n I would extend to "native (number, boolean, string)", and as an update
to my post above, I would now recommend the following type byte specifier
(the marking 0B / 4B... in the table defines the number of follow-up bytes):
Tag          Bytes     Meaning
0x00         0B        Boolean False
0x04         4B        int32
0x08         8B        int64 (warning: unpack in LUA_32 will cut to int32
                       and use MIN_INT32/MAX_INT32 if necessary)
0x10         16B       int128 (warning: unpack in LUA_32/64 will cut to
                       int32/64)
0x40         0B        Boolean True
0x44         4B        float32 normalized
0x48         8B        float64=double normalized (warning: unpack in LUA_32
                       will cut to float32 and use -INF/+INF if necessary)
0x50         16B       float128=long double normalized (warning: unpack in
                       LUA_32/64 will cut to float32/64)
0x81...0xD4  nB        String with 1...100 Bytes (n=1..100)
0xE4         (n+4)B    String with n Bytes, preceded by 4byte signed length
                       info (valid range 1...2G)
0xE8         (n+8)B    String with n Bytes, preceded by 8byte signed length
                       info (valid range 1...8GG)
0xF0...0xF8  0B        Error "no_number" (0xF0+_tt info, so 0xF0=nil,
                       0xF2=luserdata, 0xF5=table, 0xF6=function,
                       0xF7=userdata, 0xF8=thread)
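
A rough sketch of an encoder for this tag table, written against stock
Lua 5.3+ rather than as a real 'n' specifier. The names pack_native and
ERR_TAG are hypothetical. Assumptions on my side: payloads are little
endian, the short-string tag is 0x80 + length (so the listed upper tag
0xD4 corresponds to 84 bytes), the 4-byte length is written with "s4"
(unsigned, identical to signed for lengths below 2G), and empty strings
use the 0xE4 form.

```lua
-- Tags for values that are not number/boolean/string: 0xF0 + type tag.
local ERR_TAG = { ["nil"] = 0xF0, table = 0xF5, ["function"] = 0xF6,
                  userdata = 0xF7, thread = 0xF8 }

local function pack_native(v)
  local t = math.type(v) or type(v)   -- "integer"/"float" or the Lua type
  if t == "boolean" then
    return v and "\x40" or "\x00"
  elseif t == "integer" then
    if v >= -0x80000000 and v <= 0x7FFFFFFF then
      return "\x04" .. string.pack("<i4", v)   -- fits in int32
    end
    return "\x08" .. string.pack("<i8", v)     -- needs int64
  elseif t == "float" then
    return "\x48" .. string.pack("<d", v)      -- assumes an 8-byte double
  elseif t == "string" then
    if #v >= 1 and #v <= 0x54 then
      return string.char(0x80 + #v) .. v       -- short string, tag holds n
    end
    return "\xE4" .. string.pack("<s4", v)     -- 4-byte length prefix
  end
  return string.char(ERR_TAG[t])               -- "no_number" marker
end
```

For example, pack_native(true) gives the single byte 0x40, and
pack_native("hi") gives 0x82 followed by the two characters.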

For the unpack function, please allow a third parameter Error (which can be
nil, but usually would be a string such as "Error"; please also allow
"Error%d"). If unpack meets an invalid type specifier in the "string to
unpack", then it should put this Error string in the return list,
preferably via string.format(strError, Bytenumber), so that if the Error
string is "Error%d", the byte number will be returned in the return list.

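A sketch of how such an error string could behave, again as a plain Lua
5.3+ function and not as a change to string.unpack. The name
unpack_native and the table FIXED are hypothetical, only a subset of the
tags is decoded, byte positions are assumed to be 1-based, and decoding
simply stops at the first unknown tag because the stream cannot be
resynchronised.

```lua
-- Fixed-size payloads handled by this sketch (little endian assumed).
local FIXED = { [0x04] = "<i4", [0x08] = "<i8", [0x48] = "<d" }

local function unpack_native(s, err)
  local out, pos = {}, 1
  while pos <= #s do
    local tag = s:byte(pos)
    local v
    if tag == 0x00 or tag == 0x40 then
      v, pos = (tag == 0x40), pos + 1
    elseif FIXED[tag] then
      v, pos = string.unpack(FIXED[tag], s, pos + 1)
    elseif tag > 0x80 and tag <= 0xD4 then
      local n = tag - 0x80                      -- short string of n bytes
      v, pos = s:sub(pos + 1, pos + n), pos + 1 + n
    else
      -- Unknown tag: report it through the caller's error string, if any.
      if err then out[#out + 1] = string.format(err, pos) end
      break
    end
    out[#out + 1] = v
  end
  return table.unpack(out)
end

print(unpack_native("\x00\x7F\x40", "Error%d"))
```

With the error string "Error%d" the caller sees the offending byte number
in the return list; with plain "Error" the extra argument to string.format
is ignored.
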
With this, 'n' gets REALLY native for lua and extremely powerful, as now it
will support not only lua_Number, but also lua_Boolean and lua_String in a
very efficient way. And this is what the users really want, I am sure,
including the "wider fan group".


I entered int128 and float128 in the list above, because I assume that in
the not too distant future there will come a lua128 which will support
128bit numbers. long double is already supported in ANSI C, and Microsoft
e.g. already speaks about "single cycle" usage of long double for "CPUs
with SSE2" (google for "Floating point support Microsoft").

But I am quite sure that size_t / pointer size will stay at 64 bits for a
VERY long time. Even storage devices with atomic 1nm resolution would need
an area of roughly 8m² for 8GG bits, so for 8GG bytes more than 60m² of
storage area would be required - I cannot imagine that this arrives in the
near future..., maybe in 2324 :), otherwise you can kill me, but you have
to speed up then, I will leave our nice earth in latest 60 years by default
:).

But therefore please also insert the following 2 floating point
specifiers:
"r" short float/float16 (normalized, for "greedy byte" people like me)
"D" long double/float128 (normalized, warning: unpack on LUA_32 and LUA_64
will cut to float32/64 range)

For all floats I added the spec "normalized". But as every floating point
machine has a 1 cycle normalize command, I assume that any compiler anyway
normalizes floats automatically, even the usual FPU will typically do this
after every floating point operation.

Converting normalized floats between 16-32-64-128 bits is easy
bit-twiddling, you can do this in about 5-20 C instructions... (you do NOT
need float support for this conversion, if the floats are normalized).
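
A sketch of that kind of conversion for the float16 to float32 direction,
using only integer operations on the bit patterns (Lua 5.3+). The function
name f16_to_number is hypothetical; IEEE 754 layouts are assumed (1/5/10
bits for float16, 1/8/23 bits for float32), only normalized inputs are
handled, and the final string.unpack merely turns the resulting 32-bit
pattern into a Lua number for display.

```lua
local function f16_to_number(h)
  local sign = (h >> 15) & 0x1
  local exp  = (h >> 10) & 0x1F
  local man  = h & 0x3FF
  -- Zero, subnormals, infinities and NaN are outside this sketch.
  assert(exp > 0 and exp < 31, "only normalized values are handled")
  -- Re-bias the exponent (15 -> 127) and widen the mantissa (10 -> 23 bits).
  local bits32 = (sign << 31) | ((exp - 15 + 127) << 23) | (man << 13)
  return (string.unpack("<f", string.pack("<I4", bits32)))
end

print(f16_to_number(0x3C00), f16_to_number(0xC000), f16_to_number(0x3E00))
```

The core of the conversion is the single line that builds bits32; the
opposite direction additionally needs a range check and rounding of the
dropped mantissa bits.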

In the delimiter list, in my above post I proposed to use the
apostrophe/single quote ' as an additional delimiter besides space. But
this I withdraw - it is nonsense in lua. But perhaps colon (and maybe also
semicolon) as additional delimiters; it is usually very helpful to
"optically structure" such number formats.

My additional tags "#" (to repeat equal elements in some "dynamic way") and
"^" (to specify number size in some "dynamic way") I would like to keep as
"WISH NUMBER 3" (wish number 1 are bits, wish number 2 is uniqueness and n
with type info, and # and ^ are wish number 3. The ^ you could also replace
by _ or by ° if you prefer...).

So to summarize, here is my proposed change to the pack/unpack format
specifier table:
n: a native lua element with type info start byte
  (lua_Boolean/lua_Number/lua_String)
q[n]: a signed bit (Q[n] unsigned, n=1...16 fine for me)
r: short float/float16 (normalized)
D: long double/float128 (normalized, warning: LUA_32/64 will unpack to
  float32/64)
# post-char: specifies repeat count (no byte in data str, only for format
  str), then follow-up specifiers can be written #q or #i or #z or #n to
  use this repeat count.
# pre-char: specifies repeat count dynamically (previous #post-char
  required)
^ pre-char: specifies bit/byte length of numbers marked [n], then follow-up
  specifiers can be written q^ or i^ ...
^ post-char: specifies bit/byte length dynamically (previous #post-char
  required)
' ' or ',' or ';' can be used as delimiters in format (no influence on
  result)
'!(n)' sets alignment to n (default is size of follow-up element)
For the format types h, H, l, L, f, d: please REMOVE the text "(native
  size)".
Please add a footnote warning: "(X) Do not use 'native size' marked format
  types for transfer LUA_32<>LUA_64" and add this (X) to j, J, T ... (but I
  would rather recommend to kill j, J, T from the table).
For i[n], I[n]: please add the warning: unpack for LUA_32/64 will restrict
  to int32/64.

Concerning this alignment operator "!(n)" - I think this bloats the C code
of pack/unpack unnecessarily (and also the testing time for possible
errors/glitches). I think usually anyone who uses pack/unpack wants
alignment "!1". So I would also kill this "!(n)" specifier, and pack/unpack
would strictly use byte alignment. This is no restriction from my point of
view, but exactly what you expect from pack/unpack. (In case of successive
q[n]s, of course, bit packing is please please please required.)
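
What bit packing of successive fields could look like, sketched as a
plain Lua 5.3+ function rather than as real q[n] support. The name
pack_bits is hypothetical. Assumptions on my side: fields are unsigned,
each field is written most significant bit first, and the last byte is
zero-filled as described earlier in this post; the bit order is not fixed
anywhere in the proposal.

```lua
-- fields is a list of {value, width} pairs, e.g. {{5, 3}, {1, 1}}.
local function pack_bits(fields)
  local acc, nbits, out = 0, 0, {}
  for _, f in ipairs(fields) do
    local value, width = f[1], f[2]
    for i = width - 1, 0, -1 do              -- MSB of each field first
      acc = (acc << 1) | ((value >> i) & 1)
      nbits = nbits + 1
      if nbits == 8 then                     -- a full byte is ready
        out[#out + 1] = string.char(acc)
        acc, nbits = 0, 0
      end
    end
  end
  if nbits > 0 then
    out[#out + 1] = string.char(acc << (8 - nbits))  -- zero-fill the rest
  end
  return table.concat(out)
end

print(#pack_bits({{0, 17}, {0, 19}, {0, 23}}))
```

The fields share bytes without any padding between them, so 17 + 19 + 23
bits end up in 8 bytes.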

Endianness clearly must NOT be "native" by default; please use "little
endian" by default, Intel clearly has defeated Motorola :). But a
PRE-DEFINED endianness really should be specified clearly.
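
In current Lua 5.3/5.4 the byte order can already be pinned explicitly in
the format string, which is what a portable format has to do today. A
short sketch (the variable names are mine):

```lua
local le  = string.pack("<I2", 0x1234)   -- little endian: 0x34 0x12
local be  = string.pack(">I2", 0x1234)   -- big endian:    0x12 0x34
local nat = string.pack("=I2", 0x1234)   -- native: matches one of the above

print(le:byte(1, -1))
print(be:byte(1, -1))
print(nat == le and "native is little endian" or "native is big endian")
```

Without a "<", ">" or "=" prefix the native order is used, so two machines
with different byte order would produce different strings from the same
format.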

And string.unpack please needs this 2nd optional parameter errorstring
(defaulting to nil).

And the last "future wish" would then be some table functions table.setn,
table.seti, table.setnknv, table.getnk, table.getk, table.getv, so that you
can write the following commands:
string.pack( "n# #n", #t, t)
t.setn, t.seti= string.unpack( "n# #n", errorstring)
string.pack( "n# #n #n", t.getnk(), t.getk(), t.getv())
t.setnknv( string.unpack( "n# #n #n"))

(Please take strict care that the format string for pack and unpack can be
identical, even if you transfer data LUA_32 <> LUA_64.)
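
Until something like "n# #n" exists, a count-prefixed array can be
approximated in stock Lua 5.3+ by building the format string at run time.
This is only a sketch: pack_array and unpack_array are hypothetical names,
the elements are assumed to be numbers stored as 8-byte doubles, the count
is a little-endian 4-byte integer, and the array must be small enough to
pass through table.unpack.

```lua
local function pack_array(t)
  -- Count first, then one "d" per element.
  return string.pack("<I4" .. string.rep("d", #t), #t, table.unpack(t))
end

local function unpack_array(s)
  local n, pos = string.unpack("<I4", s)
  local t = { string.unpack("<" .. string.rep("d", n), s, pos) }
  t[#t] = nil          -- drop the trailing "next position" result
  return t
end

local copy = unpack_array(pack_array({ 1.5, 2, -3 }))
print(#copy, copy[1], copy[2], copy[3])
```

Both sides derive the element part of the format from the stored count,
which is the job the proposed "#" marker would take over.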

This would be a miracle then ... .

... thank you for the patience, for all reading this excessively long post
...


