lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On 10/15/2010 8:54 AM, David Manura wrote:
On Thu, Oct 14, 2010 at 9:50 AM, KHMan wrote:
On 10/14/2010 11:55 AM, David Manura wrote:
now requires two special cases: nbits == 0 and nbits == 32.
This is not for deflate, is it? There is no real reason to be requesting 0
bits or 32 bit in deflate, because the original pkzip architecture is
designed for 64KB segments. Huffman codes are limited in length as well,
etc. The encoding and decoding process has no real reason to be asking for 0
or 32 bits... ???

A good question.  ModuleCompressDeflateLua never extracts 32-bits at a
time from the bitstream during Deflate.  Rather, in parsing the gzip
header/footer, there are three occasions where it extracts 32-bit
integers from the bitstream:

   local mtime = bs:read(32) -- Modification TIME
   ...
   local crc32 = bs:read(32)
   local isize = bs:read(32) -- ignored

Granted, the gzip wrapping could be processed byte-wise instead, but
for consistency and given its triviality, it just reads from the same
bitstream as Deflate.

These are all well-aligned standard header fields. Reading them the usual way should be fine. The variable bit stream calls are for the compressed Huffman tree and the LZ encoding, and these are I think designed to be within 1-16 bit in length, per the original 64KB-segment design.

Just a little puzzled as to why a compression algorithm needs to call to get
0 bits -- which sounds like a no-op. Yet to see any algorithm that does it.
If it's not for deflate, then getting 32 bits can just skip everything and
use a faster call.

It reads a variable number (possibly zero) of "extra" bits:

   else -- val>  256 (i.e. extra bits)
     ...
     local nextrabits = tdecode_len_nextrabits[val]
     local extrabits = bs:read(nextrabits)
     local len = len_base + extrabits
     ...
     local dist_nextrabits = tdecode_dist_nextrabits[dist_val]
     local dist_extrabits = bs:read(dist_nextrabits, true)
     local dist = dist_base + dist_extrabits

Oh that, I see why you would want to optimize that way.

The encoded length codes has a linear distribution (meaning that with the extra offset bits, the actual distribution falls exponentially) or more often, exponential (actual distribution will be a very steep exponential curve). Same for distance codes. I have very detailed data for LZF, if you want to see some trends. So, deflate is designed so that many of encoded length values do not need any extra bits, or "more has fewer bits".

If we use an 'if' test, then the zero extra bits portion should be a bit faster and will run more often. This might just balance out having to execute all those extra lines if you were to use something that works with [0,32].

Furthermore, it skips over a variable number (possibly zero) of bits
to the next byte alignment (e.g. on uncompressed blocks or the gzip
trailer):

   if btype == BTYPE_NO_COMPRESSION then
     bs:read(bs:nbits_left_in_byte())
   ...
   bs:read(bs:nbits_left_in_byte())

Of course, these are minor cases, more of convenience than necessity,
and it would be easy enough to handle 0 and 32 specially.

Once per block, not much there. I think using the thing for [1,31] is fine, but that's just me. :-)

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia