Re: question on the patch FastString

Your are bad counting!

"I mean multiplication by 0b10101* x = 21*x = 16*x + 4*x + x, = (x << 4) + (x << 2) + (x << 0) , 2 additions"

OK, but leading high order-bits set to 0 don't count (Also I fix the incorrect hexadecimal prefix). This is one solution for the 5-bit problem: it is optimal.

But:

"0b011011 * x = 31*x + 3*x = (32*x - 16*x) + (4*x - x) = (x<<5) - (x<<4) + (x<<2) - (x<<0) , 1 addition and 2 subtractions"

That's wrong, it is really:

"0b11011 * x = 0b11111 * x - 0b00100 * x = 31*x - 4*x = 32*x - 4*x - 1 : 2 substractions"

This is another solution for the 5-bit problem, it's not //better// that the first one if you want to maximize the number additions/substraction (with two operands)

---

But:

"31*x+3*x" is part of the 6-bit problem (the multiplicator is 34 = 0b100010 which is obviously not optimal, and not odd so not a prime and not a solution to that problem)

For the 6-bit problem a solution with the prime multiplier 127 = 0b111111 is also not optimal (only 1 substraction)

Retry with 0b010101 = 21, you find an prime which is a solution and optimal (3 additions)

In general for the problem of 2N-bit multiplication, the fastest computing solution using the minimum number of additions, requires **at most** N additions (or substractions), independantly of the fact that the multiplier is prime or not (but it may need less). The **worst** occurs when there's a maximum number of sequences alternating 1's and 0's in the 2N-bit multiplier (ignoring all low-order bits that are 0's, so the multiplier is necessarily odd), for which the number of additions/substractions cannot be reduced: as this is out desired goal for a hash we should seek such 2N-bit odd multiplier, with the additional goal that this must be a prime (for Knuth's compression algorithm): you just have to count the number of alternating sequences of 1 and 0, and want that this count is maximum, so you want to minimize the number of "long" sequences to consider that this is optimal.

First trying 0b010101, you get an optimal solution for the 6-bit problem instantly as you find a prime. This solution is optimal for the 6-bit problem as well as for the 5-bit problem; but this does not mean that there's a single optimal solution: other 6-bit odd numbers with the same number of alternating sequences of 1s and 0s could also be primes as well for the 6-bit problem.

You certainly want that sequences of 1s and 0s cannot be factorized, so want that there should not exist common subsequences (longer than 1 bit) whose highest and lowest bits are set to 1; otherwise any such repetition of the pattern of bits (whih contains at least 2 additions or substration) is factorizable, where the subsequence canbe computed once and the rest of the patterns can replace their 2 additions by just one (the result computed by the first occurence of the common pattern).

So when I gave the constant 0xAAAB (which as a maximum number of alternances for a 16-bit number), it was not optimal (sorry) because there were 4 repetitons of the common three bits 0b101 which allow their factorisation (0b101 required 2 additions for the first occurence, but the 3 others would just require 1), and it can be factored again because there's two repetitions of the pattern 0b1010101 grouping the two first occurences of 0b101 so the last 2 repeated occurences of 3bits become just 1 repeated occurence of 6bits. And the multiplication by 0xAAAB can then be computed with 5 additions (4 for the pattern 0b1010101_0_1010101, plus 1 for the lowest order bit set to 1). You may effectively find more optimal results than with 0xAAAB by avoiding repeated patterns (but you must still check that candidates without repeated patterns of bits, that are not factorizable are primes; several primes will be found, you need to evaluate how many repeated patterns of the same bits /01(.*)0(.*)1/ in regexp notation they contain; just counting the alternances is not enough, it's just a way to enumerate candidates before checking they are primes and then searching for repeated subpatterns of distinct bits.

In your proposed solution 0xB6DB, there's a common pattern with the common hex digit 0xB.

Avoiding all repetitions of common subpatterns of bits is not reachable for all word sizes: repetitions of supatterns will necessarily occur (jsut try representing the constant multiplier with quaternary digits: there's at most 4 distinct digits, one of them, 0, has no interest for our goal as we're interested only in sequences containing at least one bit set to 1. And just with 3 distinct quaternary digits, you can't avoid repeated patterns in multipliers larger than 12 bits. If your multiplier has 13 bits, you need at least one repeated pattern, with 16bits you can tolerate three other repeated patterns, under the condition that the 4 repetitions of 2-bit patterns are not themselves creating 2 repetitions of 4-bit patterns.

So you have to measure the effect of repeteated patterns on the reduction of the number of additions/substraction to compute the minimum number of additions/substraction needed (this is not used in the Egyptian multiplication algo, and in hardware multipliers by arbitrary pairs of integers, which just looks at successive bits without searching: only pairs of successive bits are considered to detect those where there's a transition from 0 to 1 or 1 to 0. For a specific constant multiplier however you can perform this search to reduce further the number of hardware stages (and so to reduce the size of the chip with no additional cost and no additional gain of performance, it is still 1 cycle per multiplication, but just a reduction of cost for building the device:

It's an interesting goal in cryptanalysis because we want to maximize also the cost of construction so that the device becomes unfeasible using limited resources; if you have unlimited resources, you can build the device that will compute the multiplication of arbitrarily large numbers in a single cycle: if the operands have 2N bits, you know that you just need N parallel adders with 2N bits each, the cost of construction is only 2N² for gates to build in the device, it becomes rapidly very costly if N is huge; for operands with size 2N=64, you need 32 adders of 64-bit each, so 32*64 gates: this is still not a lot, 2048 gates fit easily on a modest chip and in any ALU or FPU of any modern CPU/GPU/APU chip and can be used without optimizing it further for specific constant multipliers; however in an FPU where you use series with known constant to compute a sinus or exponential or locarithm, reducing the size of the multiplier is always beneficial as it also saves energy and reduces the cost of construction, or size of the chip, or exposition to failures; and you can then integrate other mathematical functions in your FPU for the same price and the same energy spent at runtime, or you can use less espensive technologies not requiring higher densities for integrating more gates: todays processors have a few billions gates, all foundries are trying to reduce the number of gates to reduce the energy and allow building more chips at lower cost, so they spend considerable time to study the arithmetic properties in their ALU, FPU and other logical arrays, and reduce the number of needed stages to get higher throughput and increase the possible frequencies at which the chips can run without returning unstable results with random errors).

With the classic multipliation of integers, the goals are not very difficult. The problems are much moire complex for other operations, like multiplications of polynoms, or finding if an integer has a prime divisor and determining that divisor (this is the RSA problem, used in cryptography). Other dificult problems are using elliptic curve also with the goal of maximizing the cost of construction of the device that would solve the problem instantly (single cycle) or in a short time (limited number of cycles, given the current clock speeds at which devices can operate and their cost of construction/acquisition and cost of operation i.e. the energy to spend running the device to get the result).

So if you find other interesting 16-bit primes (for hashing 32-bit integers) with better patterns (with less repetitions), and a method for counting these repetititions so we can accurately determine the minimum number of additions/substraction to perfomr a multiplication by this prime, it wouldbe good to know (I've read that locating repeated patterns of bits in a larget bitset was a complex NP-problem which can become very slow for large bitsets, but 32-bit and 64-bit integers are not very large and there should exist a way to counting these repetitions exhaustivly, even by brute-force, and get a proven measurement): this would allow selecting the best primes to use for Knuth's hashes. May be the constant given by Knuth himself is in fact optimal, may be there are other primes that are equally optimal for the same goal.

Now we'll want to get some optimal 32-bit mutliplier for hashing 64-bit integers (and may be some optimal 8-bit prime for hashing 16-bit integers, but it should not be difficult, we instantly eliminate all even numbers and know that all Mersenne numbers are non optimal so we've eliminated 1,3,5,8,9,15,17,31,33,63,65,127,129,255 including some non-primes).