• Subject: Re: [ANN] lua-bint - Arbitrary precision integer arithmetic library in pure Lua
• From: Philippe Verdy <verdyp@...>
• Date: Sat, 25 Jul 2020 07:59:51 +0200

Peecomputing scaled reciprocal is unfeasible for large words. Your lookup table for example with 32-bit words would have 2^32 entries (excluding the reciprocal of 0). That lookup would be enormous.
However for small tables this is feasible so you can build a small table of 256 reciprocals, and then handle the division by group of 4 bits (i.e. work in base 16 and make the traditional division digit by digit in that base, given that you'll handle two digits at once for the product). If you can use a larger lookup table (e.g. 1024 entries, i.e. 10 bits, you'll handle the divisions in base 32; with 4096 entries, the division will be made in base 64; above this limit the lookup table is already too large: 16K entries would turn into a too large module).
But there are other ways to perform divisions than the classic Euclidian method.

Le sam. 11 juil. 2020 à 10:22, Andrew Gierth <andrew@tao11.riddles.org.uk> a écrit :
>>>>> "Eduardo" == Eduardo Bart <edub4rt@gmail.com> writes:

Eduardo> What I would like to optimize more is the conversion from base
Eduardo> 2 to base 10 in the library, it is used when printing big
Eduardo> numbers in base 10 as string. My `tobase` function calls too
Eduardo> many times `udivmod` (unsigned integer division with modulus)
Eduardo> for really large integers (8192 bits or so) and this make
Eduardo> tostring very slow for big numbers. If you have any ideas how
Eduardo> I could optimize that base conversion (or perhaps udivmod) I
Eduardo> would appreciate it.

There are two standard tricks for this (and you should use both).

First, don't repeatedly divide by the base. Instead, precompute powers
of the base (I'll assume 10 for example purposes): 10^2, 10^4, 10^8,
10^16, etc., up to a value that's close to (but less than) half of the
input bits. That way, the remainder after dividing by that largest size
will always fit in half the bits, and no more than three (often only
two) such full-length divisions are needed to break the input into
convenient chunks, which you then divide into halves, etc.

Second, don't use division at all (except possibly for that first step,
if you don't want to have to deal with doing an N*N->2N bit multiply, or
a "high half" multiply which will suffice). Instead, do division by
way of multiplication by scaled reciprocal (which you can precompute).

--
Andrew.

• Follow-Ups:
• References: