lua-users home
lua-l archive



Sean Conner, that algorithm worked great, thanks for the enlightenment! I've now incorporated it into the library in pure Lua, and printing big integers is substantially faster.

Now I just wonder if I could have a more efficient unsigned integer division.
Currently I am basically using this algorithm: https://courses.cs.vt.edu/~cs1104/BuildingBlocks/divide.030.html
However, it does not take much advantage of the fact that the integers are composed of multiple 32-bit words: it shifts in a single bit at a time, so dividing an 8192-bit integer means 8192 shift/compare/subtract passes for a single division. I wonder if there is a better (but reasonably simple) algorithm that could take advantage of whole words.

On Fri, Jul 10, 2020 at 20:52, Sean Conner <sean@conman.org> wrote:
It was thus said that the Great Eduardo Bart once stated:
> Counting number of bits in what context? The library doesn't have this
> function, I do shift in one bit in integer pow functions but it's not much
> with the intention of counting bits.
>
> What I would like to optimize more is the conversion from base 2 to base 10
> in the library, it is used when printing big numbers in base 10 as string.
> My `tobase` function calls too many times `udivmod` (unsigned integer
> division with modulus) for really large integers (8192 bits or so) and this
> make tostring very slow for big numbers. If you have any ideas how I could
> optimize that base conversion (or perhaps udivmod) I would appreciate it.

  You want to do as large a division as possible.  On a 32-bit system, you
want to do a "64 bits divided by 32 bits" repeatedly, and on a 64-bit
system, you'll want to do a "128 bits divided by 64 bits" repeatedly [1].
For best results, you would need to hit assembly, but a reasonable C
compiler will probably produce decent results with code like:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <assert.h>

#define MAX 256

unsigned long xremainder(
        unsigned long       dest[MAX],
        unsigned long const src[MAX],
        unsigned long       denom
)
{
  unsigned long long carry = 0;
  unsigned long long quo;
  unsigned long long rem = 0;
  size_t             i;

  for (i = 0 ; i < MAX ; i++)
  {
    carry <<= (sizeof(long) * CHAR_BIT); /* make room for the next word */
    carry  |= src[i];
    quo     = carry / denom;             /* one "double-wide / single"  */
    rem     = carry % denom;             /* division per word           */
    dest[i] = quo;
    carry   = rem;
  }

  return rem;
}

int main(void)
{
  unsigned long  src[MAX];
  unsigned long  dest[MAX];
  unsigned long  zero[MAX];
  char           buffer[BUFSIZ];
  char          *p;

  assert(sizeof(unsigned long long) > sizeof(unsigned long));

  memset(zero,  0,sizeof(zero));
  memset(src, 255,sizeof(src)); /* maximum number */

  p    = &buffer[BUFSIZ];
  *--p = '\0';

  do
  {
    unsigned long rem = xremainder(dest,src,10);
    assert(rem <= 9);
    *--p     = rem + '0';
    memcpy(src,dest,sizeof(src));
  } while(memcmp(src,zero,sizeof(zero)));

  puts(p);
  return 0;
}

  If you can somehow do "larger" than "bit-at-a-time" division, that should
speed up the code.

  -spc

[1]     All the architectures I'm familiar with that have a DIV instruction
        will allow one to divide a number twice the "word size" (32 bits on
        a 16 bit machine, 64 bits on a 32 bit machine, 128 bits on a 64 bit
        machine) by the word size.

        Conversely, the MUL instruction will usually produce a result
        twice the normal word size (even the 8-bit 6809, which has an
        8-bit MUL instruction, produces a 16-bit result).