lua-users home
lua-l archive



Sean Conner, that algorithm worked great, thanks for the enlightenment! I've now incorporated it into the library in pure Lua, and printing big integers is substantially faster.

Now I just wonder if I could have a more efficient unsigned integer division.
Currently I am basically using this algorithm: https://courses.cs.vt.edu/~cs1104/BuildingBlocks/divide.030.html
However, it does not take much advantage of the fact that the integers are composed of multiple 32-bit words: it shifts in a single bit at a time, so dividing an 8192-bit integer means 8192 shift/compare/subtract passes for a single division. I wonder if there is a better (but reasonably simple) algorithm that could take advantage of whole words.

On Fri, Jul 10, 2020 at 20:52, Sean Conner <sean@conman.org> wrote:
It was thus said that the Great Eduardo Bart once stated:
> Counting number of bits in what context? The library doesn't have this
> function, I do shift in one bit in integer pow functions but it's not much
> with the intention of counting bits.
>
> What I would like to optimize more is the conversion from base 2 to base 10
> in the library, it is used when printing big numbers in base 10 as string.
> My `tobase` function calls too many times `udivmod` (unsigned integer
> division with modulus) for really large integers (8192 bits or so) and this
> make tostring very slow for big numbers. If you have any ideas how I could
> optimize that base conversion (or perhaps udivmod) I would appreciate it.

  You want to do as large a division as possible.  On a 32-bit system, you
want to do a "64 bits divided by 32 bits" repeatedly, and on a 64-bit
system, you'll want to do a "128 bits divided by 64 bits" repeatedly [1].
For best results, you would need to hit assembly, but a reasonable C
compiler will probably produce decent results with code like:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <assert.h>

#define MAX 256

unsigned long xremainder(
        unsigned long       dest[MAX],
        unsigned long const src[MAX],
        unsigned long       denom
)
{
  unsigned long long carry = 0;
  unsigned long long quo;
  unsigned long long rem = 0;
  size_t             i;

  for (i = 0 ; i < MAX ; i++)
  {
    carry <<= (sizeof(long) * CHAR_BIT); /* make room for the next word */
    carry  |= src[i];
    quo     = carry / denom;             /* one "double-wide / single"  */
    rem     = carry % denom;             /* division per word           */
    dest[i] = quo;
    carry   = rem;
  }

  return rem;
}

int main(void)
{
  unsigned long  src[MAX];
  unsigned long  dest[MAX];
  unsigned long  zero[MAX];
  char           buffer[BUFSIZ];
  char          *p;

  assert(sizeof(unsigned long long) > sizeof(unsigned long));

  memset(zero,  0,sizeof(zero));
  memset(src, 255,sizeof(src)); /* maximum number */

  p    = &buffer[BUFSIZ];
  *--p = '\0';

  do
  {
    unsigned long rem = xremainder(dest,src,10);
    assert(rem <= 9);
    *--p     = rem + '0';
    memcpy(src,dest,sizeof(src));
  } while(memcmp(src,zero,sizeof(zero)));

  puts(p);
  return 0;
}

  If you can somehow do "larger" than "bit-at-a-time" division, that should
speed up the code.

  -spc

[1]     All the architectures I'm familiar with that have a DIV instruction
        will allow one to divide a number twice the "word size" (32 bits on
        a 16 bit machine, 64 bits on a 32 bit machine, 128 bits on a 64 bit
        machine) by the word size.

        Conversely, the MUL instruction will usually produce a result
        twice the normal word size (even the 8-bit 6809, which has an
        8-bit MUL instruction, produces a 16-bit result).