Calculating the Fibonacci numbers on GPU

(veitner.bearblog.dev)

14 points | by rbanffy 4 days ago ago

7 comments

  • lb-guilherme 3 hours ago ago

    The algorithm here, fast doubling, is sequential (there is no parallelism) and is quite fast, with less than a hundred operations to arrive to the final result. This runs in nanosecond on a CPU and the benchmark in the article is mostly measuring the communication time rather than actual computation time.

    • Someone 2 hours ago ago

      They also do not make use of the fact that matrix multiplication is associative, so M⁴ = M² × M², M⁸ = M⁴ × M⁴, etc. That means one can compute F32 using 5 matrix multiplications. This uses 31.

      It wouldn’t surprise me at all if doing those 5 matrix multiplications on a CPU were faster than doing 31 on the GPU.

      Also, FTA: “To calculate extremly large fibonacci numbers we need to avoid integer overflow. A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication“

      ⇒ This also is NOT calculating the Fibonacci numbers, copping out when things get difficult on a GPU.

      If you want to compute Fn mod some constant for 1 ≤ n ≤ some large constant fast on a GPU, I think I would compute quite a few of the intermediate matrices using the fast way, then spin up some ‘threads’ to compute the intermediate versions by copying those with the ‘standard’ matrix.

      • KeplerBoy an hour ago ago

        The matrix in question is tiny (2x2). There's no hope a GPU would ever outperform a CPU on that.

        It only gets interesting if you need a large matmuls or millions of small matmuls in parallel.

  • eesmith 2 hours ago ago

    > Using the power of GPUs we are able to calculate F99999999 in only 17 milliseconds on my Consumer GPU

    FWIW, on my 2020 era laptop, the following:

      #include <stdio.h>
      
      int main(void) {
        int a = 1, b = 1, c, n = 99999999;
        while (--n) {
          c = (a+b) % 9837;
          a = b;
          b = c;
        }
        printf("%d\n", a);
        return 0;
      }
    
    takes 330 ms measured as "time a.out".

    The problem with Fibonacci is there are algorithmic ways to speed things up. The following (using the method at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes Python 0.1 ms to compute the same value:

      import functools
    
      @functools.cache
      def F(n):
          if n <= 2:
              return 1
    
          m = n // 2 # rounds down
          if n % 2:
              # odd
              return (F(m+1)**2 + F(m)**2) % 9837
          else:
              return ((2*F(m+1) - F(m))*F(m)) % 9837
    
      print(F(99999999))
    • amelius 25 minutes ago ago

      Maybe a better test is digits of Pi?

      Or recursive hashing?

      • staunton 13 minutes ago ago

        A test of what?

        • amelius 7 minutes ago ago

          Test of your GPU approach/library versus CPU.