The digits of pi are not random

(github.com)

10 points | by seccode 11 hours ago

22 comments

  • lifthrasiir 9 hours ago

    A z-score is not a direct measure of statistical significance; it needs a correct interpretation first. If my measurement of a normally distributed random variable gave a z-score of 5.0, that could still happen by pure chance, however slim (<0.000058%). If we don't know the variable's distribution in advance, the chance can be as high as 4% per Chebyshev's inequality. You haven't supplied the interpretation needed to exclude such possibilities and reach your conclusion.
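
    A minimal sketch of the two figures quoted above, assuming a two-sided test. The function names are illustrative and not taken from the repo under discussion.

```python
import math

def two_sided_tail(z: float) -> float:
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

def chebyshev_bound(k: float) -> float:
    """Upper bound on P(|X - mean| >= k * sd) for any finite-variance X."""
    return min(1.0, 1.0 / k ** 2)

z = 5.0
normal_p = two_sided_tail(z)     # valid only if the statistic is normal
any_dist_p = chebyshev_bound(z)  # valid for any distribution with a variance
print(f"normal tail at z={z}: {normal_p:.3e}")
print(f"Chebyshev bound at z={z}: {any_dist_p:.2%}")
```

    The gap between the two numbers is the point: the same z-score means very different things depending on what is known about the distribution.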

    The only thing one can conclude from your claim is that the first handful of digits of pi can be very, very slightly compressed if you are somehow able to ignore the size of that classifier [1]. Algorithmic randomness, however, requires the classifier to be included, and there is no known self-contained program that is smaller than printing all the digits verbatim or computing them in the first place.

    [1] By the way, this is not really surprising at all. In fact, 5,001 (not 5,000) of the first 10,000 decimal digits of pi are between 5 and 9, so they can be very, very slightly compressed in that way, once the size of the decompressor is excluded, of course.
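
    To put a rough number on "very, very slightly": a sketch that treats the high/low classification of each digit as one bit and computes the ideal (Shannon) size for a 5,001 / 4,999 split. This ignores the decompressor entirely, as the footnote says. The per-digit tallies posted further down the thread sum to 4,999 for digits 5 to 9, which gives the same entropy by symmetry.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a two-outcome source with P(outcome) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n_digits = 10_000
n_high = 5_001  # count quoted in the comment above
ideal_bits = n_digits * binary_entropy(n_high / n_digits)
saved_bits = n_digits - ideal_bits  # saving over storing one bit per digit
print(f"ideal size: {ideal_bits:.6f} bits, saving: {saved_bits:.6f} bits")
```

    The saving comes out at well under a thousandth of a bit across all 10,000 classifications.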

    • seccode 9 hours ago

      Thanks for teaching me some important statistics!

  • tgv 9 hours ago

    One obvious problem is the model. There is a model that can predict each digit with 100% accuracy, so you must show in advance that your model never has a bias, in all instances and not just in a single run. Even then, the tiniest error in the randomization procedure can have an effect. Perhaps you've only shown that your particular implementation of random forests is biased.

    The second problem is the interpretation of statistical significance. You don't even explain how you compute it, and it's not in the code. For starters, what's your df?

    Another problem is the sample. It might be possible that any range of 10,000 digits isn't random (according to your criterion), but the whole is. Now that I think about it: it is very likely there is a structure, since it's a rather limited string, and the number of possible bit strings of length n quickly outnumbers the total length of the string for increasing n. 10k bits can only contain the unique strings up to length 10, and those wouldn't be random at all (since you can predict the remaining bits with increasing accuracy). So the details of your prediction model are really essential.
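
    The counting argument can be sketched as follows. This is only a pigeonhole upper bound under the assumption that the sample is treated as 10,000 bits: it says how long the substrings can get before there are more possible strings than positions to hold them. It gives a looser limit than the figure of 10 mentioned above, and says nothing about which strings actually occur.

```python
def max_exhaustive_length(total_bits: int) -> int:
    """Largest n for which a string of `total_bits` bits could, in principle,
    contain every one of the 2**n binary strings of length n as a substring."""
    n = 0
    # A string of length L has L - n + 1 windows of length n.
    while 2 ** (n + 1) <= total_bits - (n + 1) + 1:
        n += 1
    return n

limit = max_exhaustive_length(10_000)
print(f"every pattern could fit only up to length {limit}")
print(f"patterns of length {limit + 1}: {2 ** (limit + 1)}")
```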

  • eesmith 11 hours ago

    Next try 10 million digits instead of 10 thousand.

    The Z-score cannot be interpreted so easily. It must be evaluated over all the different ways people have tried to test the digits for randomness.

    "Since the advent of computers, a large number of digits of π have been available on which to perform statistical analysis. Yasumasa Kanada has performed detailed statistical analyses on the decimal digits of π, and found them consistent with normality; for example, the frequencies of the ten digits 0 to 9 were subjected to statistical significance tests, and no evidence of a pattern was found." - https://en.wikipedia.org/wiki/Pi

    • seccode 11 hours ago

      I don't think you're correct that the z-score is not being interpreted correctly. I haven't been able to find research demonstrating any non-randomness in pi, so without that we have nothing to compare it to.

      • eesmith 10 hours ago

        It's literally an xkcd, #882 https://xkcd.com/882/

        There's a large literature of people looking for non-randomness in pi, and failing.

        If you try 50,000 different tests with a z-score threshold of 3.29, then 0.1% of them (50 tests) will give false positives.
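
        A small sketch of that arithmetic, with a simulation alongside it. It assumes each test statistic is standard normal when there is nothing to find and that the tests are independent, which real test batteries only approximate; the seed is arbitrary.

```python
import math
import random

def two_sided_tail(z: float) -> float:
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

n_tests = 50_000
threshold = 3.29
expected = n_tests * two_sided_tail(threshold)

# Simulate 50,000 tests run on pure noise and count "significant" ones.
rng = random.Random(882)
hits = sum(abs(rng.gauss(0.0, 1.0)) >= threshold for _ in range(n_tests))
print(f"expected false positives: {expected:.1f}, simulated: {hits}")
```

        Any single one of those hits, reported on its own, would look like a discovery.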

    • seccode 11 hours ago

      I don't have the resources to try this unfortunately. I highly encourage somebody to try. But with more digits you may have to change the model.

      • eesmith 10 hours ago

        Yes, you do. You can download the digits at https://www.pilookup.com/download.html. You can generate models for different-sized subsets. If it's really non-random, you should see the predictability stabilize as the subsets get larger.

        Otherwise you risk being seen as yet another math crank.
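
        The prefix-size check suggested above could be sketched like this. The predictor is a deliberately cheap lookup table standing in for the random forest (an assumption, not the repo's model), and the digits are pseudo-random stand-ins so the snippet runs without a download; swap in real digits of pi to run the actual experiment.

```python
import math
import random
from collections import Counter, defaultdict

def holdout_z(digits: str, context: int = 2) -> float:
    """Fit a lookup-table predictor of 'next digit is 5-9' on the first half
    of `digits`, score it on the second half, and return the z-score of its
    accuracy against a 50% coin-flip baseline."""
    labels = [d >= "5" for d in digits]
    half = len(digits) // 2
    table = defaultdict(Counter)
    for i in range(context, half):
        table[digits[i - context:i]][labels[i]] += 1
    total = len(digits) - half
    correct = 0
    for i in range(half, len(digits)):
        votes = table.get(digits[i - context:i], Counter())
        guess = votes[True] >= votes[False]  # ties default to True
        correct += guess == labels[i]
    # Under the null, `correct` is roughly Binomial(total, 0.5).
    return (correct - total / 2) / math.sqrt(total / 4)

rng = random.Random(0)
stand_in = "".join(rng.choice("0123456789") for _ in range(100_000))
for size in (1_000, 10_000, 100_000):
    print(size, round(holdout_z(stand_in[:size]), 2))
```

        A real effect should produce a z-score that keeps growing with the sample size; a fluke should wander around zero.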

        • seccode 10 hours ago

          The issue is not with getting the digits, the issue is with running a large model for larger digit ranges. I tried running with 10,000,000 digits and haven't gotten a prediction yet.

        • seccode 10 hours ago

          Also, I am testing ranges of digits other than the first 10,000, but in those ranges the distribution of digits is highly imbalanced and the model is not showing statistical significance. Models have a harder time when the distribution of classes is not 50/50, so I think it's not quite fair to evaluate the model on these ranges.

          So why do you think the first 10,000 digits are somewhat predictable?

          • lifthrasiir 9 hours ago

            Because "somewhat predictable" doesn't mean "non-random". In fact, almost all prefixes of algorithmically random bit sequences are somewhat predictable with an appropriate definition of "somewhat", because you can find and exploit an accidental bias from taking the prefix and any such bias translates to some predictability.

            (Another possibility is that, since pi itself is not really algorithmically random, the classifier was somehow able to learn how to partially compute pi! That's another pitfall you need to avoid even when you have a good understanding of information theory and statistics...)
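
            The point about accidental bias in prefixes can be illustrated with a sketch: draw many 10,000-bit prefixes from a pseudo-random generator and score the trivial rule "always guess the more common bit" on the same prefix it was fitted to. The trial count and seed are arbitrary choices.

```python
import random

def majority_accuracy(ones: int, n: int) -> float:
    """In-sample accuracy of always guessing the more common bit."""
    return max(ones, n - ones) / n

rng = random.Random(1)
trials = 1_000
n = 10_000
accuracies = []
for _ in range(trials):
    ones = bin(rng.getrandbits(n)).count("1")  # number of 1 bits in the prefix
    accuracies.append(majority_accuracy(ones, n))

mean_accuracy = sum(accuracies) / trials
exact_ties = sum(a == 0.5 for a in accuracies)
print(f"mean in-sample accuracy: {mean_accuracy:.4f}")
print(f"prefixes split exactly 50/50: {exact_ties} of {trials}")
```

            The rule can never score below 50% in-sample, and it beats 50% whenever the prefix is not perfectly balanced, even though the source has no bias at all.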

          • eesmith 6 hours ago

            The distribution of digits is 'highly imbalanced' because that's what random distributions look like. I'll randomly select a digit from 0-9 10,000 times and show the distribution, then do the same with the first 10,000 digits of pi, then do the random selection again:

              >>> import random
              >>> from collections import Counter
              >>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
              >>> for digit, count in ctr.most_common():
              ...   print(f"{digit}: {count}")
              ...
              2: 1039
              4: 1035
              0: 1031
              7: 1022
              3: 1008
              6: 998
              1: 976
              5: 973
              9: 963
              8: 955
              >>> pi_ctr = Counter(open("1-10000.txt").read().rstrip())
              >>> for digit, count in pi_ctr.most_common():
              ...   print(f"{digit}: {count}")
              ...
              5: 1046
              1: 1026
              2: 1021
              6: 1021
              9: 1014
              4: 1012
              3: 974
              7: 970
              0: 968
              8: 948
              >>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
              >>> for digit, count in ctr.most_common(): print(f"{digit}: {count}")
              ...
              8: 1060
              2: 1048
              0: 1034
              4: 1026
              5: 1025
              3: 979
              7: 977
              6: 960
              1: 956
              9: 935
            
            You can see that the distribution of pi's first 10,000 digits is what one should expect for a random distribution. If your method requires a 50/50 distribution then it cannot be used for this purpose.

            Also, you are thinking about it wrong. The first 10,000 digits of pi are perfectly predictable.
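
            One way to make "what one should expect" concrete is a chi-square goodness-of-fit statistic against a uniform distribution, computed here from the three tallies pasted above. The 5% critical value for 9 degrees of freedom (16.919) is the standard table value; the choice of a 5% level is an assumption made for illustration.

```python
def chi_square_uniform(counts: dict) -> float:
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())

# Tallies copied from the REPL session in this comment.
random_1 = {2: 1039, 4: 1035, 0: 1031, 7: 1022, 3: 1008,
            6: 998, 1: 976, 5: 973, 9: 963, 8: 955}
pi_10k = {5: 1046, 1: 1026, 2: 1021, 6: 1021, 9: 1014,
          4: 1012, 3: 974, 7: 970, 0: 968, 8: 948}
random_2 = {8: 1060, 2: 1048, 0: 1034, 4: 1026, 5: 1025,
            3: 979, 7: 977, 6: 960, 1: 956, 9: 935}

CRITICAL_5PCT_DF9 = 16.919  # chi-square table value, 9 degrees of freedom

results = {name: chi_square_uniform(c) for name, c in
           [("random #1", random_1), ("pi", pi_10k), ("random #2", random_2)]}
for name, stat in results.items():
    verdict = "above" if stat > CRITICAL_5PCT_DF9 else "below"
    print(f"{name}: chi2 = {stat:.3f} ({verdict} the 5% critical value)")
```

            With these tallies the statistics come out at roughly 9.0, 9.3 and 17.1, so the pi digits look less lopsided than the second random sample, which lands just above the threshold, as about one genuinely random sample in twenty would.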

  • seccode 11 hours ago

    Worth noting that I showed _statistical significance_, not _proof_

    • pentaphobe 25 minutes ago

      Not really - you showed an _observation_ about the distribution of decimal digits in a very limited sample of an infinite sequence, but then confidently presented a conclusion about the whole sequence drawn from this.

      If you'll forgive the reductio ad absurdum: I could toss a coin once and post a misleading write-up about how coin tosses land on heads 100% of the time. :)

      Also notable: the description of your repo (at time of writing) is literally "Statistically significant proof that the digits of pi are not random". The word "proof" there does seem to contradict what you're saying in this message.

      Not bullying by the way - I'm well familiar with how exciting it can be to find something cool. (And kudos for sharing)

      But if you use terms like "statistically significant proof" then you're making claims / misleading statements (or just being clickbaity) rather than providing objective observations and _separate_ subjective hypotheses.

      (IMO) a far more palatable alternative would be "I tried analysing the first N digits of pi with a statistical algorithm and found they're not evenly distributed." (i.e. just the facts, no claims) or, if you want to retain the clickbait, "I used statistics to analyse the first N digits of pi... you won't believe the 9,000th digit!"

      As others have mentioned here: of course it's not random. We can predict the next digit with 100% accuracy. But even meeting you halfway and choosing to interpret your use of "random" as meaning "evenly distributed", it's a big reach.