Cosmologically Unique IDs

(jasonfantl.com)

249 points | by jfantl 5 hours ago

69 comments

  • stonegray 8 minutes ago

    Specifying a CSPRNG as an entropy source to avoid collisions is incorrect.

    CSPRNGs make predicting the next number difficult (cracking-AES difficulty), but they do not add entropy and must be seeded uniquely, otherwise they will output the same numbers. Unless the author is proposing having the same machine generate a single universe-scale list in one run.
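
    A minimal sketch of that failure mode, using a toy counter-mode hash construction (not a production CSPRNG):

      import hashlib

      def toy_csprng(seed: bytes, blocks: int):
          # Counter-mode hash: every output is a pure function of the seed.
          for counter in range(blocks):
              yield hashlib.sha256(seed + counter.to_bytes(8, "big")).hexdigest()

      # Two machines that happen to seed identically emit identical "random" IDs:
      assert list(toy_csprng(b"same-seed", 3)) == list(toy_csprng(b"same-seed", 3))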

    Also “banning” IDs that are all 1s or all 0s is silly; they are just as valid and unique as any other number if you’re generating them properly. Although I might suggest purchasing a lottery ticket if you get a UUID with all settable bits set to 1.

  • lisper 4 hours ago

    This analysis is not quite fair. It takes locality (i.e. the speed of light) into account when designing UUID schemes but not when computing the odds of a collision. Collisions only matter if the colliding UUIDs actually come into causal contact with each other after being generated, so just as you have to take locality into account when designing UUID trees, you also have to take it into account when computing the odds of an actual local collision. A naive application of the birthday paradox doesn't apply because it ignores locality, and a fair calculation of the required size of a random UUID is going to come out a lot smaller than the ~800 bits the article arrives at. I haven't done the math, but I'd be surprised if the actual answer is more than 256 bits.
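
    As a back-of-envelope version of that intuition (the 10^30 figure here is purely illustrative), the birthday approximation p ~ k^2 / 2^(n+1) gives:

      import math

      def bits_for_collision_odds(k: float, p: float = 0.5) -> float:
          # Solve p ~ k^2 / 2^(n+1) for n, given k randomly generated IDs.
          return 2 * math.log2(k) - math.log2(p) - 1

      # A causally connected pocket minting 10^30 IDs crosses 50% collision
      # odds only around 199 bits:
      print(bits_for_collision_odds(1e30))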

    (Gotta say here that I love HN. It's one of the very few places where a comment that geeky and pedantic can nonetheless be on point. :-)

    • k_roy 7 minutes ago

      Reminds me of a time many years ago when I received a whole case of Intel NICs all with the same MAC address.

      It was an interesting couple of days before we figured it out.

    • u1hcw9nx 3 hours ago

      You must consider both time and locality.

      From now until protons decay and matter no longer exists is only 10^56 nanoseconds.

      • Sharlin 3 hours ago

        If protons decay. There isn't really any reason to believe they're not stable.

        • hnuser123456 3 hours ago

          And recent DESI data suggests that dark energy is not constant and that the universe will experience a big crunch in a little more than double its current age, for a total lifespan of 33 billion years. No need to get wild with the orders of magnitude on years into the future: the infinite expansion to heat death over 10^100 years is looking less likely, and 10^11 years should be plenty.

          https://www.sciencedaily.com/releases/2026/02/260215225537.h...

        • frikit 3 hours ago

          Protons can decay because the distinction between matter and energy isn't permanent.

          Two quarks inside the proton interact via a massive messenger particle. This exchange flips their identity, turning the proton into a positron and a neutral pion. The pion then immediately converts into gamma rays.

          Proton decayed!

      • Etheryte 3 hours ago

        That's such an odd way to use units. Why would you do 10^56 * 10^-9 seconds?

        • magicalhippo 33 minutes ago

          The nanosecond is a natural unit for processors operating around a GHz, as it's roughly the time of a clock cycle.

          If a CPU takes 4 cycles to generate a UUID and runs at 4 GHz, it churns out one every nanosecond.

        • lisper 3 hours ago

          This was my thought. Nanoseconds are an eternity. You want to be using Planck units for your worst-case analysis.

          • u1hcw9nx 3 hours ago

            If you go far beyond nanoseconds, energy becomes a limiting factor. You can only achieve ultra-fast processing if you dedicate vast amounts of matter to heat dissipation and energy generation. Think on a galactic scale: you cannot even have molecular reactions occurring at femtosecond or attosecond speeds constantly and everywhere without overheating everything.

            • lisper 3 hours ago

              Maybe. It's not clear whether these are fundamental limits or merely technological ones. Reversible (i.e. infinitely efficient) computing is theoretically possible.

      • rbanffy 3 hours ago

        If we think of the many worlds interpretation, how many universes will we be making every time we assign a CCUID to something?

        • petcat 2 hours ago

          > many worlds interpretation

          Those are just separate namespaces. Many worlds can all contain the same (many) random numbers, and they will never conflict with each other!

        • shiandow 2 hours ago

          In that interpretation the total number of worlds does not change.

        • antonvs 3 hours ago

          We don't "make" universes in the MWI. The universal wavefunction evolves to include all reachable quantum states. It's deterministic, because it encompasses all allowed possibilities.

          • rbanffy 3 hours ago

            Humpf…

            You just had to collapse my wave function here…

      • dheera 2 hours ago

        Protons (and mass and energy) could also potentially be created. If this happens, the heat death could be avoided.

        Conservation of mass and energy is an empirical observation; there is no theoretical basis for it. We just don't know of any process we can implement that violates it, but that doesn't mean one doesn't exist.

        • dinosaurdynasty 31 minutes ago

          Conservation laws result from continuous symmetries in the laws of physics, as proven by Noether's theorem.

      • scotty79 3 hours ago

        Proton decay is hypothetical.

        • hamdingers 2 hours ago

          So is the need for cosmologically unique IDs. We're having fun.

      • rubyn00bie 3 hours ago

        I got a big laugh at the “only” part of that. I do have a sincere question about that number though: isn’t time relative? How would we know that number to be true or consistent? My incredibly naive assumption would be that with less matter, time moves faster, sort of accelerating; so as matter “evaporates” the process accelerates and converges on that number (or close to it)?

        • zamadatix 3 hours ago

          Times for things like "age of the universe" are usually given as "cosmic time" for this reason. If it's about a specific object (e.g. "how long until a day on Earth lasts 25 hours") it's usually given in "proper time" for that object. Other observers/reference frames may perceive time differently, but in the normal relativistic sense rather than a "it all needs to wind itself back up to be equal in the end" sense.

        • idiotsecant 3 hours ago

          The local reference frame (which is what matters for proton decay) doesn't see the outside world moving slower or faster to any significant degree, regardless of how much mass is around, until you start adding a lot of mass very close by.

    • svnt 3 hours ago

      Maybe the definitions are shifting, but in my experience “on point” is typically an endorsement along the lines of “really/precisely good” — so I think what you mean is “on topic” or similar.

      Pedantry ftw.

    • RobotToaster 2 hours ago

      Would this take into account IDs generated by objects moving at relativistic speeds? It would be a right pain to travel for a year to another planet, arrive 10,000 years late, and have a bunch of id collisions.

      • 9dev an hour ago

        Oh no! We should immediately commence work on a new UUID version that addresses this use case.

      • lisper 2 hours ago

        I have to confess I have not actually done the math.

    • ctoth an hour ago

      Hanson's Grabby Aliens actually fits really well here if you're looking for some math to build on.

  • vessenes 18 minutes ago

    Chiming in from the decentralized world: there’s an adversarial/cooperative dynamic in the assignment of these IDs, and in the selection of parents, that isn’t discussed in the original. I think you could possibly get to sublinear by allowing a small number of cooperative nodes to assign new IDs.

    On the other hand, having the right to assign IDs is powerful; on balance, to my mind the right thing to do is some sort of a ZK verifiable random function, e.g. sunspot-based transformations combined with some proof of ‘fair’ random choice. In that case, I think the 800-bit number seems like plenty. You could also do some sort of epoch-based variable length where, for the next billion years or so, we use 1/256 of the ID space (forcing the first eight bits to 0), and so on.

  • m4nu3l 3 hours ago

    A more realistic estimate of the total number of addressable things should take into account that for anything to be addressable, its address should be stored somewhere at least once.

    If it takes at least Npb particles to store one bit of information, then the number of addressable things would decrease with the number of bits of the address.

    So let's call Nthg the number of addressable things, and assume the average number of bits per address grows as Nb = f(Nthg).

    Then the maximum number of addressable things is the number that satisfies Nthg = Np / (Npb * f(Nthg)), where Np is the total number of particles.
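
    A toy fixed-point iteration of that equation; Np, Npb, and f(N) = log2(N) (the minimal address length for N things) are all assumptions for illustration:

      import math

      Np = 1e80   # rough particle count of the observable universe (assumption)
      Npb = 1.0   # particles needed to store one bit (assumption)

      N = 1e10    # initial guess for Nthg
      for _ in range(50):
          N = Np / (Npb * math.log2(N))   # Nthg = Np / (Npb * f(Nthg))

      print(f"Nthg ~ {N:.2e} things at ~{math.log2(N):.0f} bits per address")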

  • j-pb 4 hours ago

    Great insights and visualisations!

    I built a whole database around the idea of using the smallest plausible random identifiers, because that seems to be the only "golden disk" we have for universal communication, except for maybe some convergence property of latent spaces with large enough embodied foundation models.

    It's weird that they are really underappreciated in the scientific data management and library science communities; many problems that currently require large organisations could have been solved by just having better identifiers.

    To me the ship of Theseus question is about extrinsic (random / named) identifiers vs. intrinsic (hash / embedding) identifiers.

    https://triblespace.github.io/triblespace-rs/deep-dive/ident...

    https://triblespace.github.io/triblespace-rs/deep-dive/tribl...

    • ctoth an hour ago

      Entity identity can be intrinsic. Why not consistency contracts?

  • adityaathalye 4 hours ago

    Just past page 281 of Becky Chambers's delightful "The Galaxy, and the Ground Within".

      Received Message
      Encryption: 0
      From: GC Transit Authority --- Gora System (path: 487-45411-479-4)
      To: Ooli Oht Ouloo (path: 5787-598-66)
      Subject: URGENT UPDATE
    
    Man I love the series.

    Looks like this multispecies universe has a centrally-agreed-upon path-addressing system.

    • pavel_lishin 3 hours ago

      You should check out Vernor Vinge's A Fire Upon The Deep for more fun examples of how intra-galactic communication would be labeled, with routes & such.

    • Octoth0rpe 4 hours ago

      From this book in particular, I love the scene with everyone sitting around talking about how horrifying the concept of cheese is. The rest of the quartet is wonderful, with the second book (A Closed and Common Orbit) being the MVP IMO.

  • ekipan 3 hours ago

    I forget the context but the other day I also learned about Snowflake IDs [1] that are apparently used by Twitter, Discord, Instagram, and Mastodon.

    Timestamp + random seems like it could be a good tradeoff to reduce ID sizes and still get reasonable characteristics. I'm surprised the article didn't explore that (but then again, "timestamps" are a lot more nebulous at universal scale, I suppose). Just spitballing here, but I wonder if it would be worthwhile to reclaim ten bits of the Snowflake timestamp and use the low 32 bits for a random number. Four billion IDs for each second.
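
    A sketch of that spitballed layout (a seconds-resolution timestamp up top, 32 random bits below; this is not Twitter's actual Snowflake format):

      import secrets
      import time

      def spitball_id() -> int:
          # 64-bit ID with ~4 billion possible values per one-second bucket.
          ts = int(time.time()) & 0xFFFFFFFF   # seconds, truncated to 32 bits
          return (ts << 32) | secrets.randbits(32)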

    There's a Tom Scott video [2] that describes Youtube video IDs as 11-digit base-64 random numbers, but I don't see any official documentation about that. At the end he says how many IDs are available but I don't think he considers collisions via the birthday paradox.

    [1]: https://en.wikipedia.org/wiki/Snowflake_ID

    [2]: https://youtu.be/gocwRvLhDf8

    • swiftcoder 3 hours ago

      > [1]: https://en.wikipedia.org/wiki/Snowflake_ID

      Isn't this just the same scheme as version 1 UUID, except with half the bits? I guess they didn't want to dedicate 128 bits to their IDs.

    • drchickensalad 3 hours ago

      For anyone else interested: that also looks like the widely used BSON ObjectIds.

    • buzzerbetrayed 3 hours ago

      Getting the entire universe to agree on a single clock for creating timestamps sounds absurdly difficult. Probably impossible?

      • ekipan 2 hours ago

        "Agreement" on time is probably nonsense, yeah. I realized that after posting, so I edited in the parenthetical, but as [3] notes, locality probably makes this less of a real issue.

        Apparently, by the birthday paradox, 32-bit random IDs only allow some tens of thousands of IDs per second before the collision chance passes 50%. Maybe that's acceptable?
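
        Quick check of that figure with the standard birthday approximation:

          import math

          # Number of n-bit random IDs at which collision odds reach 50%:
          # k ~ sqrt(2 * ln(2) * 2^n)
          print(math.sqrt(2 * math.log(2) * 2 ** 32))  # ~77,000 for 32 bits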

        [3]: https://news.ycombinator.com/item?id=47065241

      • speakeron 36 minutes ago

        The temperature of the cosmic microwave background can be used as a universal clock.

  • rini17 3 hours ago

    From real life we know that people prefer to have multiple anonymous IDs or self-selected handles; either makes fully deterministic generation schemes moot.

    Also, network routing requires objects to have multiple addresses.

    The physics side of the whole thing is funny too: afaik quantum particles require fungibility, i.e. by doxxing atoms you unavoidably change the behavior of the system.

    • pavel_lishin 2 hours ago

      > From real life we know that people prefer to have multiple anonymous IDs

      There's nothing stopping an entity from requesting multiple IDs from one of the "devices"!

  • bluecoconut 4 hours ago

    Fun read.

    One upside of the deterministic schemes is that they include provenance/lineage. You can literally "trace up" the path of the history back to the original ID giver.

    Kinda has me curious: how much information is required to represent an arbitrary provenance tree/graph on a network of N nodes/objects, entirely via the self-described ID?

    (Thinking out loud in the comment: I guess in the worst case of a linear chain, if the full provenance must be recoverable from the ID, that scales as O(N x id_size), so it's quite bad. But assuming the "best case" (any node is expected to be log(N) steps from root, i.e. depth log(N)), it feels like global_id_size = log(N) x local_id_size is roughly the optimal limit, so the global_id grows as log(N)^2. Would that mean, starting from the 399-bit number, a lower limit for a global_id_size with lineage would be around (400 bits)^2 ~= 20 kB, because it carries the ordered-local-id provenance information rather than being relative to local shared knowledge?)

    • AlotOfReading 3 hours ago

      Two ways to frame it:

      Provenance is a DAG, so you get a partial order for free by topological sort. That can be extended to a compatible total order; then the provenance for a node is just its position in that ordering. This kind of mapping from objects to the first N consecutive naturals is also a minimal perfect hash function, which has n log n overhead. We can't navigate the tree to track ancestry, but equality implies identical ancestry.

      Alternatively, we could track the whole history in somewhat more bits with a succinct encoding, 2N if it's a binary tree.

      In practice, deterministic IDs usually accept a 2^-N collision risk to get log n.
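
      A sketch of the succinct option, encoding the tree shape as balanced parentheses (~2 bits per node; labels are assumed to come from traversal order):

        def encode(node) -> str:
            # node = (label, [children]); only the shape is stored:
            # '(' on entering a node, ')' on leaving it.
            _, children = node
            return "(" + "".join(encode(c) for c in children) + ")"

        tree = ("root", [("a", []), ("b", [("c", [])])])
        print(encode(tree))  # (()(())) -> 2 symbols per node for 4 nodes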

    • montyanne 3 hours ago

      ATProto, the protocol underlying the BlueSky social network, is similar. It uses a content-addressed DAG.

      Each “post” has a CID, which is a cryptographic hash of the data. To “prove” ownership of the post, a witness hash is sent that can be verified all the way up the tree to the repo root hash, which is signed with the root key.

      Neat way of having data say “here’s the data, and if you care to verify it, here’s an MST”.
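
      In the same spirit, a toy content address (not ATProto's real CID format, which uses a multihash over DAG-CBOR):

        import hashlib
        import json

        def toy_cid(record: dict) -> str:
            # The hash of canonicalized content is its address: identical
            # data always yields the identical ID, with no coordination.
            blob = json.dumps(record, sort_keys=True).encode()
            return hashlib.sha256(blob).hexdigest()

        print(toy_cid({"text": "hello", "createdAt": "2025-01-01"}))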

  • ktpsns 4 hours ago

    Quite offtopic, but: I've found UUIDs to be overused in many cases. People then abuse them to store data, making them effectively "speaking IDs" or "multi-column indices".

    • jmole 4 hours ago

      Unless it's a key that needs to be sortable (e.g. by insertion order) or a metric/descriptor of some kind, I'm not sure why a UUID would be overused or inappropriate.

  • manofmanysmiles 4 hours ago

    I'd propose using our current view of physical reality to own a subset of the UUID + version field, in case new physics is discovered.

    10-20 bits: version/epoch

    10-20 bits: cosmic region

    40 bits: galaxy ID

    40 bits: stellar/planetary address

    64 bits: local timestamp

    This avoids the potentially pathological long chain of provenance, and also encodes coordinates into it.

    Every billion years or so it probably makes sense to re-partition.
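
    One way to pack those fields into a single integer; the widths here are one reading of the ranges above (16/16/40/40/64 = 176 bits) and the names are illustrative:

      FIELDS = (("version", 16), ("region", 16), ("galaxy", 40),
                ("body", 40), ("timestamp", 64))

      def cosmic_id(**values: int) -> int:
          out = 0
          for name, width in FIELDS:
              v = values[name]
              assert 0 <= v < (1 << width), f"{name} overflows {width} bits"
              out = (out << width) | v   # shift earlier fields up, append this one
          return out

      uid = cosmic_id(version=1, region=7, galaxy=123_456,
                      body=42, timestamp=1_700_000_000)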

    • skvmb 9 minutes ago

      offset length

        00     04:    Version + Flags
        04     08:    Timestamp (uint64)
        12     16:    Node/Agent Hash
        28     16:    Namespace Hash
        44     32:    Random Entropy
        76     20:    Extra / Extension
        96     32:    Integrity Hash
      
      Total: 128 bytes

    • rbanffy 3 hours ago

      As for coordinates, don’t forget galaxies are clouds of stars flowing around and interacting with each other.

      • dylan604 3 hours ago

        That's the problem with address-type systems: they expect the object at that location to always be at that location. How do you encode the orbital speed and orbital radius, not just for the object, but for the object it is orbiting (which needs the same info, as it is also in motion), and then that object's parent galaxy's motion? Ugh, now I need a nap to calm down a bit.

        • rbanffy 3 hours ago

          You could estimate when the object was labelled by the coordinates used.

          But where is the Greenwich meridian for the Milky Way?

  • QuiCasseRien 2 hours ago

    I really love everything related to cosmology, but I always struggle with two contrary concepts that lead to a paradox (for me):

    - Infinity: in school, we learn that our universe is infinite.

    - Yet we often do calculations with an upper limit, like this one: 10^240. That is a big number, butttttt it's not infinite, you know. 10^240+1, 10^240+2...

    So:

    1. If it's infinite, why do upper-limit calculations?

    2. If it's limited, what is there outside that limit?

    Extremely paradoxical.

  • small_model 3 hours ago

    We will probably end up with something like each planet having its own local addressing and the big router in the sky doing NAT; each solar system has a router, and so on.

  • alex_tech92 4 hours ago

    It is interesting how much of our infrastructure relies on the assumption that 'close enough' is actually 'good enough' for uniqueness. When we move from UUIDs to things like ULIDs or Snowflake IDs, we are really just trading off coordination cost for a slightly higher collision risk that we will likely never hit in several lifetimes. Thinking about it on a 'cosmological' scale makes you realize how much of a luxury local generation is without needing a central authority. It is that tiny bit of entropy that keeps the whole distributed system from grinding to a halt.

    • fsckboy 5 minutes ago

      >the assumption that 'close enough' is actually 'good enough' for uniqueness

      i'm pretty sure it's "far enough" that makes it "good enough"

  • factotvm 4 hours ago

    > In order to fix this, we might start sending out satellites in every direction

    Minor correction: Satellites don't go in every direction; they orbit. Probes or spaceships are more appropriate terms.

    • fluoridation 3 hours ago

      Maybe they meant at every inclination. ;)

  • philipwhiuk 2 hours ago

    Note that they almost immediately contract from 'the universe' to 'the visible universe', which isn't the same thing at all.

    • mr_mitm 8 minutes ago

      It's the observable universe, and that's the only thing that matters. Events outside the observable universe are causally disconnected; we will never interact with anything outside it. For all practical purposes, it's the same thing.

  • eudamoniac 32 minutes ago

    I was going to read this, but it starts with an AI slop header image for no purpose, so I intuited that the article was similarly ill constructed.

  • dvh 2 hours ago

    Another blow to the "all electrons are the same electron" theory. Why have only 1 electron with so many possible ids /s

  • frikit 3 hours ago

    The best way to solve this is not to, and to just give up on the idea of identification.

    If you have an infinite multiverse of infinite universes, and perhaps layers on top of that, with different physics, etc., you can't have identity outside of all existence.

    In Judaism, one/the name of God is translated as “I am”. I believe this is because God’s existence is all, transcending whatever concepts you have of existence or of IDs. That ID is the only ID.

    So, the cosmic solution to IDs is the name of God.