Cores That Don't Count [pdf]

(sigops.org)

99 points | by signa11 20 hours ago

12 comments

  • mofosyne 16 hours ago

    This is about unstable cores that randomly output incorrect calculation and ways to mitigate it via better hardware testing and duplicating parts of the core that can fail often.

    From the title, however, I initially thought it was about 1-bit CPUs like the MC14500B Industrial Control Unit (ICU), a CMOS one-bit microprocessor designed by Motorola in 1977 for simple control applications. It completely lacks an ALU, so it essentially cannot count, but it was designed for PLCs.

    • winwang 38 minutes ago

      Hey. It could count to 1, which is something.

  • freeqaz 16 hours ago

    Unrelated to the topic being discussed, but my mind immediately went to "per core pricing", which is common for databases. Some SQL servers are licensed by the number of CPU cores in the system, and manufacturers would often offer an SKU with fewer, faster cores to compensate for this.

    Taking that thought and adding "silent" cores is interesting to me. What if your visible CPU core were actually backed by multiple physical cores to get the "fastest" speed possible? For example, imagine two CPU cores that appeared as one, where each core guessed the opposite branch of the other (branch prediction) so that the pair was "right" more of the time.

    An interesting thought that had never occurred to me. It's horribly inefficient but for constrained cases where peak performance is all that matters, I wonder if this style of thought would help. ("Competitive Code Execution"?)

    • buildbot 16 hours ago

      People have thought about it, but it’s so incredibly wasteful that it’s impractical. At 20% branching, you rapidly run out of resources waiting on the winning branch, and you’d spend possibly 8 cores just to speculate three branches ahead, or roughly 15 instructions. That’s pretty rough!
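      The arithmetic behind that estimate can be sketched directly. A hypothetical cost model (the ~5 instructions per basic block is an assumption used to recover the "roughly 15 instructions" figure, not something stated in the comment):

```python
def dual_path_cost(depth: int, insns_per_branch: int = 5):
    # Speculating down both sides of every branch doubles the number of
    # live paths at each level, so covering `depth` branches ahead needs
    # 2**depth execution contexts ("cores" in the comment's framing).
    cores = 2 ** depth
    # Assumed: ~5 instructions per basic block between branches.
    instructions = depth * insns_per_branch
    return cores, instructions

# Three branches ahead: 8 cores to cover ~15 instructions of lookahead.
cores, insns = dual_path_cost(3)
```

      The exponential growth in cores against the linear growth in covered instructions is what makes the scheme so wasteful.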

      • eep_social an hour ago

        In distributed computing, a few layers of abstraction up, an analogous technique of sending two identical RPCs to distinct backends can be used to reduce tail latency.
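        A minimal sketch of that hedged-request pattern, with the two backends simulated by random sleeps (the names and latencies here are hypothetical stand-ins for real RPCs):

```python
import asyncio
import random

async def backend(name: str) -> str:
    # Simulated backend with variable latency, standing in for a real RPC.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return name

async def hedged_call() -> str:
    # Send the same request to two distinct backends and take whichever
    # answers first; cancel the loser to bound the wasted work.
    tasks = [asyncio.create_task(backend("a")),
             asyncio.create_task(backend("b"))]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()

result = asyncio.run(hedged_call())
```

        The caller's latency becomes the minimum of the two backends' latencies, which is exactly what trims the tail, at the cost of up to 2x the request load.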

      • hinkley 13 hours ago

        I wonder if you could put more logic units per core and load balance to prevent thermal throttling, or if you’d make the communication pathways slower at a rate that exceeds the gains.

        • buildbot 13 hours ago

          Yep, you can do that, and yep, it gets slower.

          That’s basically the tradeoff Apple made with their M series chips vs AMD/Intel, which until recently had been chasing fast, narrow designs. Apple, in contrast, has a crazy "wide" core, i.e. it can issue and retire many more instructions per clock than basically any other mainstream CPU.

    • userbinator 10 hours ago

      For example imagine if you had say 2 CPU cores that appeared as one and each core would guess the opposite branch of the other (branch prediction) so that it was "right" more of the time.

      I believe some CPUs do speculate down both paths of a branch if the branch predictor is really uncertain which one to take.

    • jcul 11 hours ago

      Not exactly the same thing, but I remember talking with a co-worker about strategies to use a core and its hyperthreaded sibling on the same workload to get a speedup.

      However, in practice I think it would be really difficult to prevent them from just thrashing each other's caches / competing for resources.

      • lallysingh 7 hours ago

        Yeah, your options are to spin on a few lines of cache (e.g. an iterated function or processing a ring buffer) or to use streaming cache ops.
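        For the ring-buffer case, a minimal single-producer/single-consumer sketch of the index discipline (Python here only to illustrate the structure; the cache-residency benefit between hyperthread siblings is a property of the hardware, not of this code):

```python
class RingBuffer:
    # Single-producer/single-consumer ring buffer: the producer writes
    # only `head`, the consumer writes only `tail`, so on real hardware
    # the hot state can stay resident in a few shared cache lines.
    def __init__(self, size: int = 8):
        self.buf = [None] * size
        self.size = size
        self.head = 0  # written only by the producer
        self.tail = 0  # written only by the consumer

    def push(self, item):
        while (self.head + 1) % self.size == self.tail:
            pass  # buffer full: spin until the consumer drains a slot
        self.buf[self.head] = item
        self.head = (self.head + 1) % self.size

    def pop(self):
        while self.tail == self.head:
            pass  # buffer empty: spin until the producer fills a slot
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.size
        return item

# Single-threaded demo of the FIFO discipline.
rb = RingBuffer(4)
rb.push(1)
rb.push(2)
```

        One slot is deliberately left unused so that head == tail unambiguously means "empty" rather than "full".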

  • bla3 18 hours ago

    [2021]

  • cryptonector 15 hours ago