Why a new computer is slower than an old computer [video]

(youtube.com)

17 points | by tiernano 12 hours ago

7 comments

  • cogman10 9 hours ago

    David is both right and a little off (IMO).

    Where I think he's off is the assumption that dependencies are the primary reason for performance problems. It's true they are a part of that problem, but it's not the way he's thinking (IMO).

    What causes dependencies to be performance headaches is the fact that just like everything else, they aren't a priority when it comes to performance analysis. And the bigger the set of dependencies or the framework, the less likely those devs have spent meaningful amounts of time tuning for performance.

    These abstractions could actually be a shortcut to high performance, if there were good incentives to make them so.

    There are actually a few examples of this. For instance, nothing you could hand-roll will be anywhere near as fast as GMP when it comes to handling big numbers. Handling big numbers fast is the name of the game for that library.

    But those deps are the exception and not the rule. And that's the problem. It's not that dependencies couldn't help make everything faster. It's that like a regular application, dependencies are incentivized to bring in more dependencies to accomplish their goals which ultimately nukes performance.

    In his video, he talks about how the AI made a base64-encoded JSON stream. Of course that's horribly slow. But you know what isn't? Protobufs, FlatBuffers, and a slew of other binary serialization techniques and libraries. And those libraries will easily and efficiently do the work just as well, probably better, than what you could accomplish by hand-rolling a binary protocol.
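
    Protobuf and FlatBuffers bring their own schemas and tooling, but Python's stdlib `struct` module shows the same idea in miniature (a sketch; the record layout here is made up for illustration, not taken from the video):

```python
import base64, json, struct

# A hypothetical sensor record: id (u32), timestamp (u64), value (f64)
record = {"id": 7, "ts": 1700000000, "value": 3.14}

# Text pipeline: JSON, then base64 (the approach the video criticizes)
text = base64.b64encode(json.dumps(record).encode()).decode()

# Binary pipeline: fixed-layout little-endian packing, the same idea
# protobuf/FlatBuffers implement with schemas
binary = struct.pack("<IQd", record["id"], record["ts"], record["value"])

print(len(text), len(binary))  # the binary form is a fraction of the size
```

    The point isn't `struct` itself; it's that the binary path is both smaller on the wire and skips an entire encode/decode layer.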

    He also talks about the good old days (bad old days?) of hypertuning for caches, pages, and memory utilization. This is a fun practice, but in my experience performance problems frequently aren't due to any of these things. More often than not, it's the fact that someone is doing an n^2, n^3, or n! algorithm and nobody caught it. Generally speaking, if devs kept their software in O(1) or O(n) computational complexity, that'd solve a huge amount of the jank that modern systems experience. All that requires is collecting and reading profiling data to identify these hot spots in the first place.
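
    A sketch of the kind of accidental quadratic that shows up in profiles (names here are made up for illustration): membership tests against a list rescan the whole list on every iteration, while a set does each lookup in O(1):

```python
def dedupe_quadratic(items):
    # O(n^2): "item not in seen" scans the list on every iteration
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

def dedupe_linear(items):
    # O(n): set membership is a constant-time hash lookup
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

    Both return the same result; only the shape of the work changes, which is exactly the kind of thing a profile surfaces and eyeballing does not.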

    And that, IMO, is the primary source of failures. Devs are not typically collecting profiling metrics. They are more often than not just guessing at why something is slow and fixing what they think might be the problem. And if that problem lay in a dependency, well, often rather than fixing the dep devs will work around it.
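
    Collecting that data doesn't require heavyweight tooling. A minimal sketch using Python's stdlib profiler (the function names are illustrative):

```python
import cProfile, io, pstats

def hot_spot():
    # stand-in for the accidental O(n^2) work nobody caught
    return sum(i * i for i in range(100_000))

def main():
    for _ in range(10):
        hot_spot()

pr = cProfile.Profile()
pr.enable()
main()
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # hot_spot dominates the cumulative-time column
```

    Ten lines of harness like this answers "why is it slow" with data instead of a guess.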

    I've actually dealt with exactly this problem with, of all things, the microsoft JDBC driver.

    • hedora 7 hours ago

      I wonder how much adding a profiler to development flows would help modern apps.

      JS is gross, but 16ms (time you get to render a frame at 60 fps) is an eternity on modern systems.

      It’s tens of millions of single-threaded CPU cycles.
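
      Back-of-the-envelope for that claim, assuming a ~3 GHz core (the clock speed is an assumption, not a measurement):

```python
frame_time_s = 1 / 60           # ~16.7 ms per frame at 60 fps
clock_hz = 3_000_000_000        # assumed 3 GHz single core
cycles_per_frame = frame_time_s * clock_hz
print(f"{cycles_per_frame:,.0f} cycles per frame")  # 50,000,000
```

      Fifty million cycles to paint one frame, before any parallelism or SIMD.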

      Also, you probably can use GPU acceleration for client code. That’s enough time for a 2026 integrated GPU to do tens to hundreds of billions of tensor ops.

      And yet, the iOS keyboard (presumably multithreaded and native) cannot reliably echo keystrokes in under a second. I regularly see webpages take multiple seconds to redraw a screen.

      • Sohcahtoa82 6 hours ago

        I often think about DOOM running on a 66 MHz 486.

        It ran at around 30 fps with a 320x200 screen. That's 64,000 pixels per frame, 1,920,000 pixels per second being rendered.

        On a 66 MHz CPU, that means less than 35 clock cycles per pixel, on a CPU architecture where a multiply or add instruction would take multiple clock cycles to complete.
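
        The arithmetic checks out (a quick sketch):

```python
width, height, fps = 320, 200, 30
clock_hz = 66_000_000  # 66 MHz 486

pixels_per_frame = width * height             # 64,000
pixels_per_second = pixels_per_frame * fps    # 1,920,000
cycles_per_pixel = clock_hz / pixels_per_second
print(round(cycles_per_pixel, 1))  # 34.4
```

        About 34 cycles per pixel, and that budget has to cover game logic and sound too, not just rendering.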

        I know DOOM was not a true 3D engine and it took a lot of shortcuts to look the way it did, but that makes it more amazing, not less. The amount of thought that went into it is just mind-boggling to me.

        • cogman10 6 hours ago

          > multiply or add instruction would take multiple clock cycles.

          Add, and, or, xor, and bit shifts have always been single-cycle operations for integers. I believe Doom used integer math for everything.
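
          Specifically, the released Doom source uses a 16.16 fixed-point format. A minimal sketch of how a multiply works in that scheme (in Python for readability; the original is C on 32-bit ints):

```python
FRACBITS = 16
FRACUNIT = 1 << FRACBITS  # 1.0 in 16.16 fixed point

def fixed_mul(a: int, b: int) -> int:
    # multiply two 16.16 fixed-point numbers; the shift restores the scale
    return (a * b) >> FRACBITS

def to_fixed(x: float) -> int:
    return int(x * FRACUNIT)

def to_float(x: int) -> float:
    return x / FRACUNIT

# 1.5 * 2.0 == 3.0, using only integer multiply and shift
print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.0))))  # 3.0
```

          That's how you get fractional positions and angles out of nothing but cheap integer ops.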

          • Sohcahtoa82 2 hours ago

            Ah, so you're right.

            Still though...care had to be taken to make sure memory was organized to maximize cache hits.

            I feel like the crazy optimizations necessary in those days have become a lost art to most game developers.

      • Archit3ch 7 hours ago

        > I wonder how much adding a profiler to development flows would help modern apps.

        Very much, but ideally you want telemetry from the user's device (assuming a desktop app). Otherwise your "optimization" might come back as a regression on the Snapdragons you didn't test on.
