The Cost of Indirection in Rust

(blog.sebastiansastre.co)

99 points | by sebastianconcpt 4 days ago

53 comments

  • cwillu 21 hours ago

    > Maintainability and understandability only show up when you’re deliberate about them. Extracting meaning into well-named functions is how you practice that. Code aesthetics are a feature and they affect team and agentic coding performance, just not the kind you measure in the runtime.

    > And be warned: some will resist this and surrender to the convenience of their current mental context, betting they’ll “remember” how they did it. Time will make that bet age badly. It’s 2026 — other AI agents are already in execution loops, disciplined to code better than that.

    Hard disagree: separating code from its context is exactly how you end up in the situation of needing to “remember”. Yes, helper functions and such can be useful for readability, but it's easy to overdo it and end up with incomprehensible ravioli code that does nothing terribly complicated in a terribly complicated manner.

    • tabwidth 16 hours ago

      The worst version of this I've seen is when every layer is like four lines long. You step into a function expecting some logic and it's just calling another function with slightly different args. Do that six times and you forgot what the original call was even trying to do. Naming helps in theory but in practice half those intermediate functions end up with names like processInner or handleCore because there's nothing meaningful to call them.

      • syockit 2 hours ago

        It'd be great if IDEs could store a stack of functions currently being explored, similar to what you get when debugging. Not breadcrumbs, but a plain stack. Bonus points if you could store multiple stacks and give them names according to the context.

      • iterance 13 hours ago

        Ah. I once worked in a team with a hard cyclomatic complexity cap of 4 per function. Logic exceeding the cap needed to be broken into helper functions. Many, many functions were created to hold exactly one if statement each. Well, the code was relatively high quality for other reasons, but I can't say this policy contributed much.

      • api 15 hours ago

        Any pattern executed robotically like this becomes a self parody.

    • chowells 19 hours ago

      I think I agree with what you're getting at, though I usually phrase it differently: indirection is not abstraction. A good abstraction makes it easier to understand what the code is doing by letting you focus on the important details and ignore the noise. It does this by giving you tools that match your problem space, whatever it may be. This will necessarily involve some amount of indirection when you switch semantic levels, but that's very different from constantly being told "look over there" when you're trying to figure out what the code is saying.

      • scottlamb 19 hours ago

        Agree, and I would add that a bad abstraction, the wrong abstraction for the problem, and/or an abstraction misused is far worse than no abstraction. That was bugging me in another thread earlier today: <https://news.ycombinator.com/item?id=47350533>

    • bch 11 hours ago

      New pasta paradigm unlocked. Sounds like a case of premature optimization, leaning too hard on DRY.

      Am reminded also of a discussion of software engineering between John Ousterhout (of whom I'm a big fan) and Robert Martin[0][1].

      [0] https://github.com/johnousterhout/aposd-vs-clean-code

      [1] https://youtu.be/3Vlk6hCWBw0

      • cwillu an hour ago

        I believe ravioli predates lasagna as a code pasta.

    • OtomotO 19 hours ago

      Someone has worked too much on corporate Java codebases.

      I feel your pain. Everything is so convoluted that 7 layers down you ask yourself why you didn't learn anything useful...

      • cwillu 19 hours ago

        Last time was a go shop, and let me tell you: that style mixes with go's error handling like spoiled milk and blended shit.

        Oh gee, thank you for this wrapped error result, let me try to solve a logic puzzle to see (a) where the hell it actually came from, and (b) how the hell we got there.

      • pjmlp 6 hours ago

        Corporate has the magic touch to do that to any programming language.

    • schubart 21 hours ago

      I’m familiar with spaghetti code and with lasagna code (too many layers) but I’m curious: what’s ravioli code?

      • p1necone 20 hours ago

        Each part of the codebase is a separate self contained module with its own wrapping (boilerplate), except there's like 30 of them and you still have to understand everything as a whole to understand the behaviour of the system anyway.

      • tartoran 21 hours ago

        Think of what ravioli are and apply the same code analogy as spaghetti or lasagna. The code is split into tiny units, and that creates too much indirection, a different kind than spaghetti or lasagna. The architecture feels fragmented even though there's nothing wrong with each piece.

      • fsckboy 20 hours ago

        a ravioli is a b̶l̶a̶c̶k̶ beige box abstraction to which you pasta arguments interface usually after forking

      • paulddraper 12 hours ago

        It's "spaghetti" code, but with encapsulation. [1]

        Lots and lots of little components, but not in a way that actually makes anything easier to find.

        [1] https://wiki.c2.com/?RavioliCode

  • bombela a day ago

    I think this long post is saying that if you are afraid that moving code behind a function call will slow it down, you can look at the machine code and run a benchmark to convince yourself that it is fine?
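    A minimal std-only sketch of that "benchmark to convince yourself" step (the helper and the numbers here are illustrative, not from the article; a real harness like criterion is better for serious measurement):

```rust
use std::hint::black_box;
use std::time::Instant;

// Small extracted helper, the kind the article says is fine to call.
fn double(x: u64) -> u64 {
    x * 2
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();

    let start = Instant::now();
    let mut total = 0u64;
    for &x in &data {
        // black_box keeps the optimizer from deleting the loop entirely
        total += double(black_box(x));
    }
    let elapsed = start.elapsed();

    println!("sum={} in {:?}", black_box(total), elapsed);
}
```

    Run it in release mode (`cargo run --release`); debug-mode numbers will say nothing useful about the cost of the call.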

    • layer8 a day ago

      I think it’s making a case that normally you shouldn’t even bother benchmarking it, unless you know that it’s in a critical hot path.

      • eptcyka 20 hours ago

        I must add that code is on the hot path only under one of two conditions:

        - the application is profiled well enough to prove that some piece of code is on the hot path

        - the developers are not doing a great job

    • antonvs 19 hours ago

      This long post is demonstrating that Knuth’s advice, “premature optimization is the root of all evil,” is still one of the first heuristics you should apply.

      The article describes a couple of straw men and even claims that they’re right in principle:

      > Then someone on the team raises an eyebrow. “Isn’t that an extra function call? Indirection has a cost.” Another member quickly nods.

      > They’re not wrong in principle.

      But they are wrong in principle. There’s no excuse for this sort of misinformation. Anyone perpetuating it, including the blog author, clearly has no computer science education and shouldn’t be listened to, and should probably be sent to a reeducation camp somewhere to learn the basics of their profession.

      Perhaps they don’t understand what a compiler does, I don’t know, but whatever it is, they need to be broken down and rebuilt from the ground up.

  • ekidd a day ago

    We have been able to automatically inline functions for a few decades now. You can even override inlining decisions manually, though that's usually a bad idea unless you're carefully profiling.

    Also, it's pointer indirection in data structures that kills you, because uncached memory is brutally slow. Function calls to functions in the cache are normally a much smaller concern except for tiny functions in very hot loops.

    • scottlamb 20 hours ago

      I'm not sure Rust's `async fn` desugaring (which involves a data structure for the state machine) is inlineable. (To be precise: maybe the desugared function can be inlined, but LLVM isn't allowed to change the data structure, so there may be extra setup costs, duplicate `Waker`s, etc.) It's probably true that there is a performance cost. But I agree with the article's point that it's generally insignificant.

      For non-async fns, the article already made this point:

      > In release mode, with optimizations enabled, the compiler will often inline small extracted functions automatically. The two versions — inline and extracted — can produce identical assembly.
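      The non-async point can be sketched with a pair of functions (names are illustrative; to actually see the identical assembly, compare the two in release mode, e.g. on the Compiler Explorer):

```rust
// Sketch of the article's non-async claim: a small extracted helper is
// typically inlined in release mode, so the inline and extracted
// versions behave (and usually compile) identically.

fn sum_inline(data: &[u64]) -> u64 {
    let mut total = 0;
    for &x in data {
        total += x * 2;
    }
    total
}

// The same work with the doubling extracted into a helper.
fn double(x: u64) -> u64 {
    x * 2
}

fn sum_extracted(data: &[u64]) -> u64 {
    let mut total = 0;
    for &x in data {
        total += double(x);
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum_inline(&data), sum_extracted(&data));
    println!("{}", sum_extracted(&data)); // 20
}
```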

      • ekidd 19 hours ago

        I am fairly doubtful that it makes sense to be using async function calls (or waits) inside of a hot loop in Rust. Pretty much anything you'd do with async in Rust is too expensive to be done in a genuinely hot loop where function call overhead would actually matter.

  • hutao 18 hours ago

    One of the unwritten takeaways of this post is that async/await is a leaky abstraction. It's supposed to allow you to write non-blocking I/O as if it were blocking I/O, and make asynchronous code resemble synchronous code. However, the cost model is different because async/await compiles down to a state machine instead of a simple call and return. The programmer needs to understand this implementation detail instead of pretending that async functions work the same way as sync functions. According to Joel Spolsky, all non-trivial abstractions are leaky, and async/await is no different. [0]

    The article mixes together two distinct points in a rather muddled way. The first is a standard "premature optimization is the root of all evil" message, reminding us to profile the code before optimizing. The second is a reminder that async functions compile down to a state machine, so the optimization reasoning for sync functions doesn't apply.

    [0] https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...
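    The state-machine point can be modeled by hand. This is a simplified illustration, not actual compiler output: each enum variant captures the variables that must survive across an `.await`, and a `step` function plays the role of one `poll`:

```rust
// Hand-written model of what an async fn roughly desugars to. The real
// compiler-generated state machine is more involved; the names and the
// `fetch` stand-in here are illustrative.

fn fetch() -> u32 {
    21 // stand-in for the value an awaited operation would produce
}

enum State {
    Start,                    // before the first await
    AwaitedFirst { a: u32 },  // `a` must live inside the machine
    Finished { result: u32 }, // the future's output
}

fn step(state: State) -> State {
    match state {
        State::Start => State::AwaitedFirst { a: fetch() },
        State::AwaitedFirst { a } => State::Finished { result: a + fetch() },
        done @ State::Finished { .. } => done,
    }
}

fn main() {
    // Drive the machine to completion, as an executor does by polling.
    let mut state = State::Start;
    let result = loop {
        state = step(state);
        if let State::Finished { result } = &state {
            break *result;
        }
    };
    println!("{}", result); // 42
}
```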

    • fpoling 10 hours ago

      One non-trivial problem with async in Rust is that it leads to code that allocates on one CPU and frees memory on another. That kills a lot of the optimizations system allocators try to do with CPU-local caching and badly harms performance, especially on fat servers with a lot of CPUs. When one hits this problem, there is no easy solution.

      Ideally using an allocator per request would solve this issue, but Rust has no real support for it.

      A workaround that works is to stop using async and just use a native thread per request. But most crates and frameworks these days use async. So indeed the async abstraction is very leaky regarding its cost.
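      A minimal sketch of that thread-per-request workaround with std only (the request handler and buffer size are made up for illustration):

```rust
use std::thread;

// Hypothetical request handler: every allocation and deallocation for
// this request happens on the thread that runs it, which keeps the
// allocator's CPU-local caches effective.
fn handle_request(id: usize) -> usize {
    let buf = vec![0u8; 1024]; // per-request scratch buffer
    buf.len() + id             // buf is freed here, on the same thread
}

fn main() {
    // One native thread per request instead of async tasks.
    let handles: Vec<_> = (0..4)
        .map(|id| thread::spawn(move || handle_request(id)))
        .collect();

    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("{}", total); // 4 * 1024 + (0 + 1 + 2 + 3) = 4102
}
```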

    • dehrmann 14 hours ago

      > The programmer needs to understand this implementation detail instead of pretending that async functions work the same way as sync functions.

      Async is telling the OS "I'll do it myself" when it comes to threading and context switches.

      • throwaway17_17 10 hours ago

        I agree that from the perspective of the implementation of async code, it is in many ways an application doing its own threading and context switching. However, the parent comment is written from the perspective of the dev writing and reasoning about the code. In that case, from the dev's perspective, async is there to make concurrent code ‘look like’ (since it certainly is not actually) sequential code.

        I think this type of confusion (or more likely people talking past one another in most cases) is a fairly common problem in discussing programming languages and specific implementations of concepts in a language. In this case the perceived purpose of an abstraction based on a particular “view point”, leads to awkward discussions about those abstractions, their usefulness, and their semantics. I don’t know if there is way to fix these sorts of things (even when someone is just reading a comment thread), but maybe pointing it out can serve to highlight when it happens.

    • stevenhuang 8 hours ago

      Yeah the author makes a really poor example with the async case here.

      Async in rust is done via cooperative scheduling. If you call await you enter a potential suspension point. You're willingly telling the scheduler you're done running and giving another task a chance to run. Compound that with something like tokio's work stealing and now you'll possibly have your task migrated to run on a different thread.

      If this is in hot path making another call to await is probably the worst thing you can do lol.

      The author demonstrates later with a dead simple inlining example that the asm is equivalent. Wonder why he didn't try that with await ;)
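      What a suspension point looks like at the `Future` level can be shown with a hand-rolled yield, similar in spirit to tokio's `yield_now()` (a std-only sketch with a no-op waker, not tokio's implementation):

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// First poll returns Pending ("I'm done running for now, scheduler"),
// second poll completes. A real executor would reschedule the task,
// possibly on a different worker thread.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

// Minimal no-op waker so we can poll by hand without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = YieldOnce(false);
    let mut pinned = Pin::new(&mut fut);

    let mut suspensions = 0;
    while pinned.as_mut().poll(&mut cx).is_pending() {
        suspensions += 1; // each Pending is a trip through the scheduler
    }
    println!("{}", suspensions); // 1 suspension before completion
}
```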

  • wowczarek 14 hours ago

    Regardless of the language, optimisation of this kind has always been a trap for me when moving back and forth between old or otherwise small, embedded systems and modern hardware and toolchains. When we were learning C, compilers weren't as smart as they are today, and every little bit helped - old habits die hard. The lesson is simple - just see what the compiler does with your code first. But also, weigh the real performance pinch points against readability and convenience, and as tempting as it is, don't optimise prematurely - of course I always do; it's fun.

  • Sytten a day ago

    Also note that the `#[inline]` attribute is only a hint; the compiler can decide to ignore it (even `#[inline(always)]`, if I remember correctly).
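    For reference, the hints look like this (a sketch; whether inlining actually happens is ultimately the optimizer's call, and even `#[inline(always)]` can be skipped, e.g. for recursive functions):

```rust
// The three inline hints. None of them is a guarantee by itself;
// they only bias the compiler's inlining decision.

#[inline] // hint: consider inlining, including across crate boundaries
fn add(a: u32, b: u32) -> u32 {
    a + b
}

#[inline(always)] // strong hint, still not an absolute guarantee
fn add_always(a: u32, b: u32) -> u32 {
    a + b
}

#[inline(never)] // hint to keep the call out-of-line
fn add_never(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    // All three behave identically; only the generated code may differ.
    println!("{}", add(1, 2) + add_always(3, 4) + add_never(5, 6)); // 21
}
```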

  • cat-whisperer a day ago

    I wouldn't have agreed with you a year ago. Async traits that were built with boxes had real memory implications. But by design, the async abstraction that Rust provides is pretty good!

  • thezipcreator 20 hours ago

    seems pointless to extract `handle_suspend` here. There are very few reasons to extract code that isn't duplicated in more than one place; it's probably harder to read to extract the handling of the event than to handle it inline.

    • skrtskrt 20 hours ago

      I strongly prefer this sort of code:

```rust
fn does_a_many_step_process() {
    let first_step_result_which_is_not_tied_to_details_internal_to_the_step_implementation =
        well_named_first_step_which_encapsulates_concerns();

    let second_step_result_in_same_manner =
        well_named_second_step_which_encapsulates_concerns();

    // ...etc
}
```

      The logic of process flow is essentially one kind of information. All the implementation details are another. Step functions should not hide further important steps - they should only hide hairy implementation details that other steps don't need to know about.

    • kstrauser 20 hours ago

      One huge one is so that you can test it in isolation.

    • scuff3d 20 hours ago

      There's extraction for reuse and then there's extraction for readability/maintainability. The second largely comes down to personal taste. I personally tend to lose the signal in the noise, so it's easier for me to follow the logic if some of the larger bits are pushed into appropriately named functions. Goes to the whole self-commenting code thing. I know there's a chunk of code behind that function call, I know it does some work based on its name and args, but I don't have to worry about it in the moment. There's a limit of course; moving a couple lines of code out without good cause is infuriating.

      Other people prefer to have big blocks of code together in one place, and that's fine too. It just personally makes it harder for me to track stuff.

  • Scubabear68 a day ago

    A function call is not necessarily an indirection. Basic premise of the blog is wrong on its face.

    • hrmtst93837 20 hours ago

      People new to Rust sometimes assume every abstraction is free but that's just not the case, especially with lifetimes and dynamic dispatch. Even a small function call can hide allocations or vtable lookups that add up quickly if you're not watching closely.
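      The vtable and allocation costs being alluded to can be made visible in a small sketch (the trait and types here are illustrative):

```rust
// Static vs dynamic dispatch. Calls through `&dyn Shape` go through a
// vtable; `Box<dyn Shape>` additionally heap-allocates. Generic
// (static) dispatch is monomorphized and freely inlinable.

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Static dispatch: one copy per concrete type, direct call.
fn area_static<S: Shape>(s: &S) -> f64 {
    s.area()
}

// Dynamic dispatch: indirect call through the fat pointer's vtable.
fn area_dyn(s: &dyn Shape) -> f64 {
    s.area()
}

fn main() {
    let sq = Square(3.0);
    let boxed: Box<dyn Shape> = Box::new(Square(3.0)); // heap allocation
    println!("{}", area_static(&sq) + area_dyn(boxed.as_ref())); // 18
}
```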

      • simonask 16 hours ago

        Why do you mention lifetimes here? They are exclusively a compile-time pointer annotation, they have no runtime behavior, thus no overhead.

        Dynamic dispatch in general is much, much faster than many people’s intuition suggests. Your function doesn’t have to be doing much at all for the difference to become irrelevant. Where it matters is for inlining.

        Dynamic dispatch in Rust is expected to be very slightly faster than in C++ (due to one fewer indirection, because Rust uses fat pointers instead of an object prefix).

    • alilleybrinker a day ago

      Did you read the article? The author makes exactly that point.

    • paulddraper 12 hours ago

      1. "Indirection" can be logical, or runtime.

      2. Please read the blog. That's literally what is said.

  • foo4u 13 hours ago

    I love how this post, almost to a fault, just jumps right in. No BS setup. Not even context setup. Just what you expected after reading the title. That's an art.

    As for the context of the article, maintainability is almost always worth the cost of the function lookup. The proof here that the cost is almost non-existent means to me the maintainability is always worth the perceived (few cycles) impact unless this is real-time code.

  • armchairhacker 21 hours ago

    A nitpick I have with this specific example: would `handle_suspend` be called by any other code? If not, does it really improve readability and maintainability to extract it?

    • rudolph9 20 hours ago

      The idea is that performance isn’t a reason not to do it. Other considerations may cause you to choose inline, but performance shouldn’t be one of them.

    • elzbardico 19 hours ago

      re-use as a criterion for functional decomposability is a very misguided notion

  • stevenhuang 7 hours ago

    The author is right about inlining but has picked the wrong example to show this since the compiler cannot inline across await.

    If this function is in the hot path the last thing you'll want to do is to needlessly call await. You'll enter a suspension point and your task can get migrated to another thread. It is in no way comparable to the dead simple inlining example given later.

    This is why you should always benchmark before making guesses, and double-check that you're even benchmarking the right thing. In this case they took the findings from a non-async benchmark and applied them to async. That will lead you to a very wrong conclusion, and to performance issues.

  • slopinthebag a day ago

    Cool article, but I got turned off by the obvious AI-isms, which, given my limited experience with Rust, have me wondering how true any of the article actually is.

    • ramon156 a day ago

      I don't see anything wrong code-wise, but it's definitely an odd way of making an accumulator. Maybe I'm pedantic

    • paulddraper 12 hours ago

      Doesn't seem like AI.

      This is an incomplete sentence:

      > All cases where you are in a CPU intensive blocking task that, if you’re not careful, could starve all the others.