Continuous batching (2025)

(huggingface.co)

34 points | by jxmorris12 a day ago

7 comments

  • umairnadeem123 a day ago

    Good writeup, but I'm curious about tail latency under mixed prompts. If one request has a huge context and another is tiny, do you bucket by expected decode length or just use FIFO with continuous refill?

    Also, did you test fairness knobs? I've seen p95 improve while a few tenants get starved unless there is some aging policy.
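The refill-versus-fairness question can be made concrete with a toy simulator. This is a minimal sketch, not the article's scheduler or any serving library's API: `Request`, `simulate`, `admission_score` and `aging_rate` are hypothetical names, decode lengths are assumed to be known in advance (a real server can only estimate them), and each decode step is assumed to cost one time unit. Free slots are refilled at every step (continuous refill), and admission favors the shortest expected decode length, standing in for "bucket by expected decode length". Pure FIFO cannot starve anyone, so aging only matters once admission favors short requests; here a waiting request's score drops linearly with its wait time.

```python
from dataclasses import dataclass

@dataclass(eq=False)  # compare requests by identity, not by field values
class Request:
    rid: str
    arrival: int      # decode step at which the request arrives
    decode_len: int   # tokens to generate; assumed known here, real servers only estimate it
    generated: int = 0

def admission_score(req, now, aging_rate):
    """Shortest-expected-first, minus a credit that grows while the request waits (aging)."""
    return req.decode_len - aging_rate * (now - req.arrival)

def simulate(requests, max_slots, aging_rate=0.0):
    """Toy continuous-batching loop; returns {rid: step at which the request finished}."""
    pending = sorted(requests, key=lambda r: r.arrival)
    waiting, running, finished = [], [], {}
    step = 0
    while pending or waiting or running:
        # 1. Requests that have arrived by this step join the wait queue.
        while pending and pending[0].arrival <= step:
            waiting.append(pending.pop(0))
        # 2. Continuous refill: fill free slots now instead of waiting for the batch to drain.
        while waiting and len(running) < max_slots:
            best = min(waiting, key=lambda r: (admission_score(r, step, aging_rate), r.arrival))
            waiting.remove(best)
            running.append(best)
        # 3. One batched decode step: every running request emits one token.
        for req in running:
            req.generated += 1
        step += 1
        # 4. Evict finished requests so their slots are free at the next step.
        for req in [r for r in running if r.generated >= r.decode_len]:
            finished[req.rid] = step
            running.remove(req)
    return finished

def workload():
    # One long request plus a steady trickle of one-token requests, one per step.
    return [Request("long", 0, 5)] + [Request(f"s{i}", i, 1) for i in range(10)]

print(simulate(workload(), max_slots=1, aging_rate=0.0)["long"])  # 15: starved until the trickle stops
print(simulate(workload(), max_slots=1, aging_rate=1.0)["long"])  # 9: aging admits it at step 4
```

With one slot and no aging, the long request waits behind every short arrival; with `aging_rate=1.0` its score catches up after four steps of waiting. The total work is the same in both runs, so aging trades some short-request latency for a bounded wait on the long one.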

  • charcircuit a day ago

    This article does not explain what happens if multiple prompts need different experts. Does it try to schedule the maximum number of experts into memory so it can run as many prompts at once as possible? Scheduling gets very complicated, and there are different trade-offs around fairness in deciding which prompts to process at which times.

  • hunterpayne a day ago

    Continuous batching is a nonsense term. Systems that behave this way are usually called streaming (as opposed to batch).

    Batch is started and stopped and produces one result (or set of results). Computation happens when the job is started until the job completes.

    Streaming is on continuously and computation happens when new data arrives. It produces a stream of results.

    This is a streaming system. And most of the analysis doesn't really reflect how high-throughput systems (as in >16 Mb/s/core) are architected. It doesn't even use the correct language for what is being done to the data (relational algebra), so the proper optimization techniques can't be applied.

    It is, however, a very nice explanation of the preprocessing used in LLM systems.
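The batch-versus-streaming distinction can be illustrated with a toy model. This is a sketch under stated assumptions: every request arrives at step 0, each decode step costs one time unit regardless of how many slots are active, prefill is ignored, and admission is FIFO. `static_batching` and `continuous_batching` are hypothetical names. In both models each step is still one batched forward pass over the active slots; what differs is whether a freed slot is refilled immediately (the stream-like part) or sits idle until the whole batch ends.

```python
import heapq

def static_batching(decode_lens, batch_size):
    """Finish step of each request when batches run to completion one after another."""
    finish, start = [], 0
    for i in range(0, len(decode_lens), batch_size):
        batch = decode_lens[i:i + batch_size]
        finish.extend(start + n for n in batch)
        start += max(batch)  # the batch holds all its slots until its longest request ends
    return finish

def continuous_batching(decode_lens, batch_size):
    """Finish step of each request when a freed slot is refilled at the next decode step."""
    free_at = [0] * batch_size  # step at which each slot next becomes free (a min-heap)
    heapq.heapify(free_at)
    finish = []
    for n in decode_lens:  # FIFO admission order
        start = heapq.heappop(free_at)  # earliest slot to free up
        finish.append(start + n)
        heapq.heappush(free_at, start + n)
    return finish

lens = [10, 2, 2, 2]
print(static_batching(lens, 2))      # [10, 2, 12, 12]
print(continuous_batching(lens, 2))  # [10, 2, 4, 6]
```

With two slots, static batching leaves one slot idle for eight steps while the 10-token request finishes; refilling on free lets the three short requests finish by step 6.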

  • asteroidburger a day ago

    How long until “first principles” is a meme like “considered harmful”? Or are we there already?

    • dang 9 hours ago

      Ok, we've removed first principles from the title above.

    • wavemode a day ago

      "from first principles" has been a common phrase in science and philosophy for a long time: https://en.wikipedia.org/wiki/First_principle

      • esseph a day ago

        Sure, but that's not the way it's being used by your daily Twitter/X poster.