io_uring is not an event system (2021)

(despairlabs.com)

48 points | by signa11 5 days ago

22 comments

  • PunchyHamster 2 days ago

    >io_uring is not an event system at all. io_uring is actually a generic asynchronous syscall facility.

    It's not an event system at all! It's an event system!

    The type of events being a subset of all possible events in the system does not make it not an event system. Nor does it being essentially a queue.

    • ciconia 2 days ago

      It's not an event system in the sense that it's not just a way to get notified when a file descriptor is ready for reading or writing. Like the OP says, it is a way to run syscalls asynchronously.
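
      For comparison, the readiness model looks roughly like this (a minimal epoll sketch, error handling omitted, not code from the article); the "event" is just "this fd is ready", and the read is still on you:

        #include <sys/epoll.h>
        #include <unistd.h>

        /* Readiness model: wait to be told the fd is readable,
           then issue the read() yourself. */
        ssize_t wait_then_read(int epfd, int fd, char *buf, size_t len)
        {
            struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
            epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

            struct epoll_event ready;
            epoll_wait(epfd, &ready, 1, -1);       /* block until something happens */
            return read(ready.data.fd, buf, len);  /* the syscall is still ours to make */
        }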

      • jcelerier 2 days ago

        > It's not an event system in the sense that it's not just a way to get notified when a file descriptor is ready for reading or writing.

        First time in my life I've heard people call this an event system. For me it's always meant any architecture centered around asynchronous message queues.

        • hxtk 2 days ago

          Event-Driven Architecture refers to a different thing, but people used to refer to async code as event-based back before async/await existed, when doing async meant writing the Event Loop yourself.

      • Sharlin 2 days ago

        Yes, but "fd readiness checker" is a super narrow, nonstandard definition of "event system". Though I get what the author is trying to say.

  • kiitos 2 days ago

    "event system" is any mechanism where events (readiness, completion, signals, GUI messages) drive control flow, restricting "event system" to descriptor readiness exclusively is the author's personal framing, not exactly common parlance

  • lordnacho 2 days ago

    I thought the main selling point was that you could tell the system "when you finally get this response, run this function on it"

    • ciconia 2 days ago

      There are no callbacks in io_uring. You submit an operation (SQE), the kernel runs your request while you're doing other stuff, eventually you get back a completion entry (CQE) with the results of the operation.
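
      With liburing the shape is roughly this (a minimal sketch, error handling omitted, not code from the article):

        #include <liburing.h>

        /* Completion model: describe the read as an SQE, hand it to the
           kernel, and collect the CQE whenever you get around to it. */
        int read_async(struct io_uring *ring, int fd, char *buf, unsigned len)
        {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            io_uring_prep_read(sqe, fd, buf, len, 0);   /* the "syscall" */
            io_uring_submit(ring);

            /* ... do other stuff ... */

            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(ring, &cqe);
            int res = cqe->res;             /* what read() would have returned */
            io_uring_cqe_seen(ring, cqe);
            return res;
        }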

      • Muromec 2 days ago

        Sounds like a callback to me

        • vlovich123 2 days ago

          Saying it’s a callback is equivalent to claiming select is a callback. Receiving an event does not a callback make - providing the function address to invoke would make it a callback.

        • pyrolistical 2 days ago

          There is no callback. The response just shows up on the other ring buffer.

          The client decides when to look at the ring buffer
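
          With liburing that's just a peek, something like this sketch (on_done is a stand-in for whatever the application does with a result):

            #include <liburing.h>

            /* Nothing is "called" by the kernel: the application checks the
               completion ring whenever it chooses, without blocking. */
            void drain_completions(struct io_uring *ring,
                                   void (*on_done)(unsigned long long tag, int res))
            {
                struct io_uring_cqe *cqe;
                while (io_uring_peek_cqe(ring, &cqe) == 0) {  /* 0: a CQE is waiting */
                    on_done(cqe->user_data, cqe->res);
                    io_uring_cqe_seen(ring, cqe);
                }
                /* nothing completed yet? come back later */
            }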

        • Veserv 2 days ago

          Callback execution is: wait until begin event occurs, then do this operation.

          Asynchronous execution is: do this operation, then wait until finish event occurs.

          They are opposites.

        • Analemma_ a day ago

          It's not. The NT kernel and some others have genuine callbacks in some of their syscalls, where you pass a userspace function pointer which the kernel invokes on completion; io_uring isn't that and Linux doesn't have anything like that.
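
          On NT that looks roughly like this (a sketch, assuming an OVERLAPPED-capable handle): the completion routine is a user-space function that the system invokes, via an APC, the next time the thread enters an alertable wait.

            #include <windows.h>

            static char buf[4096];
            static OVERLAPPED ov;   /* must stay valid until completion */

            /* A genuine callback: invoked when the read finishes. */
            static VOID CALLBACK on_read_done(DWORD err, DWORD nread, LPOVERLAPPED o)
            {
                /* consume 'nread' bytes of 'buf' here */
            }

            void read_with_callback(HANDLE h)
            {
                ReadFileEx(h, buf, sizeof buf, &ov, on_read_done);
                SleepEx(INFINITE, TRUE);   /* alertable wait; the callback runs here */
            }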

        • signa11 a day ago

          no it is not.

    • lstodd 2 days ago

      The kernel can't "run a function". It can only wake a thread sleeping on a syscall. This is called blocking IO.

      The whole point of async is to find a compromise between context switch cost on one side and event buffering, with the latency that buffering introduces, on the other. It is not about "running a function".

      • touisteur 2 days ago

        There is a limited chaining capability in io_uring that can be an actual game-changer if your chain of ops can run fully in-kernel.

        An intern of mine wrote a whole tcp-checkpoint-and-send-full-state-on-another-socket flow (pause ingress, egress and the socket, save all state, unpause everything), a dozen or so ops (netlink writes, ioctls, set/getsockopt, read/write calls...), as a single chain, all in one command-queue write IIRC.

        Performance was as good as an ad-hoc kernel module, without any eBPF. We just had one kernel patch to handle some unhandled syscall (getsockopt? setsockopt? ioctl?) (that we sadly didn't upstream... 2 years ago) and we were done. Really a great system for batching syscalls.

        It made me wish for a form of DAG for error-handling or for parallel operations in chains...
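
        For the curious, the chaining is just a flag on each SQE. A minimal liburing sketch of the shape (nothing like the checkpoint chain above):

          #include <liburing.h>

          /* IOSQE_IO_LINK chains SQEs: the kernel runs them in order and
             cancels the rest of the chain if one of them fails. */
          void write_sync_close(struct io_uring *ring, int fd,
                                const char *buf, unsigned len)
          {
              struct io_uring_sqe *sqe;

              sqe = io_uring_get_sqe(ring);
              io_uring_prep_write(sqe, fd, buf, len, 0);
              sqe->flags |= IOSQE_IO_LINK;      /* link to the next SQE */

              sqe = io_uring_get_sqe(ring);
              io_uring_prep_fsync(sqe, fd, 0);
              sqe->flags |= IOSQE_IO_LINK;

              sqe = io_uring_get_sqe(ring);
              io_uring_prep_close(sqe, fd);     /* end of the chain */

              io_uring_submit(ring);            /* one submission, three ops */
          }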

      • lukeh 2 days ago

        io_uring can also use an eventfd to signal you need to check the completion queue. We use this with libdispatch to run a clang block on completion (the block is stored in the user_data). Admittedly this is a very specific use case.
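
        Roughly (a sketch, not our actual code): register an eventfd with the ring, then hand that fd to whatever loop you already have (libdispatch, epoll, ...) and drain the CQ when it fires.

          #include <liburing.h>
          #include <sys/eventfd.h>

          /* The kernel signals this eventfd whenever a completion is posted,
             so an existing event loop watching 'efd' knows to drain the CQ. */
          int setup_cq_notification(struct io_uring *ring)
          {
              int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
              io_uring_register_eventfd(ring, efd);
              return efd;
          }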

      • themafia 2 days ago

        > The kernel can't "run a function".

        What is a signal handler?

        • wahern a day ago

          Yep. Signals were literally the original async model on Unix. They were a userspace abstraction over hardware interrupts, much as they are today, but the abstraction didn't turn out to be as fruitful as it might have been, perhaps because it was too thin. (Signal queueing, i.e. real-time signals, meant to make signals more useful for application events, never went mainstream.)

          Back in the 1970s and 1980s the big arguments regarding async were between interrupt-driven (aka signals) and readiness-driven (aka polling), and relatedly edge-triggered vs level-triggered events. BSD added the select syscall along with the sockets API, and that's when the readiness-driven, level-triggered model began to dominate. Though before kqueue and then epoll came along there were some attempts at scaling async I/O using the interrupt-driven model: a signal delivered along with the associated descriptor. I think there's a vestige of this still in Linux, SIGIO.

          It's not always either/or, though. From the perspective of userspace APIs it's usually one or the other, but further down the stack one model might be implemented in terms of the other, sometimes with multiple transitions between the two, especially around software/hardware boundaries. Basically, it's turtles all the way down.

          Similarly, the debates regarding cancellation, stack management, etc, still persist; the fundamental dilemmas haven't changed.

        • lstodd a day ago

          It's still a context switch per event.

          Like other people here wrote, nowadays one can push some processing into kernel context, but that sort of defeats the purpose of the kernel/userland border. One could just write a kmod and be done with it (and lose isolation).

        • ThrownOffGame a day ago

          After decades of gleefully using signal handlers to handle all sorts of contingencies, systems programmers were solemnly informed that signal handler functions were very dangerous indeed, because a bunch of other stuff was on the stack and undefined while they were being run, and therefore, handler functions couldn't call anything that was unsafe or non-reentrant.

          Systems programmers were told that the best signal handler function was one that set a flag, a volatile int, and then exited immediately without doing or touching anything else.

          Sort of defeats the purpose of the elaborate signal-handler-callback-pointer-to-function system we had in place, but them's the breaks.
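
          I.e. the officially blessed handler ends up being something like this sketch (the standard's blessed type for the flag is volatile sig_atomic_t rather than plain volatile int):

            #include <signal.h>

            /* The entire handler: set a flag and get out. All the real work
               (I/O, allocation, logging) happens later, outside the handler. */
            static volatile sig_atomic_t got_signal;

            static void handler(int sig)
            {
                (void)sig;
                got_signal = 1;
            }

            void install(int signo)
            {
                struct sigaction sa;
                sa.sa_handler = handler;
                sigemptyset(&sa.sa_mask);
                sa.sa_flags = 0;
                sigaction(signo, &sa, NULL);
            }

            /* ...and somewhere in the main loop:
                   if (got_signal) { got_signal = 0; handle_it(); }
               where handle_it() is whatever work the signal was actually for. */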
