Varlink – IPC to replace D-Bus gradually in systemd

(varlink.org)

41 points | by synergy20 7 hours ago

46 comments

  • newpavlov 3 hours ago

    So Varlink requires a proper specification of message types, but then uses god damn JSON to serialize messages? Why would you do that?

    Apparently, we don't have enough wasted cycles in modern software, just add a bunch more in a fundamental IPC infrastructure.

    • jchw 2 hours ago

      Unfortunately I concur.

      There are a lot of problems with D-Bus, but that it wasn't using JSON wasn't one of them.

      The tooling for D-Bus is terrible. It has composite types but they are very primitive, e.g. you can't name fields in a structure without extensions (there's like three different extensions for this.) A lot of the code generators can't actually handle real-world schemas.

      Now I understand that D-Bus itself is also a problem (as in, the protocol design beyond serialization) but varlink looks like a very solid step backwards. In any single individual use case JSON serialization is unlikely to be a huge bottleneck, but in practice if this will be used for all kinds of things it will add up.

      I really wish the Linux desktop would coalesce around something like Cap'n'proto for serialization.

      P.S.: It'd probably be possible to shoehorn in multiple formats but I think that's a mistake too. Something like capnp will have a strong impedance mismatch with JSON; mixing formats with different degrees of self-describability is unwise. But then formats like MsgPack and BSON which are direct JSON analogues miss out on the best benefits of using binary serialization formats, which makes them almost the worst of both worlds, and you'd probably be better off with using a highly-optimized JSON library like simdjson.

      • arianvanp 2 hours ago

        CBOR is the obvious choice here.

        I've suggested it before and the systemd folks didn't seem completely opposed to it. Also because a CBOR parser is already in systemd's dependencies due to FIDO2 disk unlocking.

        • theamk an hour ago

          What is the data rate (messages, bytes) that you expect for this "D-Bus replacement protocol"?

          What fraction of CPU would JSON ser/des take to justify using CBOR?

    • jsnell 2 hours ago

      I've done similar things in the past (json over udp-to-localhost, with a schema description used to generate the code for both parsing and generating on each end). It's a totally reasonable point in the design space, and I literally never saw a reason to revisit the decision to use json for that system.

      You'd do that because everything under the sun knows how to generate and parse json. Performance looks like a total non-issue for this use case. The messages are tiny, local and infrequent. You wouldn't want to do this on some kind of a high performance application on the data plane, but on the control plane it's totally fine.

      Even if you expect that all production use cases will go through some kind of higher-level library that does schema validation, it can still be quite nice during debugging to a) be able to inspect the messages on the wire, and b) be able to hand-write and inject messages.
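
      That inspect-and-inject property is concrete in varlink's case: each message on the wire is a JSON object terminated by a single NUL byte over a unix socket, so framing fits in a few lines. A minimal sketch in Python; the `GetInfo` call targets `org.varlink.service`, the introspection interface varlink services expose:

```python
import json

def frame(message: dict) -> bytes:
    """Encode one varlink message: UTF-8 JSON terminated by a NUL byte."""
    return json.dumps(message).encode("utf-8") + b"\x00"

def deframe(stream: bytes) -> list[dict]:
    """Split a received byte stream on NUL delimiters and parse each message."""
    return [json.loads(chunk) for chunk in stream.split(b"\x00") if chunk]

# A hand-written call you could inject with e.g. socat for debugging:
call = frame({"method": "org.varlink.service.GetInfo"})
```

      Being able to paste that call straight into a socket tool is exactly the debuggability argument; a length-prefixed binary format would need a dedicated client just to say hello.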

    • deivid 2 hours ago

      There was a Varlink talk[0] a few days ago at All Systems Go. In that talk, Lennart mentioned that JSON is unfortunate (primarily due to the lack of 64-bit ints) but has the surprising benefit that you can understand the bus messages when using `strace` for debugging.

      [0]: https://www.youtube.com/watch?v=emaVH2oWs2Y&list=PLWYdJViL9E...

      • anotherhue 7 minutes ago

        With this logic (oft repeated) we should be sending TCP as JSON. { sourcePort: 443, destinationPort: 12345,...}

        Debugability is important but the answer is to build debugging tools, not to lobotomise the protocol for the vanishingly tiny fraction of packets that are ultimately subject to debugging.

      • gmokki 2 hours ago

        I do not understand where "64-bit integers not working with JSON" comes from.

        JSON the format has no limit on integer size. And all the Java JSON libraries I know can handle arbitrary-precision integers (BigInteger) as well as 32/64-bit int/long types when serializing and deserializing.

        Quick googling shows that JavaScript also has proper support for 64-bit integers with its BigInt type, and it can be used to deserialize incoming data correctly with the default parser, albeit requiring a bit more work to annotate the fields.

        Personally, I often make sure that the integer identifiers I return from REST APIs in trusted environments are by default wider than 53 bits, so that buggy parser libraries are caught early.

        • capitainenemo an hour ago

          The number type in JSON is in practice a 64-bit float, limiting integers without loss of precision to 2⁵³-1.

          BigInt is a new concept and not technically supported. So whether it works in your random library of choice is probably a crapshoot. "Use within JSON: Using JSON.stringify() with any BigInt value will raise a TypeError, as BigInt values aren't serialized in JSON by default." https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
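
          The 2⁵³ cliff is easy to demonstrate. Python's own json module keeps arbitrary precision, so the loss only shows up once a value is forced through a double, which is what a JavaScript-style consumer does on the other end of the wire (a generic sketch, not varlink-specific):

```python
import json

exact = 2 ** 53       # 9007199254740992: the end of the contiguous integer range
above = exact + 1     # collapses back to 2**53 when forced through a 64-bit float

# Python round-trips big integers losslessly...
assert json.loads(json.dumps(above)) == above

# ...but a parser that stores numbers as doubles cannot tell them apart:
assert float(above) == float(exact)
```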

        • GrayShade 2 hours ago

          Qt refused for almost a decade to support deserializing 64-bit integers from JSON because of compatibility concerns.

          • Spivak an hour ago

            jq couldn't handle them until relatively recently. This isn't just a few bad parsers. You can't assume a JSON parser will handle big integers correctly, and when you're angling to be low-level plumbing that works with every language and with software that hasn't been recompiled in years, you have to be pretty conservative in what you send.

        • theamk an hour ago

          If you fully work within a trusted environment, why bother with JSON? Use your favorite binary serialization with codegen.

          The whole point of JSON is that almost every programming language can read and write it - and if you want that to remain true, stringify anything unusual, like large integers.
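
          A sketch of that stringify convention in Python; the `encode_id` helper and the 2⁵³ threshold are illustrative, not part of any spec:

```python
import json

SAFE_MAX = 2 ** 53  # beyond this, double-based JSON parsers lose precision

def encode_id(n: int):
    """Send identifiers that may not survive a double as strings."""
    return str(n) if abs(n) > SAFE_MAX else n

# Small values stay plain numbers; a 64-bit id goes out as a string:
wire = json.dumps({"id": encode_id(2 ** 63 - 1), "count": encode_id(42)})
```

          The receiving side pays one extra string-to-int conversion, but every parser under the sun delivers the digits intact.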

    • theamk an hour ago

      Sigh.

      Varlink is designed for IPC, so the expected data rate is maybe hundreds of messages per second in the worst case.

      JSON can be parsed at 3 Gigabytes/second [0]. Even unoptimized stdlib parsers in scripting languages get 50 MB/sec.

      That's more than enough, standard library support is much more important for a project like this.

      And if there is ever a reason to send a tar archive or video data, there is always the "upgrade" option of switching to raw sockets, with no JSON overhead at all.

      [0] https://news.ycombinator.com/item?id=19214387

      [1] https://github.com/TeskaLabs/cysimdjson

    • Asooka 2 hours ago

      To be honest I don't see why they don't use the existing IPC mechanism created for Android - binder and ashmem. It has been tested and developed in real world scenarios for over a decade by now.

      • RandomThoughts3 an hour ago

        Varlink was specifically designed to be available as early as possible while the kernel is booting, and requires virtually nothing apart from good old sockets. But yes, I'm definitely getting a "yet another one" vibe from this.

        Then again, D-Bus is not really what I would call an incredible piece of software - we are very much in "adequate" territory, if even that - so anything shaking up the status quo can't be too bad.

      • AshamedCaptain an hour ago

        Same reason Android (or PalmOS) didn't decide to use any of the existing IPC mechanisms, and why Luna/webOS then decided not to use Binder even though they had experience with it, and reinvented something else instead.

        That urge that forces every developer to reinvent RPC every other week.

    • nbbnbb 2 hours ago

      Let's take it a level up. I'm still not sure why we even need a message bus in there in the first place. The whole Linux boot/init/systemd pile is a Rube Goldberg machine that is very difficult to understand at the moment. The only reason I suspect most people aren't complaining is that our abstraction level is currently a lot higher (docker / kubernetes etc) and machines are mostly ephemeral cattle, so few people go near it.

      As for JSON, please don't even get me started on that. It is one of the worst serialisation decisions we ever made as a society. Poorly defined and unreliable primitive types, terrible schema support, and expensive to parse. It does not belong in system plumbing! Give it a few months and varlink will have YAML over the top of that.

      • jchw 2 hours ago

        I think it doesn't have to be a message bus per se; that design decision is mostly just convenience. Even D-Bus can actually be used without the bus, if you really want to.

        D-Bus is just a bus so it can solve a bunch of different IPC problems at the same time. e.g. D-Bus can handle ACLs/permissions on RPC methods, D-Bus can handle multiple producers and multiple consumers, and so on. I think ultimately all that's really needed is IPC, and having a unified bus is just to allow for some additional use cases that are harder if you're only using UNIX domain sockets directly.

        If there are going to be systemd components that have IPC, then I'd argue they should probably use something standard rather than something bespoke. It's good to not re-invent the wheel.

        Not that I think Varlink is any better. It seems like at best it's probably a lateral move. I hope this does not go forward.

        • nbbnbb 2 hours ago

          > If there are going to be systemd components that have IPC, then I'd argue they should probably use something standard rather than something bespoke. It's good to not re-invent the wheel.

          This is my point.

          My favourite D-Bus situation, a number of years ago, was a CentOS 7 box on which reboot stopped working, with a cryptic D-Bus error that no one had ever seen before. I had to sync it, power-cycle the node from the iLO card and cross my fingers.

          I really don't give a shit about this. I just wanted to run my jobs on the node, not turn into a sysadmin due to someone else's dubious architectural decisions.

          • jchw 2 hours ago

            Yes but the systemd developers don't want to implement their own protocols with e.g. ACL checking, and given some of their track record I kind of think you don't want them to, either. I'm pretty sure the error conditions would be even more bespoke if they "just" used UNIX domain sockets directly. Don't get me wrong, there's nothing particularly wrong with UNIX domain sockets, but there's no "standard" protocols for communicating over UDS.

            • amluto 2 hours ago

              This is systemd we’re talking about. A service manager that already mucks with mount namespaces.

              It would be quite straightforward to map a capability-like UNIX socket into each service’s filesystem and give it a private view of the world. But instead…

              > Public varlink interfaces are registered system-wide by their well-known address, by default /run/org.varlink.resolver. The resolver translates a given varlink interface to the service address which provides this interface.

              …we have well known names, and sandboxing, or replacing a service for just one client, remains a mess. Sigh.

            • nbbnbb 2 hours ago

              Well, there sort of is, but people tend not to know about it or use it. If it's within the same machine and architecture, which should be the case for an init system, then a fixed-size struct can be written and read trivially.
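
              For what it's worth, the fixed-struct approach can be sketched without raw C, e.g. with Python's struct module. The format string (here a u32 opcode, a 16-byte name and a u64 argument, all purely illustrative) is the part both ends must keep byte-identical, which is exactly where the brittleness the replies point at comes in:

```python
import struct

# "=I16sQ": native byte order with standard sizes and no padding:
# u32 opcode, 16-byte fixed name field (NUL-padded), u64 argument.
LAYOUT = struct.Struct("=I16sQ")

def pack_msg(opcode: int, name: str, arg: int) -> bytes:
    # struct pads the 16s field with NUL bytes; names longer than 16
    # bytes would raise, which is the hidden size limit of this scheme.
    return LAYOUT.pack(opcode, name.encode("utf-8"), arg)

def unpack_msg(data: bytes) -> tuple[int, str, int]:
    opcode, raw_name, arg = LAYOUT.unpack(data)
    return opcode, raw_name.rstrip(b"\x00").decode("utf-8"), arg
```

              Change any field width, order, or padding rule on one side only and the other side silently misparses; that is the trade against self-describing formats.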

              • jchw 2 hours ago

                C structs are a terrible serialization format, since they are not a serialization format at all. Nothing guarantees that you will get consistent struct behavior on the same machine, but also, it only really solves the problem for C. For everything else, you have to duplicate the C structure exactly, including however it may vary per architecture (e.g. due to alignment.)

                And OK fine. It's not that bad, most C ABIs are able to work around this reasonably OK (not all of them but sure, let's just call it a skill issue.) But then what do you do when you want anything more complicated than a completely fixed-size type? Like for example... a string. Or an array. Now we can't just use a struct, a single request will need to be split into multiple different structures at the bare minimum.

                And plus, there's no real reason to limit this all to the same machine. Tunneling UNIX domain sockets over the network is perfectly reasonable behavior and most* SSH implementations these days support this. So I think scoping the interoperability to "same machine" is unnecessarily limiting, especially when it's not actually hard to write consistent de/serialization in any language.

                * At least the ones I can think of, like OpenSSH[1], Go's x/crypto/ssh[2], and libssh2[3].

                [1]: https://www.openssh.com/txt/release-6.7

                [2]: https://pkg.go.dev/golang.org/x/crypto/ssh#Client.ListenUnix

                [3]: https://github.com/libssh2/libssh2/pull/945

                • nbbnbb an hour ago

                  Note that staying within the domain of this problem was the point. That means the same machine, the same architecture, and both ends being C, which is what the init system is written in.

                  You are adding problems to the specification that don't exist.

                  As for strings, just shove a char[4096] in there. Use a bit of memory to save a lot of parsing.

                  • jchw an hour ago

                    > You are adding more problems that don't exist to the specification.

                    D-Bus does in fact already have support for remoting, and like I said, you can tunnel it today. I'm only suggesting it because I have in fact tunneled D-Bus over the network to call into systemd specifically, already!

                    > As for strings, just shove a char[4096] in there. Use a bit of memory to save a lot of parsing.

                    OK. So... waste an entire page of memory for each string. And then we avoid all of that parsing, but the resulting code is horribly error-prone. And then it still doesn't work if you actually want really large strings, and it also doesn't do much to answer arrays of other things like structures.

                    Can you maybe see why this is compelling to virtually nobody?

      • otabdeveloper4 2 hours ago

        > I'm still not sure why we even need a message bus in there in the first place.

        Because traditional POSIX IPC mechanisms are absolute unworkable dogshit.

        > It is one of the worst serialisation decisions we ever made as a society.

        There isn't really any alternative. It's either JSON or "JSON but in binary". (Like CBOR.) Anything else is not interoperable.

        • jchw 2 hours ago

          There is a world of serialization formats that can offer an interoperability story similar to JSON or JSON-but-binary formats. And sure, implementing them in every language someone might want to use could take real work, but:

          - Whatever: people in more niche languages are pretty used to needing to do FFI for things like this anyhow.

          - Many of them already have a better ecosystem than D-Bus. e.g. interoperability between Protobuf and Cap'n Proto implementations is good. Protobuf in most (all?) runtimes supports dynamically reading a schema and parsing the binary wire format with it, as well as code generation. You can also maintain backwards compatibility in these formats by following relatively simple rules that can be statically enforced.

          - JSON and JSON-but-binary have some annoying downsides. I really don't think field names of composite types belong in the ABI. JSON-like formats also often have to deal with the fact that JSON doesn't strictly define all semantics. Some of them differ from JSON in subtle ways, so supporting both JSON and sorta-JSON can lead to nasty side effects.

          Maybe most importantly, since we're not writing software that's speaking to web browsers, JSON isn't even particularly convenient to begin with. A lot of the software will be in C and Rust most likely. It helps a bit for scripting languages like Python, but I'm not convinced it's worth the downsides.

          • otabdeveloper4 2 hours ago

            Sorry, but bash isn't a "niche language" and it doesn't have an FFI story.

            • jchw 2 hours ago

              I don't know how to tell you this, but, you don't need to implement an RPC protocol in bash, nor do you need FFI. You can use CLI tools like `dbus-send`.

              I pray to God nothing meaningful is actually doing what you are insinuating in any serious environment.

        • nbbnbb 2 hours ago

          This is, quite frankly, a ridiculous point. Most of that garbage came from the HPC people, who built loads of stuff on top of it in the first place. It's absolutely fine for this sort of thing: sending the odd little message here and there, not running a complex HPC cluster.

          As for JSON, are you really so short-sighted as to think it's the only method of encoding something? Is "oh well, it doesn't fit the primitive types, so just shove it in a string and add another layer of parsing" acceptable? Hell no.

          • otabdeveloper4 2 hours ago

            > ...it's the only method of encoding something?

            If you want something on the system level parsable by anything? Yes it is.

            • nbbnbb 2 hours ago

              protobufs / asn.1 / structs ...

              Edit: hell even XML is better than this!

              • RandomThoughts3 44 minutes ago

                Structs are part of C's semantics. They are not an IPC format. You can somewhat use them as one if you take a lot of precautions about how they are laid out in memory, including padding and packing, but it's very brittle.

                ASN.1 is both quite complicated and not very efficient.

                They could certainly have gone with protobufs or another binary serialisation format, but would it really be better than the current choice?

                I don't think the issue they are trying to solve is related to serialisation anyway. It seems to me they are unhappy about the bus part, not the message format part.

              • baq an hour ago

                Thank goodness they didn’t pick YAML though.

      • therein 34 minutes ago

        Yeah, how will number/float serialization go? Are we going to serialize them as strings and parse them? That isn't handled the same way across languages.

    • hi-v-rocknroll 2 hours ago

      Sigh. Cap'n Proto already exists. Reinventing the wheel yet again because NIH.

      • NekkoDroid an hour ago

        (Varlink also isn't something new)

  • 6SixTy an hour ago

    So does this mean I can shove SQLite databases into the file structure, abstract away an entire SQLite engine into a single header that you are forced to use, and declare that IPC? And you know, since we _are_ already hiding databases within the file structure, why not use it to also unify configuration and installation files too? What could go wrong?

    • RandomThoughts3 28 minutes ago

      I’m having trouble following what you mean.

      Yes, you can certainly send a whole SQLite database using any binary communication protocol as a way to exchange data today. You can even compress it before if you feel like it.

      It will not make for a good IPC format because, well, it's not designed to be one, but it would most definitely work.

      What’s your point?

  • c0balt 3 hours ago

    That looks interesting. I especially appreciate the intentional design decision to favor documentation and schemas. While it didn't work out quite so well with D-Bus IME, it is still a good idea to make APIs more accessible.

  • jmclnx 2 hours ago

    Does this mean Linux will need both D-Bus and varlink if using systemd? I ask because I believe Firefox uses D-Bus.

    https://www.phoronix.com/news/Systemd-Varlink-D-Bus-Future

    • NekkoDroid 2 hours ago

      systemd has been using varlink alongside D-Bus for a while now; it's not something just now being introduced. They don't have a problem living side by side.

  • arianvanp 2 hours ago

    Accompanying talk from Lennart Poettering a few days ago:

    https://media.ccc.de/v/all-systems-go-2024-276-varlink-now-

  • edelbitter 2 hours ago

    But what is the excuse to start from scratch, rather than building a compat layer that convinces the world that your raw interfaces, libraries and clients are so much nicer to work with that the other thing should eventually be demoted to a compat layer?