How Discord stores trillions of messages (2023)

(discord.com)

387 points | by jakey_bakey a day ago

211 comments

  • foobazgt 20 hours ago

    This blog post seems to blame GC heavily, but if you look back at their earlier blog post [0], it seems to be more shortcomings in either how they're using Cassandra or how Cassandra handles heavy deletes, or some combination:

    "It was at that moment that it became obvious they deleted millions of messages using our API, leaving only 1 message in the channel. If you have been paying attention you might remember how Cassandra handles deletes using tombstones (mentioned in Eventual Consistency). When a user loaded this channel, even though there was only 1 message, Cassandra had to effectively scan millions of message tombstones (generating garbage faster than the JVM could collect it)."
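
    To make the quoted failure mode concrete, here is a toy sketch (not Cassandra's actual read path) of why a partition full of tombstones is expensive to read even when almost nothing is live. The `Cell` type, the `read_partition` helper, and the 100,000-row figure are all illustrative assumptions; the real channel had millions of tombstones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    message_id: int
    body: Optional[str]  # None marks a tombstone (a deleted message)


def read_partition(cells, limit):
    """Return up to `limit` live cells plus the number of tombstones scanned."""
    live, tombstones_scanned = [], 0
    for cell in cells:
        if cell.body is None:
            # A tombstone still has to be read and discarded before moving on.
            tombstones_scanned += 1
            continue
        live.append(cell)
        if len(live) == limit:
            break
    return live, tombstones_scanned


# Toy channel: 100,000 deleted messages followed by a single live one.
partition = [Cell(i, None) for i in range(100_000)] + [Cell(100_000, "hello")]
live, scanned = read_partition(partition, limit=50)
print(len(live), scanned)  # prints "1 100000"
```

    The read returns one message but touches every tombstone first, and each discarded object is short-lived garbage for the collector.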

    And although the blog post talks about GC tuning, there's mention here [1] that they didn't do much tuning and were actually running on an old version of Cassandra (and presumably JVM) - having just switched over from CMS (!).

      0) https://discord.com/blog/how-discord-stores-billions-of-messages
      1) https://news.ycombinator.com/item?id=33136453
    • Aeolun 18 hours ago

      But then it’s still nice that they’re using ScyllaDB and now it’s not a concern at all, right?

      Even if they were using their original solution wrong, I think the solution that cannot be used wrong is superior.

      • ericvolp12 16 hours ago

        The funny part is ScyllaDB still uses tombstones for deletions, though they do have configurable compaction strategies and iirc Discord uses Scylla's Incremental Compaction Strategy that I suppose solves the specific issue they were dealing with. iirc that compaction strategy will trigger a compaction once a certain threshold of a partition is tombstones and then the table is rebuilt without the tombstoned content (which effectively pauses writes on that specific node and that specific table and partition for the duration of that process). Compacting a massive partition is really expensive. Scylla defaults to warning you that a partition is too large if it has at least 100,000 rows in it. My guess is when they moved to ScyllaDB they also adopted a new strategy for partitioning messages in a channel that keeps partition sizes reasonable so compactions don't take a super long time.

        • sroussey 2 hours ago

          Good default configurations can mean quite a lot if people don’t tune them.

        • jhgg 4 hours ago

          We did not change schema or partitioning strategy.

      • roenxi 16 hours ago

        I don't see anything here that looks untoward. They increased their data storage by 3 orders of magnitude and decided to use a different DB system. Fair enough, maybe they've learned more about the nature of their data.

        But that logic isn't sound. When dealing with huge amounts of data there are going to be trade-offs. Picking a system that makes different trade-offs to an existing system is not automatically helpful. Yes you don't have the old problems. However, you are about to discover new problems. There is always something of a gamble around which will be more of a problem to your business.

    • vips7L 5 hours ago

      > having just switched over from CMS (!)

      This is really interesting. CMS was removed in Java 14 after being replaced by G1GC in Java 9. They were probably running an antiquated Java 8 or 11 runtime. So that means that in 2022 they were either running a 4 year old Java 11 runtime or an 8 year old Java 8 runtime. They were really leaving a lot of performance on the table.

  • dorlaor 12 hours ago

    Some additional nuggets from a ScyllaDB co-founder:

    - Discord couldn't complete repair with Cassandra. Not the case with Scylla.
    - Scylla has a lot in common with Cassandra, for good reason, like the LSM tree, compaction, etc. However, Scylla has unique CPU and I/O schedulers which allow us to prioritize queries over compaction, and defer compaction to the half millisecond where we have enough idle bandwidth. We have plenty of articles about it.
    - Scylla has a new (1.5 years) tombstone_gc=repair mode, which is much safer.
    - Scylla's new architecture of Raft and tablets was recently launched and is the next big thing for our users. Watch the cool YouTube video of tablet load balancing.

  • leetrout 21 hours ago

    Needs (2023)

    That services layer reminds me of a big, fancy, distributed Varnish Cache... they don't mention caching and they chose the word coalesce so I assume it doesn't do much actual caching. But it made me think of Varnish's "grace mode" and its use to prevent the thundering herd problem (which is where I first heard of 'request coalescing') https://varnish-cache.org/docs/6.1/users-guide/vcl-grace.htm...

    Also love to see consistent hashing come up again and again. It's a great piece of duct tape that has proven useful in many similar situations. If you know where something should be then you know where everything is gonna come look for it!
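
    For readers who haven't run into it, a minimal consistent-hash ring might look like the sketch below. This is an illustration only, not what Discord runs: the `HashRing` class, the node names, and the `vnodes` count are all made up for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("channel:42"))
```

    The useful property is that adding a node only moves the keys the new node takes over; every other key keeps its owner, so everyone still knows where to look.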

    • mnutt 6 hours ago

      Grace mode itself doesn’t prevent thundering herd; varnish coalesces all requests automatically and grace mode is used to increase the likelihood of clients receiving cached (albeit stale) responses.
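
      As a rough illustration of request coalescing in general (not Varnish's or Discord's implementation), the sketch below lets concurrent callers for the same key share one in-flight fetch. The `Coalescer` class, `slow_fetch`, and the key name are hypothetical.

```python
import asyncio


class Coalescer:
    """Share one in-flight fetch among all concurrent callers for a key."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._in_flight = {}  # key -> asyncio.Task

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller starts the fetch; later callers await the same task.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            # Forget the task once it completes; nothing is cached afterwards.
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        return await task


calls = {"n": 0}


async def slow_fetch(key):
    calls["n"] += 1  # count how often the backend is actually hit
    await asyncio.sleep(0.01)
    return f"value-for-{key}"


async def main():
    coalescer = Coalescer(slow_fetch)
    return await asyncio.gather(*(coalescer.get("channel:42") for _ in range(100)))


results = asyncio.run(main())
print(calls["n"], len(results))  # prints "1 100"
```

      Note that the entry is dropped as soon as the fetch finishes, so this only collapses simultaneous requests; serving stale responses afterwards is a separate caching decision.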

    • loloquwowndueo 21 hours ago

      Coalescing and “origin shielding” tend to be more common terms for that - I’ve never heard of “grace” until today :)

    • hinkley 15 hours ago

      Nginx always more businesslike.

          proxy_cache_use_stale updating;
    • dang 16 hours ago

      Year added above. Thanks!

  • aaptel 13 hours ago

    This whole problem wouldn't exist if we used distributed chat protocols which have been around for over 40 years (IRC). With the added benefit of having an open specification and multiple implementations. No walled gardens.

    And if you think IRC is too old for the modern world take a look at matrix or xmpp.

    How did we let discord take over is a mystery to me, or rather a tragedy.

    • rollcat 12 hours ago

      IRC does not store messages, it only relays them to clients. You need an add-on solution to store chat history, something we've been taking for granted for ~30 years.

      IRC all but requires using a bouncer to follow a conversation from more than a single device.

      IRC does not encrypt messages, only (optionally) the client<->server connection. Without E2EE, you have no privacy against the server/operator, which is an easily targeted SPOF.

      Matrix (the protocol) is still in flux, and the implementations are lagging behind the spec. If you're not using Element, you're behind on features and security.

      XMPP is (similarly to IRC) relying on optional protocol add-ons for basic things, like E2EE, which clients may or may not support fully or correctly.

      I recommend reading these breakdowns by soatok: https://soatok.blog/2024/08/04/against-xmppomemo/ https://soatok.blog/2024/08/14/security-issues-in-matrixs-ol...

      2013/Snowden happened 11 years ago. E2EE should by now be considered a basic feature, a commodity, something we should be calling for as relentlessly as we did for HTTPS. (Discord of course does not implement E2EE.)

      • grishka 9 hours ago

        Truth is, E2EE isn't a "basic thing". It's an add-on feature that most people don't want. It is impossible to have E2EE that doesn't leak into the UX, and most people would rather have a streamlined UX than deal with key management. It is also much more complex to have robust E2EE in a group chat.

        The thing that sets E2EE apart from HTTPS is that HTTPS requires nothing from the end user. It just works. And as a site owner, you just set it up once and forget about it.

        • rollcat 7 hours ago

          > It is impossible to have E2EE that doesn't leak into the UX

          True, but one is also free to study the UX solutions implemented on platforms such as iMessage, WhatsApp, and Signal, which all have strong E2EE and see plenty of mainstream usage.

          > [...] HTTPS requires nothing from the end user.

          Depends on how you define "nothing". We've collectively put an insane amount of work to bring HTTPS to where it is today. Also, HTTPS continues to rely heavily on each server operator's skills and diligence.

          There's also plenty of edge cases where HTTPS clients need to go an extra mile, such as containers (many base images do not include a cacert bundle), IoT/retrocomputing/other underpowered devices, and so on. There's always a cost, but it's usually worth it.

          • grishka 5 hours ago

            I should've said "true E2EE".

            On iMessage, your keys are managed by Apple. You effectively fully trust them (which seems to be the assumption in most of Apple products anyway). I wouldn't call this a "real" E2EE implementation.

            In WhatsApp, you're limited to one device logged into your account, and the rest are proxied through it. And message backups, those are annoying.

            In Signal, you have all those stupid backups too, and while you're able to log into multiple devices (it seems), your past messages don't load "for your own security", and there's also this stupid time component so you get logged out on your computer if you haven't used the Signal desktop app for some weeks (which I don't).

            Whereas on Discord, Telegram, Slack and other IM services without end-to-end encryption, you log in on a new device and that's it. You instantly get access to all your messages since the beginning of time, and stay logged in forever.

            • rollcat 12 minutes ago

              > On iMessage, your keys are managed by Apple. You effectively fully trust them (which seems to be the assumption in most of Apple products anyway).

              I'd argue there are many scenarios in which this might be preferable to a lengthier/wider supply chain. Personally I'd sooner trust Apple than Microsoft+(Lenovo/HP/Dell/...)+(Intel/AMD/Qualcomm/Broadcom/...)+(every device with DMA (PCIe/TB), unless you trust your IOMMU)+(.../...)... (you get the point). And the alternatives to Microsoft are each its own kitchen sink.

              > In Signal [...] your past messages don't load "for your own security" [...]

              I agree that this is quite annoying. HTTPS clients resolved a somewhat similar problem (usage of self-signed certificates) by trusting the user to make an informed choice. I wish Signal would trust their user base to make their own choices there as well.

              > Whereas on Discord, Telegram, Slack and other IM services without end-to-end encryption, you log in on a new device and that's it. You instantly get access to all your messages since the beginning of time, and stay logged in forever.

              Same with iMessage. Whether this is a feature or a bug, depends on your threat model.

              But we're in a situation where we don't even get to make an informed choice - every solution (as you pointed out) comes with its own bag of UX shortcomings. These trade-offs should be user choices, not something the vendor forces upon you. But these are not fundamental shortcomings of E2EE as a concept, but particular issues with its different implementations. WhatsApp shows you can restore messages from a backup; Signal shows you can have "real" multi-device presence; etc. If we could spend 1/100th of the effort we did to push HTTPS everywhere, E2EE could be just as ubiquitous today.

            • brobdingnagians 4 hours ago

              Just spitballing, but couldn't you have a new device login with three fields: username, password, and encryption key? Then if you don't add the encryption key you don't get the history, but can still access the account. And if password managers saved all three, that would simplify it for more people (at least those with password managers). But there would still have to be a cultural shift toward password managers for a lot of non-tech people.

            • iknowstuff 4 hours ago

              I think whatsapp no longer proxies via a single device.

              On iMessage, you can verify keys now.

          • saberience 6 hours ago

            Yes, but see the group size limit on iMessage, which is 32!!

            That effectively makes it useless for so many people, and the reason is e2e encryption.

            In contrast, Telegram has groups with 1000s of participants, but only possible as they don’t use e2e encryption.

      • Zambyte 8 hours ago

        > IRC does not encrypt messages, only (optionally) the client<->server connection. Without E2EE, you have no privacy against the server/operator, which is an easily targeted SPOF.

        FWIW this point isn't relevant to the IRC vs Discord discussion, since Discord is also very not E2EE. That said, XMPP my preferred protocol that checks all of the boxes.

        • rollcat 7 hours ago

          > [...] since Discord is also very not E2EE.

          I have stated that at the end of my original comment. I'm not advocating for Discord (merely enumerating IRC's and XMPP's shortcomings), but I would like to point out once again, that post-2013 any solution that does not enable strong E2EE by default should not be advocated for - at all.

          > That said, XMPP my preferred protocol that checks all of the boxes.

          Read up on soatok's breakdown of the design & status of OMEMO. I'm not a cryptographer, but I do trust a cryptographer when they say some protocol's design/crypto is broken.

          • vidarh 5 hours ago

            Maybe for your use. For my use, not a single thing that goes over Discord is something I'd object to being posted on a public website. That includes DMs. Not having E2EE means something isn't a solution for actually private conversations, but a lot of conversations happen in settings that are not actually private in any sense.

            • collingreen 4 hours ago

              I personally think I am unable to perfectly guess today what I will want/need to have private forever.

              This is one of the tenets underpinning my thoughts about why privacy matters.

              • multjoy an hour ago

                But Discord & IRC aren't generally private spaces. They're no different to web forums in that you would reasonably expect that something you write today would be accessible without reference to you 10 years hence.

                That's a very different proposition to a private/group message exchange in WhatsApp/iMessage etc.

      • crtasm 6 hours ago

        Nothing stopping a server also acting as a bouncer and storing messages: https://ergo.chat/about

      • timeon 3 hours ago

        > IRC does not encrypt messages

        Wasn't SILC later used for this instead of IRC?

      • AnonCoward42 10 hours ago

        > IRC does not encrypt messages, only (optionally) the client<->server connection. Without E2EE, you have no privacy against the server/operator, which is an easily targeted SPOF.

        Same as Discord.

        > Matrix (the protocol) is still in flux, and the implementations are lagging behind the spec. If you're not using Element, you're behind on features and security.

        Discord also only has one reference client, but for me even with that client Matrix/Element was not as reliable. I still use and like it, but it's not a like for like in that regard.

        > XMPP is (similarly to IRC) relying on optional protocol add-ons for basic things, like E2EE, which clients may or may not support fully or correctly.

        But if you use current clients like Conversations or Dino or the likes it does work. There is no point in counting the clients that don't support it if these aren't the reference or biggest ones. The problem here is more that it's not meant to be used like Discord in any way. Not for big group chats/channels nor for big voice chats (not even sure this is possible).

      • voidnap 11 hours ago

        > IRC does not store messages, it only relays them to clients.

        Some people consider this a feature and prefer using IRC bouncers to discord.

        OMEMO solved encryption for XMPP a decade ago. I haven't seen it on IRC yet though.

        • brysonreece 11 hours ago

          Some (most) people want to easily talk to their friends or interest groups without having to worry about it.

        • dakom 10 hours ago

          I do consider it a feature, in hindsight. Learning to program by asking "dumb" questions was great, because chats were ephemeral, nobody cared if the same question was asked for the 10 millionth time or risk of embarrassment being like 12 years old and asking greybeards for help.

          Nobody also felt bad saying "RTFM" because, whatever, it blows over in a minute, there's no permanent record of having a harsh moment, more free to just move on.

          The same old questions being asked due to no search also provided more opportunities to answer those questions, so, newbies could start to learn by teaching.

          So, yeah, I think something beneficial was lost, even if I wouldn't go back to that approach- it's more of a tradeoff than a definitive improvement

          • znpy 10 hours ago

            > I do consider it a feature, in hindsight. Learning to program by asking "dumb" questions was great, because chats were ephemeral, nobody cared if the same question was asked for the 10 millionth time or risk of embarrassment being like 12 years old and asking greybeards for help.

            I pity the new generations for not having this kind of opportunity: the opportunity to make mistakes, say dumb stuff and goof off with all these things vanishing in a matter of minutes, hours at most.

            I miss the old internet: at any point you could pick a new nickname and get a fresh and clean new email address from many of the webmail providers and just start a new online life.

            And it was considered normal. It was actually a "best practice" to never use your real name.

            I miss the old internet.

            • sham1 6 hours ago

              Remember when phrases like "Never use your real name online" used to be near universal? Yeah, this is something I also miss about the old Internet.

              Like, even back then you could absolutely tie your IRL identity up with your online identity, but the difference of course was that it wasn't a requirement of existing online, like it is now. Like yeah, you can stay anonymous but a) it's super difficult since the modern day assumption is that you're not doing that and b) that you're up to no good, because why would you be hiding who you are, unless you were doing something shady. And now even "normal" people lament just where we went wrong and what happened to online privacy. To the aware, privacy dying like this was clear as day, but I suppose most just didn't hear, or chose to ignore, the alarm bells.

              And now everything is logged, analysed, and associated with the people who produced the messages and other sundry content. There is no ephemera, we need laws just to be forgotten by services (as an EU citizen, I'm glad about the law existing here, but it shouldn't need to be a law, it ideally should be assumed), and we're constantly getting watched by both states and surveillance capitalists alike. Not actively in most cases, mind you, but passively, with our movements, our interactions online, and just what we do, just getting aggregated into these humongous data sets of Big Data, to train statistical models on. Mostly to surveil us even harder, or to manipulate us in the form of advertisement, which can be even more insidious in some ways.

              I'm sure that stuff like the Cambridge Analytica fiasco could have occurred even without this destruction of privacy, anonymity, and ephemeral content, but I posit that it would have been way more difficult had people not been encouraged to put everything about themselves into services that would log them and build evermore complex models about them and their thoughts. And now this kind of stuff can be used to destroy democracies, and as alluded to earlier, manipulate for example our spending habits. And now we all wonder just where this all went wrong.

              I miss the old Internet.

            • MichaelZuo 5 hours ago

              This approach simply doesn’t work when users are allowed to vote or have any sort of scoring mechanism. Since bad actors will also create multiple “online lives” and manipulate those systems with a few clicks

    • Ecoste 8 hours ago

      > How did we let discord take over is a mystery to me, or rather a tragedy.

      The fact that you're baffled why discord took over is exactly why it took over. You can't even acknowledge that the user experience is 10x better and it's suitable for a general non-technical audience.

    • dewey 12 hours ago

      I’m a huge IRC fan and I dislike Discord, but all these other services are way too clunky and IRC is really only usable through IRCCloud that has a relatively okay mobile app these days.

      Recently a very technical group I’m part of migrated from Telegram to Matrix and the user experience is just not very good. The apps are buggy, don’t look good, then in the new “Element” app SSO isn’t supported so I can’t use my account with it. There’s lots of paper cuts that are okay for someone like me who likes to figure it out but I’d never try to convince my friends to use it.

      • nunobrito 9 hours ago

        For telegram refugees then maybe SimpleX is an option, except it has no bots nor other options for clients at the moment.

        What I personally use is the nostr protocol through a client like Amethyst or OxChat. Messages and groups can be E2EE private, or you can just use the public groups.

        The biggest advantage is that you are joining a bigger community of apps and services built on top of the same protocol, rather than joining some isolated island (again).

        • dewey 9 hours ago

          I recently listened to a nostr podcast and even people working on it said it would not be reasonable to recommend it as a secure messaging app at this point. Just because very early things like metadata leaking are not addressed yet. So not really an alternative.

          • nunobrito 7 hours ago

            I don't know what podcast you are mentioning or the context. Anyone can say anything on youtube.

            We are talking about a transition from Telegram. Compared to that platform, NOSTR is undoubtedly more secure, given that Telegram doesn't even end-to-end encrypt conversations by default and users aren't told this. Whereas in NOSTR you are made aware when a conversation is private between both parties.

            Metadata is fetchable for 99% of messaging apps out there. If you'd ask me about making a more secure app then this involves continuous streaming of data, padding of messages to avoid content guessing and avoid the usage of internet as data channel.
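
            The "padding of messages" part is simple to sketch. Below is one illustrative way to do it, not taken from any real messenger: length-prefix the plaintext and zero-pad it to a fixed bucket before encrypting, so ciphertext length reveals only the bucket. The 256-byte `BUCKET` and the `pad`/`unpad` helpers are assumptions for the example.

```python
import struct

BUCKET = 256  # hypothetical bucket size in bytes


def pad(message: bytes, bucket: int = BUCKET) -> bytes:
    """Prefix a 4-byte length, then zero-pad to the next multiple of `bucket`."""
    framed = struct.pack(">I", len(message)) + message
    padded_len = -(-len(framed) // bucket) * bucket  # ceiling to bucket size
    return framed.ljust(padded_len, b"\x00")


def unpad(padded: bytes) -> bytes:
    """Recover the original message using the length prefix."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]


short, longer = pad(b"yes"), pad(b"no, absolutely not")
print(len(short), len(longer))  # prints "256 256"
print(unpad(short))
```

            Padding has to happen before encryption, and it hides length only; timing and who-talks-to-whom are separate metadata problems.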

            So it really depends on what you consider secure and what it is compared against. Compared to Telegram it is more secure. Compared to a piece of paper encrypted with a custom algorithm and delivered by a trusted human transporter? Not really.

    • high_na_euv 11 hours ago

      >How did we let discord take over is a mystery to me, or rather a tragedy.

      Orders of magnitude better product than anything competition had at the time?

      • doublerabbit 4 hours ago

        > Orders of magnitude better product than anything competition had at the time?

        Nah, it just comes down to non-techy folk wanting to play/chat with their friends in a just-work configuration.

        Mumble, TeamSpeak were always janky, needed a hosted server. IRC is multiplayer notepad.

        Geeks care about E2E, and all that glory but these folks don't. And that's what Discord dishes; as did Y!M, MSN, ICQ, AIM back in the day.

        All discord has done is replaced those above as GitHub has replaced SourceForge.

        We didn't care if the messages were encrypted or not back then. Why do we now?

        • StableAlkyne 3 hours ago

          > Geeks care about E2E

          *Some* geeks. Specifically those who are into encryption.

          There is nothing wrong with wanting an application to just work, especially when it's significantly better than what came before (contemporary competitors were Skype and IRC)

        • pphysch 2 hours ago

          You're just describing why Discord was a much better product.

    • throw16180339 4 hours ago

      > How did we let discord take over is a mystery to me, or rather a tragedy.

      Anyone can set up or join a Discord server. If you give users the choice between a complex open platform and an easy proprietary solution, they will pick the latter every time.

    • tannhaeuser 11 hours ago

      There’s no lack of open chat protocols and federated services but those have mostly torpedoed themselves through usability and discoverability problems, holier-than-thou attitudes, and plain nerd attention wars. Examples include XMPP (used a lot until around 2010 but easily dragged into the mud because of XML and overengineering) and Mastodon (saw a surge as Twitter was faltering but then seemingly stopped being everyone’s darling as its limitations became obvious, among them Mastodon admins taking their audience hostage; also ActivityPub fans going around advertising it for each and everything when RSS is just fine for web sites, damaging news feeds altogether in the process).

      Where spamming, or the systematic exploitation of digital communication by the “ad industry”, was killing it in the past (Usenet, and arguably the web), today there’s also the problem of being consumed by LLMs, which pushes people toward non-public messaging. Though I’m not sure the latter is really a concern for many, as developers not only are giving away their code, but their entire activity log/issues and their solutions on github such that they can easily be digested and replaced by coding assistant LLMs, git being a distributed system in the first place.

      • Terr_ 11 hours ago

        > among them Mastodon admins taking their audience hostage

        I was excited first hearing all the "fediverse" stuff, but having to hand over control of your online identity to a particular node forever felt a little bit like "old boss, same as the new boss."

        (Yes, I know some folks are working on the identity issue.)

        • nunobrito 8 hours ago

          Reminds me of when I joined the largest Mastodon server for my country. Advertised by the owner as a bastion for free speech, democracy and fair treatment. Then in 2020 it started mass banning everyone "that went against science" on the covid fraudemia in our country.

          Twitter in those days was bad, but that Mastodon server sure became even worse. Nowadays I've found a breath of fresh air and innovation with Nostr. No more servers with your data and followers locked inside.

          You can silence the people you don't want to hear, but you can no longer hold them hostage in forced silence.

        • paulryanrogers 6 hours ago

          Mastodon means you can at least pick your boss, be your own boss, and take your identity and followers to a new boss. (Possibly even taking your content too, though maybe not links)

          • StableAlkyne 2 hours ago

            Did they ever address the problem of migration from a bad server?

            For example, a scenario where your server dies and does not return. Or a malicious actor takes over and bans the user base. Or a honeypot encouraging user account migration, followed by bans.

            In all 3 cases, you are effectively screwed the moment you migrate to a malicious server, or your server becomes malicious.

            I remember Bluesky trying to address this by tying your identity to a DNS record or something, but it's a severe limitation in anything trying to be decentralized.

          • MichaelZuo 5 hours ago

            Picking a ‘boss’ in a system where the average ‘employee’ has no credible way of assessing or evaluating them, or their superiors, and zero prospects of ever getting a face to face meeting with, is effectively no different to having the boss picked by an anonymous shareholder meeting in SF.

            If all of the potential bosses have roughly the same degree of accessibility… which is the case for Mastodon for anything over a few hundred users.

            • paulryanrogers 4 hours ago

              Compared to closed gardens like Discord and Xitter, Mastodon is a significant improvement.

            • ThrowawayTestr 4 hours ago

              What's stopping you from messaging server owners or stalking their profile to see if they're ideologically compatible?

    • maccard 13 hours ago

      If you want to know why, look at the App Store reviews for Discord and TeamSpeak and compare them.

      Discord just works.

    • elcomet 12 hours ago

      IRC and distributed protocols in general had a big issue: you lose history every time you disconnect.

      • menaerus 12 hours ago

        In the age we are living this starts to sound more like a feature to me.

        • MatthiasPortzel 6 hours ago

          The other reply goes to airplanes but there are much more common ways to get disconnected. Locking my phone or closing my laptop lid disconnects me from IRC. A lot of Discord users have desktops that are always on (since Discord originally advertised to gamers), but a lot of Discord users don’t.

          Discord is fundamentally a very versatile platform. If you lose one seemingly unimportant feature, you lose a lot of versatility. Maybe I’ll write a blog post just with examples of how I’ve used it. It replaces IRC, but it also replaces Facebook groups, Skype, a lot of group texts, and a lot of email for me.

        • agumonkey 6 hours ago

          It does alter the meaning of chat tremendously. In discord, often things become heavy, because we're not talking, we're accumulating information, and you have to stay on purpose so data is manageable and seekable.

          The few times I join IRC I know we're only here to chat, it's semi-transient (a little bit more if logs are stored) and I feel lighter.

        • StableAlkyne 2 hours ago

          You and your friends lost history, but the server owner never did :)

        • rtpg 10 hours ago

          Is it really that much of a jump to say "I would like to see the chat that has happened between my friends between the time I got on a plane and then got back off"? Does that sound odd?

          Imagine if you couldn't receive e-mail while you were offline!

          This isn't to disparage IRC and friends too much, obviously there's huge value in it existing as a synchronous chat room. Just... async chat is a thing that totally happens for most people.

          • serf 8 hours ago

            a non-technical person wouldn't consider the implications of a history log with regards to security or data hoarding, they just see it work and think of it as a convenience.

            this value sell shifts in the mind of the non-technical person once they're told that the feature they want implies non-ephemeral data that will be systematically sifted through either for legal or financial benefit by a third party.

            in other words : the reason why 'async chat is a thing that totally happens for most people.' is because a vast majority of people are simply unqualified to even see the problem, much less seek alternatives or solutions to the data hoarding that they must comply with.

            this creates a social effect and pulls everyone into Discord, regardless of their beliefs on the matter, simply because it has become 'the only game in town'.

            regardless of personal preference, centralization of these kind of things is BAD for the user in nearly all circumstances aside from convenience.

            • Shog9 3 hours ago ago

              Please stop pretending that "data hoarding" didn't / doesn't happen on IRC. There's nothing inherently friendly to security or privacy in the protocol; if anything, it's quite the opposite.

              That you can, with augmentation and diligent op-sec, get something a bit better than Discord isn't a great selling point unless you have the time and resources and buy-in already, not just for yourself but from everyone in your group. At which point, there are still better options than IRC.

              For decades now, the main draw of IRC has remained a fetish for conspicuous configuration, as it embodies a sort of brutalist architecture of communication software. The excuses change every few years, but the love for cobbling together a barely workable system from parts remains core.

          • menaerus 9 hours ago ago

            Sure, the advantages of async communication are obvious but the crucial difference is that in that case vendor has to store your data somewhere in the data center. Reusing that data for unsolicited purposes is what many people will have a concern with.

            • indeyets 7 hours ago ago

              But logs are stored on IRC as well. It’s not a part of the standard protocol, but a lot of IRC servers can do that automatically, and there are bots which do that, not to mention personal archives. The difference is that end-users don’t have easy access to these logs. And on Discord they do (because it is a part of the protocol).

            • cmiller1 8 hours ago ago

              How about a secure async chat where the vendor simply stores a list of message IDs, and when you log on, the client asks the other users in the chat whether anyone has a copy of any message you haven't received yet?
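
              A rough sketch of that idea, with everything hypothetical: the vendor holds only an ordered list of message IDs, and a client that logs on asks its peers for the bodies it is missing. The names `VendorIndex`, `Client`, and `sync` are invented for illustration, and the sketch ignores encryption, authentication, and ordering guarantees.

```python
from dataclasses import dataclass, field


@dataclass
class VendorIndex:
    """Hypothetical vendor-side state: message IDs only, never message bodies."""
    message_ids: list[str] = field(default_factory=list)


@dataclass
class Client:
    """Hypothetical client holding the message bodies it has seen so far."""
    name: str
    store: dict[str, str] = field(default_factory=dict)

    def send(self, index: VendorIndex, msg_id: str, body: str, online: list["Client"]) -> None:
        # The vendor only learns the ID; the body goes straight to online peers.
        index.message_ids.append(msg_id)
        self.store[msg_id] = body
        for peer in online:
            peer.store[msg_id] = body

    def sync(self, index: VendorIndex, peers: list["Client"]) -> list[str]:
        """Fetch missing bodies from peers; return the IDs nobody could supply."""
        unresolved = []
        for msg_id in index.message_ids:
            if msg_id in self.store:
                continue
            for peer in peers:
                if msg_id in peer.store:
                    self.store[msg_id] = peer.store[msg_id]
                    break
            else:
                unresolved.append(msg_id)
        return unresolved


index = VendorIndex()
alice, bob, carol = Client("alice"), Client("bob"), Client("carol")
alice.send(index, "m1", "hello", online=[bob])   # carol is offline
bob.send(index, "m2", "hi alice", online=[alice])
missing = carol.sync(index, peers=[alice, bob])  # carol logs on later
```

              The obvious weak spot is availability: if every peer holding a message is offline or has deleted it, `sync` hands its ID back as unresolved.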

              • menaerus 6 hours ago ago

                Such a vendor would have a hard time finding a business model, since plenty of chat services already exist on the market and all of them have access to the data of their users in one way or another. Thus I don't know what other type of leverage they would be able to pull off to sustain their business.

    • Intralexical 10 hours ago ago

      > How did we let discord take over is a mystery to me, or rather a tragedy.

      I think I'm reasonably technically competent, and I also dislike Discord's issues with privacy, data sovereignty, siloing information away from the open web, etc.

      But you know what I think whenever I click a Matrix link, or IRC? I just don't want to deal with it. You get a list of apps you've never heard of, some of which may not be feature-complete, some with more than one version, some which are advertised using words like "GNOME", "Rust", "Qt5", and "C++" that have no meaning or relation to actually using them as a chat app, and all of which I guess are different and would need to be tried and learned separately. Then picking and clicking one tries to open an outside program which probably isn't installed and I don't want to install because I don't really know/care what it is. And if at that point, out of the dozen or so app options it showed you, you happened to choose one with a web version like Element, and you figure out you can click the "Continue in your browser" button out of the four or five unexplained buttons that pop up as a result ("XDG-Open", "Cancel", "FlatHub", "Download", and "Continue in Browser")— You get a static screen that shows just enough message history to not be useful, with a confusing UI you can't seem to interact with, hidden behind a login wall that still hasn't really explained what in the Internet tubes you're actually looking at.

      E.G.: https://matrix.to/#/#invidious:matrix.org

      If you try to Google "What is Matrix"— You get pages about math. So then you Google "What is Matrix chat". And all the results harp on using words like "open network", "decentralised", "protocol", "real-time communication", "open standard", "federated"— Which, again, may be technically interesting if you're into that, but doesn't actually have anything to do with how it directly serves the user as a chat app and how you can use it or sign up for it.

      It takes way too many clicks, and you get bombarded with way too much information… To still not end up using the app, and in fact end up more confused than before about what a "Matrix" even is. Let's say you lose 15% of incoming users at each step. That rapidly scares off most of the mainstream, before they've even tried it. Maybe Matrix and Element are great. But it just seems like such an ordeal.
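
      To put rough numbers on that, assuming the 15% drop-off applies independently at every step (the step counts below are made up for illustration):

```python
def retained(steps: int, loss_per_step: float = 0.15) -> float:
    """Fraction of users still around after `steps` steps, each losing `loss_per_step`."""
    return (1 - loss_per_step) ** steps


for steps in (1, 5, 10):
    print(f"{steps:2d} steps: {retained(steps):.0%} of users remain")
```

      That works out to roughly 44% of users left after five steps and about 20% after ten.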

      Compare that with Discord. You click a link. And then either you're already in the server, or it has a single text box and a single button you click to funnel you through making an account and joining the server.

      It doesn't try to convince you to install a Desktop app until you're already fully using it in the web version. You get clear answers and reasons to use it if you search "What is Discord" or go to the website. It doesn't overwhelm you with options and then hound you with technical explainers that you didn't ask for.

      IRC goes the other way in usability. People want voice chat, message history, different channels in the same "server", PM channels, etc.

      /rant

    • weaksauce 3 hours ago ago

      because the voice chat function is so leaps and bounds better than anything out there and it was primarily used for that to game in real time. the text was an afterthought for gamers.

    • Krasnol 13 hours ago ago

      Usability did it.

      You download an exe, install it, make an account and it runs. Just like that. Everybody can do it.

      There are tons of useful and great software out there. Most of it is not easy for the public. Some (most?) of it doesn't even have a GUI. People rather sell their identity and even pay than suffer through too many hops.

      • Intralexical 9 hours ago ago

        Not even an EXE. The web version is feature-complete, so you only need to click a link.

        • Krasnol 9 hours ago ago

          You're right. I forgot about that.

          I also forgot all those people who came from the TeamSpeak servers.

    • RadiozRadioz 3 hours ago ago

      There are loads of comments exactly like OP's, and they always make the mistake of mentioning IRC alongside XMPP and Matrix. Inevitably repliers can't help themselves and spend their replies discussing IRC's unsuitability for modern IM and how it's not federated. When IRC is mentioned, commenters ignore XMPP and Matrix and attack the point in terms of IRC. (Though this thread in particular is better than average).

      Matrix and XMPP are the far more appropriate competitors for Discord, we need to steer the conversation toward them. I deliberately never mention IRC when I make these types of comments so people don't latch onto it and ignore everything else I said.

    • lofaszvanitt 12 hours ago ago

      Discord wrapped irc in shiny paper.

    • EGreg 11 hours ago ago
      • philipwhiuk 7 hours ago ago

        > Own this piece of crypto history

        I would argue that the web lost its way as much with "web3" as with the platforms of web 2.

        • EGreg 3 hours ago ago

          I didn’t write that.

          You must be quoting an ad, and dismissing everything else

  • dean2432 12 hours ago ago

    They make it literally impossible to delete your old messages. It's a privacy nightmare and I wonder why the EU hasn't stepped in.

    • Intralexical 7 hours ago ago

      I do think there is a balance to be struck, because directed communication means the recipients of old messages are also stakeholders, such that maintaining a consistent record by default is a fundamental part of the "service" they offer. The message contents are different from e.g. secretly hoovering up click patterns. Matrix had some thoughts when they faced the same questions:

        The key question boils down to whether Matrix should be considered more like email (where people would be horrified if senders could erase their messages from your mail spool), or should it be considered more like Facebook (where people would be horrified if their posts were visible anywhere after they avail themselves of their right to erasure).
      
        Solving this requires making a judgement call, which we've approached from two directions: firstly, considering what the spirit of the GDPR is actually trying to achieve…
      
      https://matrix.org/blog/2018/05/08/gdpr-compliance-in-matrix...
    • Xen9 10 hours ago ago

      In Discord culture, indeed, users usually share a shit-ton of PII in "introduction" messages from images to specific hobbies to medical information (EG "support" communities).

      The problem from a GDPR perspective is that Discord makes it impossible to delete those, since once they detect your interest in trying to delete any of your accounts' data, they will try to "anonymize" it. Then at least publicly your username is disconnected from those messages, but they can still be traced back to specific persons. Now if this is also done server side, then they would be in a situation where you'd either have to go through a ton of messages or bulk delete all past messages to enforce the GDPR demands of a user wanting their PII deleted.

      EU Parliament is not a real Parliament in the sense that ONLY the Commission can propose new laws, and the elected parliament basically just votes on those. Who controls the Commission if not the people? The US State Department. Newsguard and non-Musk US bigtechs including Discord are in the same poli-financial bed of the establishment here. And they are full of previous state department workers.*

      Unless there is public outrage, the EU-level bodies at least will probably be owned. But Public opinion is controlled by the cyberpunk establishment that trains their LLMs & targets their campaign ads using that illegal Discord data to get political advantage.

      You in my view ought to "worry" about the fact that it's possible there will sooner or later no longer be escape from a permanent establishment, Orwell-style. Goes along with the theme that "cybersecurity" at the United States government level has been "war against hate speech" for years, and of course "hate speech" meaning "censorship of internal and external enemy speech."

      Budd Dwyer, if I recall correctly, shot himself on TV after writing to Biden (???) that under some conditions (that became true), the Department of Justice should have "Justice" removed from its name.

      ---

      Most of this I hold only at 50+% confidence of being broadly correct. Take with lots of salt.

      • r3d0c 7 hours ago ago

        incoherent babbling

    • intelVISA 5 hours ago ago

      Given the sheer size and extent of the user data collected and processed one imagines the EU is working on a big case... quietly.

  • robmccoll 9 hours ago ago

    Cassandra is essentially an append-mostly distributed fault-tolerant hash table. If you need specifically that with high write throughput, it's a good choice. I don't understand why people use it as a database. You run into its limitations immediately and the pain of trying to use it like a database only gets worse with scale.

    • LeifCarrotson 5 hours ago ago

      FTA:

      > In Cassandra, reads are more expensive than writes.

      This makes it insane as a message store for a chat server to me. It seems appropriate for a logging destination for a distributed system, one where you want lots of clients to dump data but most of the time you don't even need to audit the logs, so the number of reads for a given item is less than one. This is obviously not true for Discord messages.

      • Squeeeez 5 hours ago ago

        Not too sure - I would have guessed that most of the messages are written once, read by the constant number of participants (say 1-100 or so) and then they disappear off the screen and are never accessed again, ever. Maybe a few people will scroll or search, or use some custom extension to load and export the history, but very rarely.

    • mianos 8 hours ago ago

      All the Cassandra documentation and the web site say it is a database. You can't blame anyone for getting confused. In my experience, I have never seen a project that started using it continue to use it after a year or so. It may take a year to run into its limitations before having to replace it with a database like Postgres.

  • hiyer 18 hours ago ago

    Very well-written article. I'm happy for them that part of the solution was switching from Cassandra to drop-in replacement Scylla, rather than having to deal with something entirely different.

  • PaulHoule 7 hours ago ago

    How is it that they just can’t shard the thing? Isn’t each Discord ‘server’ isolated from the others (can’t send a message from one to the other)? Why can’t they address trillions of messages by having thousands of shards that each handle billions?
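
    As a minimal sketch of what that kind of routing might look like, assuming messages are keyed by a server ID and spread over a fixed number of shards (`NUM_SHARDS` and `shard_for` are hypothetical, not anything from the article):

```python
import hashlib

NUM_SHARDS = 1024  # hypothetical shard count


def shard_for(server_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a server ID to a shard with a stable hash (same input, same shard)."""
    digest = hashlib.sha256(str(server_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every message for one server lands on the same shard.
print(shard_for(41771983423143937))
```

    One catch with a scheme like this: a few very large servers can make their shards much hotter than the rest, so even load is not guaranteed.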

    • hun3 6 hours ago ago

      Last time I checked the Discord bot API, it had explicit provisions for sharding.

  • jimkoen 21 hours ago ago

    My takeaway from this is maybe somewhat different from what the authors intended:

    > The last one? Our friend, cassandra-messages. [...] To start with, it’s a big cluster. With trillions of messages and nearly 200 nodes, any migration was going to be an involved effort.

    To me, that's a surprisingly small number of nodes for message storage, given the size of discord. I had honestly expected a much more intricate architecture, engineered towards quick scalability, involving a lot more moving parts. I'm sure the complexity is higher than stated in the article, but it makes me wonder, given that I've been partially responsible for more than 200 physical nodes that did less, how much of modern cloud architecture is over-engineered.

    • romanhn 21 hours ago ago

      They are talking about 177 database nodes, which is not an indicator of architecture complexity. I assume they have dozens/hundreds of services consisting of multiple highly available nodes each across various geographies.

      Having seen a much smaller set of Cassandra nodes used to store billions (rather than trillions) of records, I can say that Cassandra was definitely a total PITA for on-call, and a cause of several major outages.

    • nicholasjarnold 21 hours ago ago

      > ...how much of modern cloud architecture is over engineered.

      I would wager a good majority of it is. The Stack Overflow architecture[0] sticks out to me in this regard as an example on the other end of the spectrum.

      [0] https://news.ycombinator.com/item?id=34950843

    • hiyer 18 hours ago ago

      Also bear in mind that they're now doing the same with just 72 nodes.

  • bofaGuy 7 hours ago ago

    I’m lost at why a DB (Cassandra) with better write performance than read performance was ever selected for a messaging system. I feel like it’s obvious that a message will be read more than it is written (once).

    • remram 5 hours ago ago

      The fact that it has better write speed than read speed doesn't mean that it has bad read speed. It just happens to have even better write speed.

      It's like how I connect my phone to my home's cable connection to send a big file. It is better at downloading than uploading, but that doesn't mean it's not the best solution for uploading.

    • SpikeMeister 7 hours ago ago

      While it’s true that messages are read more, reading can be cached so not every read necessarily results in a DB call.

      • axelthegerman 6 hours ago ago

        Which seems something they added recently but was not part of the original design of using Cassandra

  • crakhamster01 2 hours ago ago

    Interesting technical read, but I appreciated the lighthearted jokes/comments the author threw in as well. Felt like they struck the right balance - nice work!

  • airocker 4 hours ago ago

    Just wondering if anyone considered using Postgres or another relational db. I understand it won’t do multi-master replication as well, but it is much more stable and predictable if you give it the right amount of traffic. I guess the team had to do that part anyway for ScyllaDB.

    • crop_rotation 3 hours ago ago

      I don't think anyone runs Postgres at that scale (unless with a very specialized sharding setup). Given the choice between using ScyllaDB like everyone else and using Postgres in a super specialized best in the world setup, the choice becomes clear. Also keep in mind that Discord is not a huge super profitable company, so for them to develop something like vitess for Postgres would not make sense. For a small company with huge data like discord, using existing data solutions makes a lot more sense.

      • airocker 2 hours ago ago

        They could use vitess, citus or alloydb. They could use read replicas for read operations and single master in a shard for write. They would get many SQL features (upgrades, referential integrity etc) for free. It would allow them to extend their business logic considerably.

  • codexon 21 hours ago ago

    > The ScyllaDB team prioritized improvements and implemented performant reverse queries, removing the last database blocker in our migration plan.

    I wonder how much they paid ScyllaDB to do this before even using ScyllaDB.

    • jsnell 21 hours ago ago

      The article says they were using ScyllaDB for everything except the message store two years before they did the migration for messages.

  • dang 16 hours ago ago

    Discussed (a bit) at the time:

    How Discord Stores Trillions of Messages - https://news.ycombinator.com/item?id=35048410 - March 2023 (10 comments)

  • tcfhgj 21 hours ago ago

    Storing is one thing. Performing data mining on them is another

    • philipwhiuk 7 hours ago ago

      That's a separate problem with hugely different latency concerns, likely done on a separate copy.

    • CamperBob2 21 hours ago ago

      Also, people need to keep in mind that those trillions of messages are archived nowhere. Thanks to the walled gardens we're obsessed with building, far-future anthropologists will know more about Pompeii and Machu Picchu than San Francisco.

      • squigz 14 hours ago ago

        Firstly, no they won't. That's silly.

        Secondly, how would such an archive work? Who would pay for it? How would it be safeguarded in such a way that it can be read by 'far future anthropologists' but not the people paying for the storage?

        • geysersam 9 hours ago ago

          If we're only talking about public chat rooms, it shouldn't be difficult to archive the content of those.

          There are open repositories of the entire internet text content (common crawl). These scrapes are periodically repeated. That's orders of magnitude more data than all discord messages ever.

          So technically it's not a problem making such an archive. The financing is of course always an issue, but not because the costs are large.

      • xboxnolifes 21 hours ago ago

        I don't think every single individual message ever needs to be archived. Every text, every email, every post-it, every poke, every emoji, every reaction GIF...

        • ktosobcy 14 hours ago ago

          Well, considering the annoying push for "let's resolve the issue on discord", it's very annoying. With things like github issues you can search for a problem and find a solution. Even ancient mailing lists most of the time have archives. Not so much with all those fancy "realtime" :/

          • klabb3 10 hours ago ago

            I agree with the sentiment but GitHub issues is not a good replacement. First, it’s also owned by a corporation and is available on the open web today because they let us (is it even scrape/api available today? Can people build tooling on top?). Anyway, this “openness” can easily be changed once the “value extraction knob” is turned.

            Secondly, GitHub is a developer platform, not a user/enjoyer platform. Issue reports are high-barrier even for devs. People get upset if you’re asking a random question, don’t check for duplicates, etc. Some people even get upset about issues without a PR.

            Again, I’m all for good open alternatives but when HN is like “you just configure Gentoo and type 30 commands” we don’t stand a chance to actually win users over, gotta accept reality before we can improve it…

        • famahar 20 hours ago ago

          Definitely not everything, but it's still wild to me that so many products and services have all their troubleshooting and customer support in a discord server.

          • proteal 14 hours ago ago

            It makes sense to me. The number of people who actually create useful open source software is so vanishingly small compared to the number of people who use OSS, it seems obvious that we should optimize for their time, not the other way around. I agree with you that using mailing lists or GitHub issues or whatnot would be globally more efficient, but if I’m working on a product, I’m going to work in the way that is most efficient for my time. I owe my “customers” nothing because they are not paying for my work. We keep seeing discord as a means to communicate about products because devs see it as the best use of their time. The fact that so many people use it should be an indictment on the alternatives, not the devs who choose to use discord.

          • foobazgt 20 hours ago ago

            Sadly, I can understand why Discord doesn't have a lot of incentive to do this. Maybe the community should popularize an open-source free/low-cost bot and hosting solution for exported chat? (I couldn't find one in a few minutes of searching).

          • ekianjo 16 hours ago ago

            Even FOSS communities. shame on the devs who decide to do so.

            • Kiro 13 hours ago ago

              It used to be IRC channels on Freenode and I didn't see anyone complaining back then.

              • CamperBob2 5 hours ago ago

                That's the thing. No one ever complains at the time.

            • squigz 14 hours ago ago

              Why do you and GP think so many FOSS projects choose to use Discord like this?

      • daedrdev 15 hours ago ago

        For many people the fact that discord is not easily discoverable is a benefit, just like in many other messaging services

  • cynicalpeace 20 hours ago ago

    Is there a fundamental reason you wouldn't use postgres for something like this? Scale certainly wouldn't be it.

    • ericvolp12 16 hours ago ago

      ScyllaDB scales horizontally on a shard-per-core architecture with a ballpark throughput of 12,500 Reads and 12,500 Writes per second per shard. If you're running Scylla across a total of 64 cores (maybe on 4 VMs with 16 vCPUs each), you can get up to 800k Reads and 800k Writes per sec of throughput with p99 writes of <500us and p99 reads of <2ms.

      You will not be able to get that performance out of Postgres and the write scaling will also be impossible on a non-sharded DB.

      If you're a company like Discord and are running dozens (70-something?) of ScyllaDB nodes, likely each with 32 or 64 vCPUs, you've got capacity for 50M+ reads/writes per second across the cluster assuming your read/write workloads are evenly balanced across shards.
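
      Spelling that arithmetic out, taking the 12,500 ops/sec per shard ballpark at face value and assuming one shard per vCPU. The 72-node figure comes from elsewhere in this thread, and the vCPU counts are guesses:

```python
OPS_PER_SHARD = 12_500  # ballpark reads (or writes) per second per shard


def cluster_ops(nodes: int, vcpus_per_node: int, ops_per_shard: int = OPS_PER_SHARD) -> int:
    """Theoretical ops/sec, assuming one shard per vCPU and perfectly even load."""
    return nodes * vcpus_per_node * ops_per_shard


print(cluster_ops(4, 16))   # 800000 (the 64-core example)
print(cluster_ops(72, 32))  # 28800000
print(cluster_ops(72, 64))  # 57600000
```

      These are upper bounds under ideal conditions; real workloads with larger rows or uneven partitions would be expected to land below them.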

      • jhgg 14 hours ago ago

        Fwiw the benchmarked numbers are for writing very small rows. When doing the messages migration, with no read traffic, and the cluster/compaction settings tuned for writes we only managed approx 3m inserts/sec while fully saturating the Scylla cluster.

        • menaerus 6 hours ago ago

          How about per-node memory pressure, did it change in favor of Scylla? I ask because I would legitimately expect that GC-based system would have a larger pressure on the memory subsystem.

          • jhgg 4 hours ago ago

            Scylla just eats all the ram it can with cache. So it's hard to say really. On Cassandra we allocated half the ram to the JVM which it gladly used up and left the other half to the OS for disk cache. On Scylla, since it uses direct io, there is no need for OS disk cache.

      • ryanjshaw 14 hours ago ago

        Okay but this is where I get confused. Why does Discord need a single database system when discord servers are independent, right?

        And the volume of traffic per Discord server must be human-processable or what would the point be? A Discord server doing 800k writes per second makes no sense.

        So why not a RDBMS per Discord server, and if you want to ship all that out to a warehouse for analytics you do that as a separate problem?

        Or is it that spinning up a Postgres instance per Discord server ends up being significantly more expensive than these mega distributed database systems?

        • jhgg 14 hours ago ago

          There are ballpark of a few hundred million discord servers... do you really want to run that many Postgres instances? And even so what do you do about DM/GDMs? Easier to just run one big mega cluster for messages.

          • ryanjshaw 13 hours ago ago

            Okay so the latter then - economies of scale. Surprised to hear that few hundred million figure - I thought it'd be 1/10th of that at most! Wow.

            Although I did expect there'd be a very long tail, and you might choose to host a bunch of servers on a single RDBMS, at that scale yeah it wouldn't solve much.

            Thanks for coming back to me, appreciate it.

        • Drew_ 4 hours ago ago

          Apple kind of does something like this with iCloud however their per user "databases" are only virtual:

          https://news.ycombinator.com/item?id=39028672

      • riku_iki 3 hours ago ago

        > You will not be able to get that performance out of Postgres

        if writes are batched, I get this and higher performance from postgres. If 800k on 64 cores is Scylla's best result, it is not that impressive.

        But also you probably mean writes/reads to indexed table, then it is another story.

    • justnoise 19 hours ago ago

      I'd guess that Discord's storage systems lean towards processing a lot more writes than reads. Postgres and other databases that use B-tree indexing are ideally suited for read heavy workloads. LSM based databases like Cassandra/Scylla are designed for write intensive workloads and have very good horizontal scaling properties built into the system.
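
      For anyone unfamiliar with the LSM idea, here is a toy sketch (not how Cassandra or Scylla actually implement it): writes go into an in-memory table that is periodically flushed as an immutable sorted run, while a read may have to check the memtable and then every run, newest first. `ToyLSM` and its flush threshold are invented for illustration.

```python
import bisect
from typing import Optional


class ToyLSM:
    """Toy log-structured merge store: cheap writes, reads may probe many runs."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable: dict[str, str] = {}
        self.runs: list[list[tuple[str, str]]] = []  # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # A write only touches memory; no existing run is rewritten in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> tuple[Optional[str], int]:
        """Return (value, number of structures probed)."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for run in reversed(self.runs):  # newest run wins
            probes += 1
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1], probes
        return None, probes


db = ToyLSM()
for i in range(6):
    db.put(f"k{i}", f"v{i}")
print(db.get("k0"))  # ('v0', 4): the oldest key is found only after probing every run
```

      Real LSM stores cut down that read cost with compaction and Bloom filters, which this sketch leaves out.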

      • Aeolun 17 hours ago ago

        Would you actually have more writes than reads? Are messages read by fewer people than post them?

        • sadeshmukh 16 hours ago ago

          When you send a message, afaik it sends to all people looking at it at the time. So there is no read when in a conversation, and maybe the reads are batched when reading multiple.

      • jhgg 13 hours ago ago

        Read traffic is much higher than write traffic due to mobile clients needing to sync chat history more often as their sessions are much shorter lived. Also search queries execute 1 query per result. And don't forget people doing GDPR data dump requests. It adds up.

    • cowthulhu 20 hours ago ago

      I’m not sure if Postgres would have enough horizontal scaling to accommodate the insane volume of reads and writes. I would be super interested to be proven wrong though… anyone know of a cluster being run at that scale?

    • riku_iki 19 hours ago ago

      > Scale certainly wouldn't be it.

      vanilla postgres can't scale to such size, you need some sharding solution on top, which likely will be much harder to maintain than ScyllaDB..

  • m-hodges 18 hours ago ago

    Fun article. Also fun to think about how many people have decided to document their crimes in these Cassandra nodes.

  • pavel_lishin 21 hours ago ago

    Anyone else reading this and being quite happy that they're not working at this scale?

    • wavemode 19 hours ago ago

      I don't mind scale. I mind the bureaucracy and promotion-driven-development that comes with working in a bloated engineering org.

      • pm90 19 hours ago ago

        +100

        Many companies have products that operate at “scale”. They manage to do so with pretty boring techniques (sharding, autoscaling) and technologies (postgres, cloud storage).

        Because of the insane blog driven tech culture, many of these teams get questioned by clueless leadership (who read these blogs) and ask why the company isn’t using cassandra / some other hot technology. And it always causes much consternation and wastage.

        • rnts08 11 hours ago ago

          Anyone wanting to introduce $new/$other language, database, library, deployment system, build system into a large enough system that doesn't solve any actual problem is a nightmare for someone working at this scale.

          I don't mind the scale, I like it. I don't like having to fend off questions and complaints why we aren't deploying the latest shiny new thing in our core this week.

        • secondcoming 4 hours ago ago

          Well we use Cassandra (actually ScyllaDB) because Redis no longer cut it.

    • Twirrim 20 hours ago ago

      But that's where the really fun and complicated problems are. The ones that really make you stop and think, and not just think, but be creative.

      95% of the work is still the same "treading in well trod paths", same old same old tech work, but that 5% is really something.

      • Olreich 8 hours ago ago

        This was a “double-pump” migration to a faster database and building a caching service. There’s nothing particularly fancy or creative about their solutions. The migration efforts and working out issues with the reverse table scan were probably way more creative, but they didn’t get into that unfortunately.

      • pavel_lishin 8 hours ago ago

        I think I can understand the appeal, but it's just not there for me. I have enough complicated problems outside of work, some of which are even fun to solve.

    • twelve40 8 hours ago ago

      I'm happy I'm currently not working at this scale. I'm not happy when idiots (including one of our self-important ex-Google VP's) set this as a benchmark for backend interviews (for careers that 99% likely will never come close to such problems).

    • est 18 hours ago ago

      I am happy that I don't have to deal with this.

      I am sad that my business isn't at this scale.

    • Aeolun 18 hours ago ago

      Honestly, 77 nodes doesn’t sound like a terrific scale? The more I scale things up, the more I realize that the tone of the problems doesn’t really change. You just get more layers to your data structures.

    • mystified5016 21 hours ago ago

      Any time I read anything about any web-adjacent technology I'm incredibly thankful that I don't work anywhere near that industry.

      Embedded can be complex, but web stuff is just a Lovecraftian nightmare in comparison

      • milesvp 20 hours ago ago

          I have stared into the abyss and seen the eyes of Cthulhu. I am much happier writing embedded drivers than I was trying to make sense of why previous devs thought it was a good idea to move bounded tunable server side api calls to the client, allowing it to effectively write arbitrary sql calls across multiple databases.

        • bdcravens 19 hours ago ago

          Fortunately the web is starting (very slowly) to return to sanity, pushing back towards the simpler server-rendered pattern with Javascript being relegated to specific use cases.

          • Aeolun 17 hours ago ago

            I really like the client rendered UI part. It’s a lot more efficient than sending the whole page again every time.

            • bdcravens 17 hours ago ago

              Which is precisely what is meant by specific use cases. We don't have to throw out the first 25 years of the web and reimplement all of our business logic in a minified JS blob. Even when client side code is necessary, the trend of pushing rendered HTML rather than JSON that must be parsed and rendered keeps us as close to browser primitives as possible.

              • Aeolun 16 hours ago ago

                Why would you implement the business logic there? You can still keep (most of) that in the backend.

                The client just does orchestration.

                • bdcravens 5 hours ago ago

                  Once you move beyond basic CRUD, business requirements work their way into the UI. For instance, making fields read-only based on access level, adding additional form fields, or conditionally hiding and showing entire portions of the UI. All of which requires you to either pass around UI directives in your data or implement business logic in your client code. Better to just ship HTML, and if we're worried about full page loads, just use one of the many over-the-wire options to only change small bits of a page.

                  This is before we get into having to implement application primitives like authentication on the client, and all of the state management that goes with. The absolute amount of scaffolding and plumbing we've built up just to save a few ms is always worth questioning. Doesn't mean the answer is no, just that we need to ask the question and not assume the default is carved in stone.

            • gonzo41 17 hours ago ago

              But you can cache the whole server side page and the cost is once. Whereas if you have the client side do the render then every client wears the cost.

              • Aeolun 16 hours ago ago

                That’s your generation that happens once. The browser still needs to render it. Sure, rendering it on the client may cost the client a bit more, but the client generally has the computational power to spare.

              • bdcravens 17 hours ago ago

                Which becomes a far more important issue when dealing with bandwidth or CPU constrained devices, or artificially imposed constraints due to data usage costs.

              • iknowstuff 17 hours ago ago

                You usually can't because of users who are signed in needing slightly different pages etc.

                • bdcravens 17 hours ago ago

                  While not as fast as a purely client cached page, the server can selectively cache content, even when some bits of the page are dynamic.
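A minimal sketch of the selective caching described above, assuming a page made of one shared fragment and one per-user fragment. The function names (`render_article`, `render_page`) are hypothetical and not taken from any framework; the shared fragment is memoized with the standard library's `functools.lru_cache`, while the greeting is rendered on every request.

```python
from functools import lru_cache
from html import escape


@lru_cache(maxsize=128)
def render_article(article_id: int) -> str:
    # Stand-in for an expensive render that is identical for every user,
    # so it is computed once per article and then served from the cache.
    return f"<article>Article {article_id}</article>"


def render_page(article_id: int, username: str) -> str:
    # The per-user part is cheap and rendered fresh on each request.
    greeting = f"<header>Hello, {escape(username)}</header>"
    return greeting + render_article(article_id)
```

Two different signed-in users requesting the same article share the cached fragment; only the header differs.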

              • asynchronous 17 hours ago ago

                We can also cache some of the dynamic JavaScript, depending on the scenario but your point stands.

        • qudat 20 hours ago ago

          Iteration speed is significantly faster on the client. Perf is an afterthought, for better or worse.

          • swyx 18 hours ago ago

            spoken like someone who doesn't deploy clients at discord scale?

            the 200 backend nodes surely update significantly faster than the hundreds of millions of clients.

        • artursapek 19 hours ago ago

          Sounds like a fun time lol

  • qntmfred 6 hours ago ago

    i usually start projects with postgres these days. i have reached the tens of millions of rows threshold without breaking a sweat, but is there any good reason postgres can't handle into the billions or trillions? any well known products at that scale that are known to use postgres?

    • bastawhiz 3 hours ago ago

      Postgres can pretty easily scale to billions or trillions of rows. It forces you to think carefully about how you query that data, though, and I think most beginners would find themselves in deep trouble jumping into the deep end.

      • qntmfred 2 hours ago ago

        > most beginners would find themselves in deep trouble jumping into the deep end

        probably true for any database platform. postgres probably easier for beginners than cassandra

    • mxscho 6 hours ago ago

      The raw amount of data alone is not enough of a metric to judge whether postgres is "enough". They seem to value horizontal scalability, e.g. in terms of write throughput, which is easier to handle with something like their solution compared to postgres.

  • tonetegeatinst 21 hours ago ago

    My love of embedded stuff is growing. I'm self teaching C and assembly to get better at low level programming and interactions with hardware, but it all seems much simpler than the big data systems. Granted, I'm sure it can all be broken down into steps and issues to solve like any programming issue, but I'm happy focusing on low level stuff for now.

  • KaoruAoiShiho 19 hours ago ago

    Did they go with ScyllaDB just because it was compatible with Cassandra? Would it make sense to use a totally different solution altogether if they didn't start with that.

    • jhgg 19 hours ago ago

      Yes, we wanted to migrate all our data stores away from Cassandra due to stability and performance issues. Moving to something that didn't have those issues (or at least had a different set of less severe issues) while also not having to rewrite a bunch of code was a positive.

      • ericvolp12 16 hours ago ago

        Did you guys end up redesigning the partitioning scheme to fit within Scylla's recommended partition sizes? I assume the tombstone issue didn't disappear with a move to Scylla but incremental compaction and/or STCS might have helped a bunch?

        • jhgg 14 hours ago ago

          Nope. Didn't change the schema, mainly added read coalescing and used ICS. I think the big thing is when Scylla is processing a bunch of tombstones it's able to do so in a way that doesn't choke the whole server. Latest Scylla version also can send back partial/empty pages to the client to limit the amount of work per query that is run.

  • gigatexal 12 hours ago ago

    What a fun write up and a huge confidence building post for me in ScyllaDB.

    • yas_hmaheshwari 9 hours ago ago

      Does this article imply that you shouldn't use Cassandra, and should use ScyllaDB when you think you want Cassandra?

  • akimbostrawman 14 hours ago ago

    in cleartext

  • SupremumLimit 18 hours ago ago

    I see more people mixing up past and present tense randomly, as in this post. It’s confusing to read. Is the concept of tense starting to disappear entirely in US English, I wonder?

    • GrantMoyer 8 hours ago ago

      The post appears to consistently use past tense for things that were true in the past at time of writing, and present tense for things that are true in the present or are always true. So the use of tense appears to be valid, though not following commonly prescribed style.

    • phist_mcgee 12 hours ago ago

      Your question is rude and I hope you know that.

      He's walking us through the process of designing the solution. Why wouldn't present tense work for this? We're discovering things with him as he takes us along for the journey.

    • nerdponx 13 hours ago ago

      No, what a ridiculous thing to say. Storytelling in the present tense is not new.

  • jaimehrubiks 20 hours ago ago

    Until they don't, or they can't, and they need to start deleting.

    (Not trying to undermine the engineering efforts, or the welcoming engineering blog posts though! I really think all of this is needed)

  • dobin 9 hours ago ago

    So the TL;DR is: Cassandra and ScyllaDB have bad performance when reading. So they put a cache in front.

    • jhgg 4 hours ago ago

      No cache. Just read coalescing. There is a big difference. Coalescing just ensures that while a query is executing if an identical query arrives, rather than sending the same query as an already executing query to the database it will wait for the existing query to complete and duplicate the result. If after this the same query arrives again, it will be issued against the database.

      This means we don't have to deal with cache invalidation/consistency issues while also being able to handle thundering herds, for example a large server pinging @everyone and having a bunch of people click into the channel or launch their apps in response.
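A minimal sketch of the read coalescing described above, assuming a single-process `asyncio` service. It is not Discord's data service; `ReadCoalescer`, `fake_query`, and `demo` are hypothetical names used only for illustration. Identical requests that arrive while a query is in flight share its result, and nothing is kept once the query finishes, so the next request goes to the database again.

```python
import asyncio


class ReadCoalescer:
    """Share one in-flight query among identical concurrent requests."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical async function: key -> result
        self._in_flight = {}     # key -> asyncio.Task for the running query

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # No identical query is running: issue one and remember it.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            # Forget the task as soon as it finishes. Results are not
            # retained, so this is coalescing rather than caching.
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        # shield() keeps one cancelled waiter from cancelling the shared query.
        return await asyncio.shield(task)


async def demo():
    db_calls = 0

    async def fake_query(channel_id):
        # Stand-in for a slow database read.
        nonlocal db_calls
        db_calls += 1
        await asyncio.sleep(0.01)
        return f"messages for {channel_id}"

    coalescer = ReadCoalescer(fake_query)
    # A burst of 100 identical reads, like many users opening one channel.
    await asyncio.gather(*(coalescer.get("general") for _ in range(100)))
    calls_after_burst = db_calls
    # A later read arrives after the first query completed.
    await coalescer.get("general")
    return calls_after_burst, db_calls
```

Running `asyncio.run(demo())` shows the burst costing a single database call and the later read costing one more, which is why there is no invalidation problem to manage.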

  • 7bit 13 hours ago ago

    The blog post shows how great the technical expertise is at Discord. I work in IT and in my company devs are so incompetent, they don't even know how to create an M365/Azure dev tenant and constantly request *.ReadWrite.All to our production tenant. I'm so envious!

    On the other hand, the HOME/END keys jump to the beginning of the input field rather than the line and the frontend devs are unable to fix this non-default behaviour for years, which makes it a fucking pain in the ass to use the Posts feature within a Discord channel. I believe the budget for the backend geniuses meant that frontend had to be juniors only.

    • crop_rotation 3 hours ago ago

      Hiring well is probably the most important thing for a company and also one of the hardest problems. I have seen a team of competent engineers outperform their sibling teams by 5-10x as long as each member of the team is good enough. Just 2 bad hires will slow down a team drastically. One terrible hire can do -5x work of a normal engineer.

    • fastball 13 hours ago ago

      In their defense, Azure is terrible.

  • andrewstuart 14 hours ago ago

    When you get to scale like this, I wonder if the access patterns of the application and its data might be best served by a custom data retrieval and storage application.

    I may be wrong but I just wonder if efficiency is lost to the generalized nature of any data storage system.

    The other question that comes to mind is, to what extent have the developers made a systematic effort to optimize how data is stored and retrieved? If you’re building a gigantic back end system and simply accepting that the system load is what it is then you might be missing a chance to dramatically impact the size of the task of managing that data.

    • lyu07282 10 hours ago ago

      They did give one example, if someone does a @everyone in a big channel, they specifically optimized their architecture to make that efficient using their custom data services.

  • pawelduda 19 hours ago ago

    Pretty fun read, even tho I'll never work at such scale lol

  • znpy 10 hours ago ago

    Interesting read on one hand, a bit disappointing on the other: when the solution is just "we moved to this other product" it smells of lack of serious and rigorous investigation.

    Also, having worked with the JVM and with GC issues I don't buy the "GC problems" point: there are a number of improvements in recent JVM releases, the main one being ZGC (and generational ZGC in particular).

    ZGC is great, I've personally witnessed sub-millisecond GC pauses (and i mean sub-millisecond stop-the-world pauses) on machines serving millions of requests per second. Garbage Collection is largely a solved problem in the industry as of today, thanks to ZGC.

    Other than this, comparing latencies for machines with 9TB disks against machines with 4TB disks is a bit like comparing apples and oranges: we will never know if issues at the storage layer were affecting tail latencies. Were the nodes having, I don't know, filesystem fragmentation issues? Does the 9TB storage configuration deliver higher IOPS than the previous 4TB storage configuration? Is it the same kind of hardware underneath (same disk type? same disk bus? or are we talking SSD vs NVMe)?

    As somebody that's been doing performance engineering for work, this piece is a bit appalling.

    Glad to see they've solved their issue though!

    • ozgrakkurt 6 hours ago ago

      GC is a problem, and it always will be at some level. You can improve it but that doesn’t mean it is not a problem. Memory allocation and management is a problem even in C/C++ programs if you want to optimize your program; there is no universe where GC is not a problem.

  • zombiwoof 21 hours ago ago

    I think it’s annoying they interview engineers like they are Google and reading the blog they made it up and learned some basic “pitfalls” as they went along

  • xyst 16 hours ago ago

    Having used discord in the past. Most of the conversations were just shit posts. Nothing serious. Why even bother storing a trillion messages of garbage in the first place?

    • huimang 16 hours ago ago

      Many people within niches have discord servers for researching and discussing specific things. There is a large wealth of information locked away behind them that can be lost pretty much whenever discord decides to start pursuing different monetization strategies.

    • squigz 14 hours ago ago

      That sounds like it was a problem with the communities you engaged with.

    • adzm 16 hours ago ago

      Because that is literally what Discord is for

    • jerryspringster 13 hours ago ago

      How do you sort the good from the bad? I'm sure most of my conversations were shit posts as well but some weren't, especially when it came to figuring out how something new worked or how to fix a problem.

    • hypeatei 16 hours ago ago

      That's why I laugh when people say discord content needs to be indexed on the web so things are more discoverable. 99% is garbage and the useful messages are scattered across channels.

      • retsibsi 12 hours ago ago

        I'm not trying to be a smartarse but doesn't this describe the entire internet? The good stuff is rare and scattered, and that's why search is so important.

        • hypeatei 6 hours ago ago

          At least with forums, there are dedicated pages for whatever is being discussed. Discord is just a collection of channels with topics being split up across multiple messages and shitposts in the middle.

        • jcgrillo 10 hours ago ago

          Just wait until the LLM bots start arguing with each other on discord ;)

    • aurareturn 16 hours ago ago

      How do you differentiate shit posts vs quality ones if you’re Discord?

  • robertclaus 15 hours ago ago

    Very cool that even at this scale the right vanilla SQL database just works. No fancy document store, map-reduce, or GPU implementations needed.

    • salomonk_mur 14 hours ago ago

      How is ScyllaDB (the solution used in the article) a vanilla SQL DB? It's the complete opposite!

      • melodyogonna 12 hours ago ago

        The syntax is SQL

        • biorach 8 hours ago ago

          That... doesn't necessarily mean that it's a "vanilla SQL server"

    • hinkley 15 hours ago ago

      It annoys me sometimes how effective B-trees are.

      Every decade has some cool breakthrough in compression, and a handful of other disciplines. But OLTP databases are still basically better B-trees.

      • menaerus 13 hours ago ago

        LSM trees? ScyllaDB uses an LSM-based storage engine. RocksDB as well.
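For readers unfamiliar with the LSM idea mentioned above, here is a toy sketch, assuming an in-memory "memtable" that is flushed to immutable sorted tables. `TinyLSM` and `TOMBSTONE` are hypothetical names; this is not how ScyllaDB or RocksDB are implemented, and real engines only drop tombstones after a grace period. It does show why deletes are writes (tombstones) and why reads may have to look through several tables until compaction merges them.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}       # newest writes, mutable
        self.sstables = []       # flushed tables, newest first, never modified
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it records a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            # Store the flushed table with keys in sorted order.
            self.sstables.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Check newest data first; the first table holding the key wins.
        for table in [self.memtable, *self.sstables]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all flushed tables, oldest first so newer entries overwrite.
        merged = {}
        for table in reversed(self.sstables):
            merged.update(table)
        # Every older table is part of this merge, so tombstones can go.
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [live] if live else []
```

Until `compact()` runs, a deleted key still occupies space as a tombstone that reads must step over, which is the cost the tombstone discussion in this thread is about.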

    • asjfkdlf 6 hours ago ago

      Aren’t they using a NoSQL store? They migrated from Cassandra to ScyllaDB