$8M sounds like a lot, but (a) the cost of making a material financial mistake can easily dwarf this, and (b) the cost of the engineers maintaining the system was likely comparable anyway. And infra is expensive when you're Uber. It all seems rather overblown to me.
Hate to say it but kind of a lousy article... zippy writing but lots of Monday Morning Quarterbacking for something the author doesn't seem to show much knowledge of. Maybe this is his style to gin up subscribers, but I'm not a fan.
> But nobody was optimizing for cost. They were optimizing for their next promotion. Each rewrite was a new proposal, a new design doc, a new system to put on a resume. The incentive was never to pick the boring, correct choice — it was to pick the complex, impressive one.
...I guess it could be possible nobody thought about cost at all, and this was all misaligned incentives and resume-driven development, but I find that kind of hard to believe? As someone who has made cost mistakes in the cloud, this claim seems a bit silly.
Not to detract from his experience, but I didn't actually see much payments experience at all on his resume, so I'm curious why he's branding himself as a payments guru. Kind of tech content creation fluff, I guess.
In general is there any practical way to fix the issue of "Every rewrite was someone's promotion project"? There doesn't seem to be any incentive for employees to care about projects long term. Keeping something running smoothly is never rewarded the same as launching something new or fixing something broken.
Not really: it's a lazy pejorative, in this case written by an LLM, not a description of reality. It's honestly one of the stupider ideas that has cachet; it seems to survive only by repetition.
Here, the tell is you’re not gonna get a multibillion dollar company on hockey stick growth to switch storage because you want to get promoted.
8M is kinda small potatoes - which is essentially the AWS business model. Sure you could build a cheaper thing but that is hard and this thing is right here, easy(ish) to use, and your company won't regret using it until you've moved into a different role.
It's helpful to rewrite software every 2 years as your team turns over. The code you understand best is code you wrote yourself, and no one likes maintaining other people's legacy code.
> With each trip generating multiple ledger entries, and Uber as a whole processing 15 million trips per day, it didn’t matter that DynamoDB was great because of high throughput at global scale. The proverbial bean counter should’ve stopped this madness from happening.
> At Uber’s scale, DynamoDB became expensive. Hence, we started keeping only 12 weeks of data (i.e., hot data) in DynamoDB and started using Uber’s blobstore, TerraBlob, for older data (i.e., cold data). TerraBlob is similar to AWS S3. For a long-term solution, we wanted to use LSG.
Honest question. Why do people go for this kind of complicated solution? Wouldn't Postgres work? Let's say each trip creates 10 ledger entries. Let's say those are 10 transactions. So 150 million transactions in a day. That's like 2000 TPS. Postgres can handle that, can't it?
If regional replication or global availability is the problem, I have to ask: why does it matter? For something as critical as a ledger, does it hurt to make the user wait a few hundred milliseconds if that means you can have a simple and robust ledger service?
I honestly want to know what others think about this.
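For what it's worth, here is the arithmetic above spelled out (a rough sketch; the 10-entries-per-trip figure and the uniform-load assumption are this comment's, not Uber's published numbers):

```python
# Back-of-the-envelope throughput estimate using the numbers assumed above.
trips_per_day = 15_000_000        # trips/day, from the quoted article
entries_per_trip = 10             # assumed in the comment, not an Uber figure
seconds_per_day = 24 * 60 * 60    # 86,400

writes_per_day = trips_per_day * entries_per_trip    # 150,000,000
average_tps = writes_per_day / seconds_per_day        # ~1,736 writes/sec

print(f"{writes_per_day:,} writes/day ~ {average_tps:,.0f} writes/sec on average")
# Real traffic is peaky, so you'd provision for several times the average,
# but the order of magnitude is still within reach of a well-tuned Postgres box.
```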
It's usually because executive management bakes hypergrowth into the assumptions because they really want the biz to grow; then it becomes marching orders down the chain as it gets misinterpreted in a game of corporate telephone.
“We need to design this for 1b DAUs”
Then 1) that growth never happens and 2) you end up with a super complicated solution
Instead, someone needs to say, “Hey [boss], are you sure we need to build for 1b DAUs? Why don’t we build for 50m first, then make sure it’s extensible enough to keep improving with growth”
SRE here. Most of the time we see choices like this because teams are under pressure to deliver and the scale would likely exceed what a database will easily handle with out-of-the-box settings. So tweaking is required, and that takes time and knowledge the dev team doesn't have. AI helps a bit here, but it didn't exist when the DynamoDB solution was chosen. However, some Terraform, and boom, scalable database created; the only downside is the cost, which is the next Product Manager's problem.
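To illustrate how low the friction is (a generic sketch, not Uber's actual setup; the table and key names are made up, and on-demand billing only appeared in late 2018): a handful of boto3 lines gets you a "scalable" table, and nothing at creation time tells you what the writes will eventually cost.

```python
import boto3

# Hypothetical ledger table; the names, keys, and region are illustrative only.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="payments-ledger",
    AttributeDefinitions=[
        {"AttributeName": "trip_id", "AttributeType": "S"},
        {"AttributeName": "entry_id", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "trip_id", "KeyType": "HASH"},
        {"AttributeName": "entry_id", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",  # "just scales" -- the cost only shows up on the invoice
)
```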
The author overestimates how much ~$5M/yr actually is. A business like Uber isn't happy about that, but it's not even in the top 10 of things they're wasting money on. Moreover, this isn't the engineer's sole fault; it's more the fault of whoever actually approved the expense.
Outside of that, it sounds like the system worked perfectly. They launched, they paid DB costs (the $8M was not a ledger mistake), and then they rebuilt once they wanted more cost savings. Also, a bunch of folks got promoted.
The 8M came from VCs lighting money on fire. Honestly this seems like the system worked as planned to me, not a case study in how not to do things.
> A redesign that gets replaced 2 years later is a catastrophe
I mean, given how quickly things can change, I don't think the language and sentiment here are quite right. It's just how businesses change, and we can't necessarily control that.
Everything is a good idea until it isn’t. The entire industry was enamoured with microservices for far too long. We can look at these mistakes in hindsight and learn from them but we can’t judge them without the context of the time. Software was very different even just 10 years ago. $8m is a rounding error.
This is horrible slop, and I gave it a long chance. Gave up after the handwringing about how DynamoDB would be $300 a day for Uber. Should have given up when it framed each DB evolution as a "promo project".
The breathless tone of this article is irritating. Was this a bad decision? Maybe. Is $8 million some vast amount that merits this sort of wide-eyed crazy ranting? Uber's 2020 revenue was around $11 billion, so I'm going to say no, not really. Obviously you don't want to burn millions willy-nilly, but for such a critical component, this isn't so terrible.
Firing people for bad architectural decisions is generally a terrible idea - especially decisions that shipped and ran in production for several years.
This article also doesn't make a convincing case for this being a huge mistake. Companies like Uber change their architectural decisions while they scale all the time. Provided it didn't kill the company, stuff like this becomes part of the story of how they got to where they are.
Related: the classic line commonly attributed to original IBM CEO Thomas John Watson Sr:
“Recently, I was asked if I was going to fire an employee who made a mistake that cost the company $600,000. No, I replied, I just spent $600,000 training him. Why would I want somebody to hire his experience?”
https://blog.4psa.com/quote-day-thomas-john-watson-sr-ibm/
Also the article doesn’t attempt to explore the business and resourcing constraints they were operating under at the time.
I have been in situations where I was told “don’t worry about cost just get it done”. Then a few years later the business constraints shift and now we need to “worry about the cost”. It ignores that decisions made under a different set of constraints were correct, or at least reasonable, at the time but things change.
One of my pet peeves is when people say "do it right the first time", but the definition of "right" often changes over time. If the only major flaw of this design was that it was expensive, then I am much more skeptical that it was wrong given the original set of conditions they were operating under.
Yeah, this is exactly what I thought when I read this post. It seemed like the author either hasn't worked in big tech, or hasn't worked in the industry very long. It's extremely likely that the engineer who designed this was standing on his desk shouting "it's going to cost THIS MUCH MONEY. I want to make sure that EVERYONE IS OK WITH THIS." and was met with shrugs.
Here's how a big tech reporting chain sees this situation when everything is smooth sailing: "We're growing 3x year-over-year? After 2 years, the cost will be an order of magnitude higher no matter what solution we pick. The constant factor doesn't matter that much. But we have such an incredible roadmap that we will book more than an order of magnitude of revenue, backed by this new ledger project. The cost will always be a nonissue because of growth."
And then 2 years go by, and this incredible product growth adds a bunch of ledger entries that weren't there 2 years ago, someone nudges your reporting chain with the question, "this is pretty expensive.. what gives?" and then someone with a good combination of social and technical skills points out that a migration to your existing storage solution would be a cost effective way to continue growing.
At every step of the way, everyone is generally happy with what's going on.
Amen. Right now I'm rewriting some code and parts of an application after it has been running for years, so I have all the advantages of knowing the bugs and history.
There is zero chance anyone who wrote this the first time would do what I’m doing.
Some things I'm simplifying because they never became the big pivot points for customization and heavy use that the previous devs thought they would be…
Also totally possible that it was just an unpublished partnership of sorts between AWS and Uber. AWS wants the logo and a big case study implementation to give the product some credibility or a boost. Uber may not have been charged at all, may have even been paid to use AWS. The Uber developer may not have even known, just was given an edict to build it on DynamoDB.
I think it's important for leadership to clearly define what "right" is in these cases, too; otherwise, you get as many definitions of "right" as you have people, times, and places.
Easy to say, but there's a real human cost to relying on people to figure out what you mean rather than explaining what you mean. Not enough time is spent on cultivating effective communication and training. Everyone wants everything done yesterday and doesn't feel like investing in their own people.
Do you think that the social climbers who approved these obviously crappy projects learned anything?
I have worked with all levels of engineers who come into a project glassy-eyed about some technology, sure, but if you are part of the team approving a project and you can't produce a realistic budget, then your management is bogus as hell.
I have worked on a ton of these vanity projects, and when I voice my concerns it's clear nobody is out to learn anything; they are there to look good and avoid looking bad, that's about it.
Get some articles published, go to some conferences, get a new job with a new title somewhere else, laugh on your way out.
> Do you think that the social climbers who approved these obviously crappy projects learned anything?
Just the framing of this question makes it seem like you simply don't like people in management / decision-makers, and you want something bad to happen to them. Maybe that's wrong, hopefully it is, but the rest of the comment doesn't do much to dissuade me of that impression either.
Cutting down anyone who gets a promotion or finds success is a culture in itself (see Tall Poppies Syndrome for example). Factual accuracy is not a concern, they only want to be angry at people in higher positions.
I don't think it's that managers or decision makers are bad, I think moreso it's that, for most companies, the criteria for promotion are absolutely busted. And, it creates a culture of self-preservation, which affects ICs, too.
What I mean is that people are selected for leadership based not on their leadership ability, but rather on their political ability and ambition. The reason we see increasingly delusionally confident people as we climb the corporate ladder is that the people promoting them are forced to make their decisions based on small, distilled data.
So, basically, bullshitters rise to the top. It only makes sense given the constraints of the system. Metrics help, sure, but firstly those aren't used much for management promotions. And secondly, they can be gamed, and often are.
At the very tippy top you have c-suite, who are often so delusionally confident it borders on psychosis. After a certain point it just becomes lying, but the truth is that people like to hear good things. We just can't help it.
And, for self-preservation: most companies have an absolutely rotten, toxic, and even evil culture. For most companies, the majority of employees are focused on self-preservation. And nobody will say that out loud!
But when managers get into that self-preservation mindset, it can get really ugly. It becomes lying, organizational sabotage, fudging documents, in-fighting, etc., to try to stay afloat. Especially as the organization appears to be less stable.
Something bad to happen to "them"? There's no diaphanous them, just the specific social climbing crap decision makers facing no consequences of any type.
I have worked with many hard-working and caring managers, and they are generally eclipsed by said social climbers, who present at conferences every other week on know-nothing topics while jumping from place to place, leaving bankrupt companies and massive layoffs in their wake.
I see them posting on LI right now :)
Why are you thinking more about the people that piss you off than the ones that you consider hard working and caring?
You have a massive chip on your shoulder, dare I say that's why you've had many caring managers and now you're seeing them all as 'social climbers'.
Did one manager call you out on something and you torched the entire thing?
Are we reading the same comment? GP clearly separated the "caring managers" from the "LinkedIn corposlop ladder climbers", and even explicitly stated the issue with the latter is that they are usurping the former in moving up the ranks of the corporate hierarchy.
This isn't unique to GP either, it's not exactly uncommon nowadays for people to hate the corpo-techbro MBA LinkedIn archetype.
>There's no diaphanous them
Autocorrect mistake? I doubt anyone was imagining semi-transparent beings wafting gently in a summer breeze.
So what would you call your alternative to blameless postmortems? FWIW, "walking the plank" is already in use.
I was imagining it as the people who are the ghostly images of the "them" out "there" that get referred to when people are generally upset at authority or the system; that's not what I was trying to talk about.
I'd say the pirates had it right and keel hauling is the way to go.
I've certainly learned a great deal from my own crap glassy-eyed decisions throughout my career.
It probably was an unnecessary redesign that could have been avoided, but hey: at least it worked, and eight million dollars is not a huge amount for Uber.
Birmingham spent almost £150m for a system that didn't work at all:
https://www.theregister.com/2026/01/29/birmingham_oracle_lat...
While I was an undergraduate, my university also spent £9m on accounting that didn't work, also with Oracle: http://news.bbc.co.uk/1/hi/education/1634558.stm
If you've designed a system in-house for your accounting, it works, it makes neither financial nor software errors, it's accepted by the users, and you got away with it costing a relatively small fraction of your turnover? That's a big win.
ERP implementations probably don't fail for those kinds of architectural reasons?
I agree. It is a lot of money, but that's the hope from paying engineers well: to make very expensive mistakes unlikely.
One thing I did think about was how this could have been architected without sufficient reference to costs; fixing that might be a process or structural improvement.
Right - if your engineering organization ships designs that are bad economically, the solution is to introduce a culture of predicting costs before committing to a design, and processes to help enforce that culture.
Add "expected budget, double-checked by at least one other principal engineer" to the project checklist.
Have the person most responsible for the $8m "mistake" be the person to drive that cultural change, since they now have the most credibility for why it's a useful step!
I went to school with a guy that dropped a $100k-200k VNA at Apple during an internship. He didn't get a full-time offer despite their investment :P
Letting interns carry six figure equipment, which would also be unexpectedly heavy especially if this happened some years ago, would be a weird thing for any lab I’ve worked in. There are too many things that can predictably go wrong in the hands of an inexperienced person, as happened here.
Interns wouldn’t even be allowed to use $100K VNAs without a lot of supervision because so many things can go wrong. Damaging one of those small precision connectors is easy to do and can be a costly repair that brings delays to the lab, and that’s before you even start making measurements.
I wonder if part of the offense was that the intern was breaking protocol by moving the equipment. Alternatively, they probably failed to explain the rules and expectations to the intern. Or maybe some lazy engineer tried to pawn off their work onto an intern without thinking about the consequences.
I'm not sure - the level of scrutiny that usage/abuse of expensive equipment gets varies wildly from organisation to organisation. I've worked in some places where very expensive equipment is handled roughly, or even taken home in some cases. In others, there are meticulous procedures for even $1-5k pieces of equipment. It's just a cultural thing.
For this example it's the delicacy and fragility of the instrument; the price is just a proxy for that.
Expensive VNAs are also precision, calibrated instruments with small connectors that can easily be degraded by even simple misuse. Frontends can be destroyed or subtly damaged in ways that break measurements by allowing the wrong signal to enter.
It’s easy to damage one in a way that will interfere with measurements for months before someone realizes what’s wrong, which is more costly than the VNA itself.
These instruments require training to handle. It's not even about the price; it's absurd that they'd let an intern carry one around at all (if it was allowed).
This is like the hardware equivalent of an intern accidentally dropping the production DB. My first question would be how they got to the point where an intern was in a position to be able to drop the production DB, because everyone understands what can go wrong.
The obvious answer is because VNAs are heavy and the person who would otherwise have to carry it isn't the person who has to pay for a replacement.
Fair enough. Fragility is probably more important than price in this scenario.
I cannot, of course, speak about this particular incident, but a person inclined to skip procedures expressly implemented to avoid the problem which occurred, or who ignores clear warnings that a problem is developing, is a liability, not a trained asset.
Meanwhile, in a sibling thread about an accounting mistake in California, everyone is screaming for blood.
Blame-free post-mortems are for me and mine, everyone else can get fucked.
> Firing people for bad architectural decisions is generally a terrible idea
I mean, if we're considering factors that could justify firing a developer, suggesting, pushing, and eventually failing to implement bad designs and architectures probably ranks among some of the more reasonable reasons for firing them. It doesn't seem to have been "Oops, we used MariaDB when we should have used MySQL" but more like "We made a bad design decision, let's cover it up with another bad design decision" and repeat, at least judging by this part:
> So let me get this straight: DynamoDB was a bad choice because it was expensive, which is something you could have figured out in advance. You then decided to move everything to an internal data store that had been built for something else, that was available when you decided to build on top of DynamoDB. And that internal data store wasn’t good on its own, so you had to build a streaming framework to complete the migration.
But on the other hand, I'd probably fire the manager/executive responsible for that move, rather than the individual developer who probably suggested it.
> But on the other hand, I'd probably fire the manager/executive responsible for that move, rather than the individual developer who probably suggested it.
And you've just taught all your workers to freeze up: never be proactive, keep the status quo as much as they can, avoid being noticed, and never take a step without being forced to, or without someone else lined up to take 100% of the blame (with a paper trail) if things go south.
One of my favourite bosses ever was a VP who kept a banker's box at her desk and very few personal effects.
She told me she kept it there because her job was to make decisions, and to get fired or leave if she was wrong. She was right about so many of her choices, I would have followed her into anything. Then one day I came in and her desk was empty -- she'd had an apparently epic argument with the C-suite, disagreed with their path, and left (I never found out if she quit or was fired). The team got a new VP, but I requested to be moved to a different team as I wasn't aligned with the new vision.
When you get to a certain level part of your job becomes owning the decisions and getting fired.
And in some workplaces, that actually is the way to go!
I once worked in a manufacturing environment where mistakes could be quite expensive. We had our annual org survey and one of the questions asked was "Risk taking is encouraged." Our team scored low on that metric, and upper management was concerned. They held a meeting to ask about it, and most of the team was confused why there was a meeting. They said they viewed it as a positive that they don't take risks.
I guess if that's your experience of letting toxic people go, maybe everyone you worked with was toxic? The usual reaction I see from teams when firing people who seem to make a project/product worse instead of better tends to be a sigh of relief and a communal feeling of "Let's get back to business".
Firing people for making bad choices, people tend to appreciate that. Firing people for making good choices? Yeah, I'd understand that would freeze people and make them avoid making proactive choices, so try not to do that, obviously.
No, he's right.
Remember, you can conduct only one of the two types of postmortem: the air-crash-style blameless one (to find out what happened) or the blame-based one (to find out who to punish). Once you conduct the latter, everyone psychologically "lawyers up". You get a lot more meetings. A lot more paper trail. A lot more delay. You don't just pick a database, you commission a sub-committee for database choice to review the available options over the next six months.
That's why government / civil service operations are so slow. They operate in a very high blame political environment.
Right, so say we have this situation where you're choosing a SQL database. The organization made a choice that leads to lots of complications, where oftentimes the reason for the complication is that the organization made yet another bad choice. Repeat a couple of times.
We do a blameless postmortem about each one of these, where essentially we only focus on the root causes of the actual problems, but somehow it never comes up that there was one individual who made those bad choices over and over, which led to the situations arising in the first place.
Do you just never address this? Do you continue to say "Well, it wasn't X's fault, it's the system around X that let X make that decision that needs fixing" even when it repeats, and the humans involved can already see what's going on?
In my mind you need to be able to address bad behavior in organizations where choices have an impact on something produced; otherwise we cannot change the quality of what is being produced, or prevent production issues, since it's based on the choices we make, and if "we" make bad choices, the quality will be bad.
Ultimately I agree with you in more serious engineering-heavy domains, like airplanes and what not, and it's a sane default mode, to try to address what's happening around rather than decisions by individuals. But I also don't think that should mean that other domains aren't better served by some hybrid model, especially when it's about producing artifacts of some sort, and similar things.
>was one individual who made those bad choices over and over
This was never said, or even implied, in the article. We don't even know if this was a single person choice.
You are making up "facts" like calling the person who makes mistakes "toxic", or saying that the choice was made by someone who only made bad choices.
We are talking Uber here, in 2017, which was not only playing "move fast and break things" but "move really fast while shooting an AK47 blindfolded". Not only did they expect mistakes, they encouraged them. It would be plain wrong to start firing individual people for making mistakes if that is the environment.
> A redesign that gets replaced 2 years later is a catastrophe.
> Somebody Should Have Been Fired For This
This person is not a good resource. Uber was a very fast-growing company, both in terms of their product and staff. Turnover in architecture happens. Calling this a catastrophe and clickbaiting about firing engineers over a rounding error in Uber's overall finances is gross.
I understand this person is trying to grow their Substack with these inflammatory claims but I hope HN readers aren’t falling for it. This person’s takes are bad and they’re doing it to try to get you to become a subscriber. This is hindsight engineering from someone who wasn’t there.
I don’t know if the author here is intentionally clickbaiting or somehow ignorant about the scale Uber is operating at.
This was particularly egregious:
> If you’re building a system that makes the economics of your company impossible, you’re better off not building it.
If I’m understanding the timeline here, Uber replaced this system in 2019, and saved 8 million doing so. In 2019 their revenue was something like 13 billion dollars. In no world was this system making the economics of Uber impossible.
It's a terrible article. They are angry about spending $687 a day for DDb writes when the company was doing billions in revenue. It's silly, and it suggests they have no grounded sense of how much systems cost at scale.
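For context, annualizing the figure quoted above (taking the article's $687/day at face value; not independently verified):

```python
# Rough annualization of the per-day write cost quoted in the article.
daily_write_cost = 687                       # USD/day, the article's figure
annual_write_cost = daily_write_cost * 365   # = 250,755

print(f"~${annual_write_cost:,} per year")   # roughly a quarter million dollars a year
```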
Uber does extensive leetcode interviews to weed out all but the best of the best developers, just like FAANG. How could such a bad developer have been hired with this process in place and then proceed to make such an expensive bad decision?
It’s almost as if gatekeeping the correct developers isn’t working. So yeah somebody should get fired. They screened to prevent this. It wasn’t prevented.
Otherwise admit the screen doesn’t work and fire the gatekeepers who are relying on it and the inverted selection of correct engineers.
> A redesign that gets replaced 2 years later is a catastrophe.
People forget how quickly Uber scaled, and the user impact of not being able to track your trips could be catastrophic to retention. There's a class of tech-influencer who think they can dissect past decisions on a blog post without being in the room when the technical constraints were being laid out. This is Monday morning quarterbacking at its most grotesque.
> There's a class of tech-influencer who think they can dissect past decisions on a blog post without being in the room when the technical constraints were being laid out
The costs they are laying out are not prohibitively expensive. I've known corporations where people spin up test clusters that cost $5K a month and forget about them. A business-critical service can definitely ignore costs in the short term if it brings in customers. The standard practice is to just ship something quickly and optimize for cost later if it helps bring in revenue/customers.
Besides, the napkin math isn’t always true. If you’re an enterprise customer for AWS, you get massive discounts, especially in the time frame they’re talking about. And when it comes to partnerships, I remember back in the day AWS used to let you do pretty much anything for free if it meant they could parade your project to other customers.
Yeah, I don't know the specific trade-offs here, but taking on some tech debt for two years of successful growth is not necessarily a bad deal.
You can read the article and see it's not a tech-debt trade-off but someone not doing a back-of-the-envelope guesstimate of how much it would cost to run their payments system on DynamoDB.
This article does not make convincing arguments to match the strong criticism. Engineering decisions are difficult, especially in high-growth orgs, where one must balance many constraints and risks including opportunity cost. Handling payments is part of Uber's core product.
The financial criticism ('napkin math') appears to estimate DynamoDB costs of USD $8 million for 2017 to 2020. Uber revenue for the same period is roughly USD $42.5 billion, thus this cost weighs in at about 0.02%, or 1/50th of one percent. This is a rounding error for a high growth company, and not something that warrants a witch-hunt and firing. It's easy to blow more than $2 million per year on software engineers in pursuit of an alternative high-scalability solution.
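Spelling that ratio out (both inputs are this comment's own estimates, not audited figures):

```python
# Both numbers are rough estimates for 2017-2020.
dynamodb_cost = 8_000_000          # USD, the article's napkin-math DynamoDB spend
uber_revenue = 42_500_000_000      # USD, approximate Uber revenue over the same period

share = dynamodb_cost / uber_revenue
print(f"{share:.4%} of revenue")   # ~0.0188%, i.e. about 1/50th of one percent
```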
I'm also not on board with the 'resume driven development' criticism as the explanation for solution churn. Perhaps that is actually what happened. I wasn't there and don't know, but if that is being asserted I expect to see evidence presented to support it.
Did the author even mention what they thought should have been used instead?
Can I do a mea culpa? This is more than 3 decades back. I was a junior programmer (2-3 years in industry) sent to a client site in Europe. You can imagine the state of systems in those days. I wrote (or rather updated) a fix which would update the discount and tax rates on orders based on new terms. It would run every day to account for ... whatever. You pick the values from the master file, update all the orders, and move on.
I wiped out VAT on all orders and for the next month the paper invoices were sent without VAT. So the invoice is $100, VAT is $20, the invoice should be $120, but they were sent as $100.
100s of invoices every day would be my guess.
Nobody noticed.
For a month.
Millions of dollars of revenue and IIRC millions of dollars of VAT.
Until a customer complained to the CEO.
We had a firefight to fix it, not just technical but legal and managerial. We could send a new invoice just for the tax, redo the invoice, or send a debit memo. What was the right decision? But what if customers did not pay? What about returns? How would we track them? Of course we were doing the technical work and the client company was front-ending how to handle it business-wise.
And the managerial firefight: who did it, and what are the safeguards for the future? We had a company exec visit the client site to manage the issue.
I was in the hot seat, but I was protected by my managers from any fallout. Just do the work. Do not screw up again. (Test every row and every column, even the ones you did not change; there's a rough sketch of that idea after this story.)
A month later the sales director at the client company got fired.
The grapevine is that this was just the tipping point, but you never know. BTW these were paper invoices printed onsite and mailed out, but I do not know if someone had the job to scrutinize them.
PS: True story, going by old memory, although such legends remain fresh in your mind, forever. Not sure it belongs here, but the mention of firing for a multi-million dollar mistake pulled this into cache memory.
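If anyone wants the "test every row, every column" lesson in concrete form, here's a minimal sketch with hypothetical table and column names: snapshot the rows a batch job will touch, run the job, then diff every column, including the ones you supposedly didn't change.

```python
# Minimal sketch of the "test every row, every column" idea. Table and column
# names are hypothetical, and the first column is assumed to be the primary key.
import sqlite3

def snapshot(conn: sqlite3.Connection, table: str):
    cols = [c[1] for c in conn.execute(f"PRAGMA table_info({table})")]
    rows = {r[0]: dict(zip(cols, r)) for r in conn.execute(f"SELECT * FROM {table}")}
    return cols, rows

def unexpected_changes(cols, before, after, expected_cols):
    """Return every (key, column, old, new) that changed outside expected_cols."""
    problems = []
    for key, old_row in before.items():
        new_row = after.get(key, {})
        for col in cols:
            if old_row[col] != new_row.get(col) and col not in expected_cols:
                problems.append((key, col, old_row[col], new_row.get(col)))
    return problems

# Usage (sketch): snapshot, run the rate-update job, snapshot again, then fail
# loudly if anything outside the intended columns moved, e.g. VAT dropping to zero.
# cols, before = snapshot(conn, "orders")
# run_rate_update(conn)                      # hypothetical batch job
# _, after = snapshot(conn, "orders")
# assert not unexpected_changes(cols, before, after, {"discount_rate", "tax_rate"})
```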
If a single non-malicious code change can break a thing like that with nobody noticing, that's a catastrophic failure in testing, QA, and operations (nobody noticed that 20% VAT simply stopped being charged). It's hard to blame that on an IC engineer.
I know that, now. But apart from this incident being in the annals of time as it relates to my work, there absolutely was no fallout, which my mind could not comprehend in those days.
A single engineer should not get fired for an architectural decision that clearly had buy in from many people.
> Every rewrite was someone’s promotion project.
At least when I worked at Uber, that wasn't really how it worked. The eng org was so big that it was nearly impossible to track all the projects people worked on, and you'd get micro-ecosystems of tools because of it.
Some grew large, others stayed quite "local".
An alleged expert hawking some kind of blog/newsletter thing doesn't seem to know that no major company is ever going to fire an employee for an honest mistake, especially one that would have had multiple sign-offs from around the organization.
$8m is a lot of money for us working stiffs, but it's about 0.03% of their 2025 profits.
Hindsight is 20/20. Not saying they did the right thing, but they may have had specific performance reasons for originally going with DynamoDB.
I have a hard time following the thesis. I know Uber is famous for overbuilding in house but this example does not feel defensible. 2020 revenue was $11bn. 2024 revenue is $44bn. $8mm is not very material in the grand scheme. Could it be optimized? Maybe but I don’t know the full surface area and this article is overly aggressive with opinions. $250k a year just for writes sounds cheap to me when your top line is in the billions.
Who exactly is supposed to be fired for this?
If you don’t have price controls, it’s easy to run up a bill.
If no single person had the responsibility to check the cost, then no one actually failed at their assigned job. So you either fix the system or fire everyone involved in the decision.
What you’re doing now is looking for a scapegoat to beat up. You’re angry, and you’re going to make someone pay for pissing you off.
This seems like dramatically overstating the mistake. Yes, it was expensive, and yes, this could easily have been foreseen, but that’s really small potatoes compared to mistakes I’ve seen. I’ve seen promos handed out for stuff that never even fully worked beyond pilot scale and had to be rolled back because it was fundamentally flawed on a purely technical level.
This is an ad
And very clearly LLM written. It shouldn’t bother me as much as it does, but it does. And I know I do it too.
> Nobody Got Fired for Uber's $8 Million Ledger Mistake?
> Somebody Should Have Been Fired For This
> And nobody got fired for this?
Not once do they explain what firing someone would actually improve?!
$8M sounds like a lot, but (a) the cost of making a material financial mistake can easily dwarf this, and (b) the cost of the engineers maintaining the system was likely about as high anyway. And infra is expensive when you're Uber. It all seems rather overblown to me.
Hate to say it but kind of a lousy article... zippy writing but lots of Monday Morning Quarterbacking for something the author doesn't seem to show much knowledge of. Maybe this is his style to gin up subscribers, but I'm not a fan.
> But nobody was optimizing for cost. They were optimizing for their next promotion. Each rewrite was a new proposal, a new design doc, a new system to put on a resume. The incentive was never to pick the boring, correct choice — it was to pick the complex, impressive one.
...I guess it could be possible nobody thought about cost at all, and this was all misaligned incentives and resume-driven development, but I find that kind of hard to believe? As someone who has made cost mistakes in the cloud, this claim seems a bit silly.
Not to detract from his experience, but I didn't actually see much payments experience at all on his resume, so I'm curious why he's branding himself as a payments guru. Kind of tech content creation fluff, I guess.
In general is there any practical way to fix the issue of "Every rewrite was someone's promotion project"? There doesn't seem to be any incentive for employees to care about projects long term. Keeping something running smoothly is never rewarded the same as launching something new or fixing something broken.
100%: change the way American tech companies treat their workforce.
Most people don't get meaningful raises at existing jobs so if they want raises, they must job hop or internally job switch.
Companies will lay people off at the drop of a hat, so you have to keep your skill set up to date so you can get the next job.
So everyone launches big splashy projects they can put on their resume, either to protect themselves in case of layoffs or to turn into a promotion.
Not really: it’s a lazy pejorative, in this case written by an LLM, not a description of reality. It’s honestly one of the stupider ideas that has cachet; it seems to survive only by repetition.
Here, the tell is you’re not gonna get a multibillion dollar company on hockey stick growth to switch storage because you want to get promoted.
8M is kinda small potatoes - which is essentially the AWS business model. Sure you could build a cheaper thing but that is hard and this thing is right here, easy(ish) to use, and your company won't regret using it until you've moved into a different role.
What's an $8m mistake to a company like Uber? They made $9.8bn in profits in 2024.
It's helpful to rewrite software every 2 years as your team turns over. The code you understand best is code you wrote yourself, and no one likes maintaining other people's legacy code.
> With each trip generating multiple ledger entries, and Uber as a whole processing 15 million trips per day, it didn’t matter that DynamoDB was great because of high throughput at global scale. The proverbial bean counter should’ve stopped this madness from happening.
> At Uber’s scale, DynamoDB became expensive. Hence, we started keeping only 12 weeks of data (i.e., hot data) in DynamoDB and started using Uber’s blobstore, TerraBlob, for older data (i.e., cold data). TerraBlob is similar to AWS S3. For a long-term solution, we wanted to use LSG.
Honest question: why do people go for this kind of complicated solution? Wouldn't Postgres work? Let's say each trip creates 10 ledger entries, and each is its own transaction. That's 150 million transactions a day, or roughly 2,000 TPS on average (rough math sketched below). Postgres can handle that, can't it?
If regional replication or global availability is the problem, I have to ask: why does it matter? For something as critical as a ledger, does it hurt to make the user wait a few hundred milliseconds if that means you can have a simple and robust ledger service?
I honestly want to know what others think about this.
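Since the thread keeps circling the napkin math, here's a minimal back-of-the-envelope sketch. Every constant is an assumption: the 15M trips/day comes from the quoted article, 10 entries per trip is the guess above, and the price is roughly DynamoDB's long-standing on-demand list rate for standard writes on items of 1 KB or less.

```python
# Back-of-the-envelope only; all constants are assumptions, not Uber's numbers.
TRIPS_PER_DAY = 15_000_000              # figure quoted from the article
ENTRIES_PER_TRIP = 10                   # guess from the comment above
SECONDS_PER_DAY = 86_400
USD_PER_MILLION_WRITES = 1.25           # approx. on-demand price, standard writes <= 1 KB

writes_per_day = TRIPS_PER_DAY * ENTRIES_PER_TRIP
avg_tps = writes_per_day / SECONDS_PER_DAY
write_cost_per_day = writes_per_day / 1_000_000 * USD_PER_MILLION_WRITES

print(f"writes/day:      {writes_per_day:,}")          # 150,000,000
print(f"average TPS:     {avg_tps:,.0f}")              # ~1,736; peaks much higher
print(f"write cost/day:  ${write_cost_per_day:,.2f}")  # ~$187.50, writes only
```

That figure is writes alone and ignores reads, storage, secondary indexes, multi-region replication, and items larger than 1 KB, which is presumably where most of a real-world bill comes from. And while ~2,000 TPS on average is within reach of a well-tuned Postgres primary, average TPS alone rarely settles this kind of decision; peak load, failover, and operational staffing usually do.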
I have been a part of over-built solutions.
It’s usually because executive management bakes hyper growth into the assumptions because they really want the biz to grow, then it becomes marching orders down the chain as it gets misinterpreted in a game of corporate telephone.
“We need to design this for 1b DAUs”
Then 1) that growth never happens and 2) you end up with a super complicated solution
Instead, someone needs to say, “Hey [boss], are you sure we need to build for 1b DAUs? Why don’t we build for 50m first, then make sure it’s extensible enough to keep improving with growth”
SRE here. Most of the time we see choices like this because teams are under pressure to deliver, and the expected scale would likely exceed what a database will easily handle with out-of-the-box settings. So tweaking is required, and that takes time and knowledge the dev team doesn't have. AI helps a bit here, but it didn't exist when the DynamoDB solution was chosen. With a managed service, however, it's some Terraform and, boom, a scalable database exists; the only downside is the cost, which is the next Product Manager's problem.
The author clearly hasn't worked for a big company. An $8m mistake? Meh, not great, but it probably seemed like a good idea at the time. Seen worse.
Oh, it was $8m over several years? That's a couple of projects that didn't pan out, or a small team that wasn't firing on all cylinders for a stretch.
Nobody got fired because there was nothing unusual to fire anyone for.
The author overestimates how much ~$5M/yr actually is. A business like Uber isn't happy about that, but it's not even in the top 10 of things they're wasting money on. Moreover, this isn't the engineer's sole fault; it's more the fault of whoever actually approved the expense.
Submarine article.
Outside of that, it sounds like the system worked perfectly. They launched, they paid the DB costs (the $8M was not a ledger mistake), and then they rebuilt once they wanted more cost savings. Also, a bunch of folks got promoted.
The 8M came from VCs lighting money on fire. Honestly this seems like the system worked as planned to me, not a case study in how not to do things.
> A redesign that gets replaced 2 years later is a catastrophe
I mean, given how quickly things can change, I think the language and sentiment here aren't quite right. Businesses shift, and we can't necessarily control that.
Spotify was another big proponent of DynamoDB, does anybody know how that went?
I thought Spotify was 100% on Google Cloud
Everything is a good idea until it isn’t. The entire industry was enamoured with microservices for far too long. We can look at these mistakes in hindsight and learn from them but we can’t judge them without the context of the time. Software was very different even just 10 years ago. $8m is a rounding error.
Imagine firing everyone who made a mistake, instead of keeping the people who just learned a valuable lesson.
This is horrible slop, and I gave it a long chance. I gave up after the handwringing about how DynamoDB would cost Uber $300 a day. I should have given up when it framed each DB evolution as a “promo project”.
The breathless tone of this article is irritating. Was this a bad decision? Maybe. Is $8 million some vast amount that merits this sort of wide-eyed crazy ranting? Uber's 2020 revenue was around $11 billion, so I'm going to say no, not really. Obviously you don't want to burn millions willy-nilly, but for such a critical component, this isn't so terrible.