All of this article, both the good (critique of the status quo ante) and the bad (entirely too believing of LLM boosterism), is missing (or not stressing enough) the most important point, which is that the actual programming is not the hard part. Figuring out what exactly needs to be programmed is the hard part.
For reasons which it would take a while to unpack, it is often the case that the best (or sometimes only) way to find out what programming actually needs to be done is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product; it is much more often the means of working through what it is that is actually needed. This is very difficult for the people who ask for the software to understand, and it is quite often very difficult for the people doing the programming to understand.
Most of what is being done during programming is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and of what a solution would look like. Once you have arrived at that understanding, there are a variety of ways to make what you need, but that is not the rate-limiting step.
> which is that the actual programming is not the hard part. Figuring out what exactly needs to be programmed is the hard part.
I’m growing tired of this aphorism because I’ve been in enough situations where it was not true.
Sometimes the programming part really is very hard, even when it's easy to know what needs to be built. I've worked on projects where the business proposition was conceptually simple, but the whole reason the business opportunity existed was that it was an extremely hard engineering problem.
I can see how one could go through a career where the programming itself is not that hard if you're mostly connecting existing frameworks together and setting up all of the tests and CI infrastructure around them. I have also had jobs where none of the programming problems were all that complicated, but we spent hundreds of hours dealing with all of the meetings, documents, and debates surrounding every change. Those were not my favorite companies.
But that is not programming then? Doing voice recognition in the 90s, missile guidance systems, you name it, those are hard things, but it's not the "programming" that's hard. It's the figuring out how to do it. The algorithms, the strategy, etc.
I might be misunderstanding, but I cannot see how programming itself can be challenging in any way. It's not trivial per se, or quickly over, but I fail to see how it can be anything but mechanical in and of itself. This feels like saying "writing", as in grammar and typing, is the hard part of writing a book.
Yep, I think people who repeat this aphorism essentially equate programming with typing, or, as you say, with just connecting existing bits together. Programming is working out how to get a computer to perform some task, not just the typing: it's the algorithms, the performance balancing, the structuring, the integration, etc.
Imagine telling workers at a construction company that the hard problem was never building stuff but figuring out what needs to be built.
The saying also ignores the fact that humans are not perfect programmers, and they all vary in skills and motives. Being a programmer is often not about simply writing new code but about modifying existing code, and that can be incredibly challenging when that code is harebrained or overly clever and the people who wrote it are long gone. That involves programming, and it's really hard.
Isn't overly clever code the result of programmers doing simple things in hard mode?
Okay it's a spicy take, because juniors also tend to write too smart code.
Figuring out what to do and how to do it is maybe not hard, but it is effort. It's a hidden thing because it's not flat coding time; it requires planning, research, exploration, and cooperation.
It's also true that some seemingly simple things are very hard. There are probably countless workarounds out there where the programmer wasn't even aware he was dodging an NP-hard bullet.
Both arguments are valid.
I think the weight leans toward effort, because effort is harder to avoid. Work, complexity, and cruft pile up no matter what you do. But you can work around hard problems. Not always, but often enough. Not every business is NASA and has to do everything right; a 90% solution still generates 90% of the returns, and no one dies.
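The "NP-hard bullet" mentioned above can be made concrete. Here is a minimal sketch (names and numbers are illustrative, not from any comment): first-fit-decreasing is a classic heuristic for bin packing, a problem that is NP-hard to solve optimally. It gives no optimality guarantee, but it is fast and often good enough, which is exactly the kind of workaround being described.

```python
# First-fit-decreasing: a simple heuristic workaround for bin packing,
# which is NP-hard to solve optimally. Sort items largest-first and put
# each into the first bin it fits in, opening a new bin when needed.

def first_fit_decreasing(items, capacity):
    """Pack items into bins of the given capacity, greedily."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            # No existing bin has room: open a new one.
            bins.append([item])
    return bins

bins = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(bins))  # 2 bins for this instance
```

For this instance the heuristic happens to hit the optimum (20 units of items in two bins of 10), but in general it can use up to roughly 22% more bins than optimal, which is the trade the workaround quietly makes.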
> Imagine telling workers at a construction company that the hard problem was never building stuff but figuring out what needs to be built.
Isn't this kind of true, though? Housing construction, for instance, isn't bottlenecked by the technical difficulties of building, but by political and regulatory hurdles. Or look at large, capital-intensive projects such as the always-proposed, never-built new Hudson River train tubes. Actually building those will take billions of dollars and many years, but even they would have been built long ago were it not for constant political jockeying.
Building stuff _does_ often involve difficult technical challenges, but I still think that as a general aphorism the observation that this isn't the _hardest_ part holds true.
We might have different concepts of "hard", but if I were a construction worker, I think I would agree. Hell, I'm a developer and I agree. Figuring out what to do definitely is the hard part. The rest is work and can be sweaty, but it's not hard in the sense of being full of impenetrable math or requiring undiscovered physics. It's just time-consuming and, in the case of construction work, physically tiring.
It might be that I have been doing this for too long and no longer see it.
This concept exists outside of engineering too. It's captured in the more negatively intentioned "The best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer". In user research, it's a much better signal when people correct you than when they agree. Politeness is easy, especially under the circumstances (the power dynamic of you paying them, their only half caring about your work, people generally wanting to be nice and agreeable, etc.), so you should be wary of it. Similarly, trying to get real project goals or real requirements or real intentions from a PM or a boss, who may well be hiding that there isn't much vision underneath it all, is the same. The problem is that, as productive as this is for developing the team's thinking, it will (1) probably come off as unproductive and challenging because you're slowing "progress", and (2) saying dumb, wrong things makes you seem dumb and wrong. But per the concept, even when you do have the foresight to question, you're not allowed to just ask.
+1 A huge amount of software - probably most - is not actually generating value and in many cases is actually reducing value.
I've seen teams build and re-build the same infrastructure over and over.
I saw a request that could have been met with a few SQL queries and a dashboard get turned into a huge endeavor that implements parts of an ETL, configuration-management, CI/CD, and ticketing system, and is now in the critical path of all requests, all because people didn't ask the right questions and the incentive in a large organization is to build a mini-empire to reign over.
That said, smart infrastructure investment absolutely can be a competitive advantage. Google's infrastructure is, IMO, a competitive advantage. The amount of vertical integration and scale is unparalleled.
One of the most confusing moments in my early career was when someone spent two whole quarters building a custom tool that did something a mature and well respected open source project did for us. There was no advantage to his tool and he would admit it when cornered by the question.
We all thought he would get reprimanded for wasting so much time, but by the time management figured out what was happening, they decided they needed to sell it as a very important idea rather than admit they had just spent $100,000 of engineering time on something nobody needed. So it turned into something to celebrate, and we were supposed to find ways to use it.
That company went down in flames about a year later. That’s how I learned one way to spot broken organizations and get out early rather than going down with the ship.
The incentive of undemocratic groups is to build mini-empires, yes, but if business decisions were led by workers instead of a group of tyrants, they would most likely be better decisions. If we want lived examples of this, look at recorded history.
Hard perhaps but it feels a lot easier now than three years ago. Or so my backlog of personal projects outside of my most familiar stack would suggest.
What is hard about it? Young children seem to pick it up with ease. It cannot be that hard?
Determining what to program can be hard, but that was already considered earlier.
The only other place where I sometimes see it become hard for some people is where they treat programming as an art and are always going down crazy rabbit holes to chase their artistic vision. Although I would say that isn't so much that programming is hard, but rather art that is trying to push boundaries is hard. That is something that holds regardless of the artistic medium.
> What is hard about it? Young children seem to pick it up with ease. It cannot be that hard?
That's like saying "becoming a writer can't be that hard, since kids learn how to write in the elementary school".
Given a set of requirements, there are many different ways to write a program to satisfy them. Some of those programs will be more efficient than others. Some will scale better. Some will end up having subtle bugs that are hard to reproduce.
> That's like saying "becoming a writer can't be that hard, since kids learn how to write in the elementary school".
Is writing hard? I expect most can agree that determining what to write, especially if you have an objective (e.g. becoming a best-selling novelist), can be extremely hard — but writing itself?
> there are many different ways to write a program to satisfy them.
"What to program" being hard was accepted from the onset and so far we see no disagreement with that.
> Is writing hard? I expect most can agree that determining what to write, especially if you have an objective (e.g. becoming a best-selling novelist), can be extremely hard — but writing itself?
Being able to transcribe sentences in a certain language is the skill kids pick up in elementary schools. Being a writer requires a whole set of skills built on top of that.
The reason why I brought up that difference in the first place is because both of these are called "writing". When a fan says "I heard the author is writing the next book in the series" or when an author says "I haven't been able to focus on writing due to my health issues", they're not talking about the low-level transcription skill.
> "What to program" being hard was accepted from the onset and so far we see no disagreement with that.
Similar to your interpretation of "writing", you're choosing to interpret "programming" as a process of transcribing an algorithm into a certain programming language, and everything else ends up being defined as "what to program".
That's an overly reductive interpretation, given the original context:
> For reasons which it would take a while to unpack, it is often the case that the best (or sometimes only) way to find out what programming actually needs to be done is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product; it is much more often the means of working through what it is that is actually needed.
> [...]
> Most of what is being done during programming is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and of what a solution would look like.
Notice that the original comment defines "determining what to program" as a process of refining your understanding of the problem itself.
In my reading of the original comment, understanding what your users need is "what to program". Writing code that solves your users' requirements is "programming".
They don't? It is taught in schools at the early elementary level. I see no indication that most are failing.
I think we can agree that few of them would be economically useful due to not knowing what to program. There is no sign of competency on that front. Certainly, even the best programmer in the world could theoretically be economically useless. Programmers only become economically useful when they can bridge "what to program".
> They don't? It is taught in schools at the early elementary level. I see no indication that most are failing.
Programming in elementary schools typically involves moving a turtle around on the screen. (My mother taught 4th grade in New York for many years, and I believe her when she explained the computer instruction.)
Economically valuable programming is much more complex than what is taught in many schools through freshman college. (I taught programming at the college level from 1980 until I retired in 2020.)
Because economically valuable programming has to consider what to program, not simply follow instructions handed down by a teacher about exactly where and how to move a turtle on the screen. But nobody disputes that "what to program" is hard. It was explicitly asserted in the very first comment on this topic, and that has carried through the comments that have followed.
For whatever reason, only about 1 in 10 engineers seems to be able to bring an idea from start to finish by themselves. I don't know that it's technical skill, but something difficult is going on there.
This is true when fresh college grads are building stuff. Experienced engineers know how to build things much more efficiently.
Also, people like to fantasize that their project, their API, their little corner of the codebase is special and requires special treatment, and that you simply can't copy the design of someone much more experienced who already solved the problem 10 years ago. In fact, many devs boast about how they solved (re-solved) that complex problem.
In other domains, professional engineers (non-SWE) know that there is no shame in simply copying the design of a bridge that is still standing after all these years.
> All of this article, both the good (critique of the status quo ante) and the bad (entirely too believing of LLM boosterism), is missing (or not stressing enough) the most important point, which is that the actual programming is not the hard part. Figuring out what exactly needs to be programmed is the hard part.
HARD AGREE. But…
Taken as just such, one might conclude that we should spend less time writing software and more time in design or planning or requirement gathering or spec generating.
What I’ve learned is that the painful process of discovery usually requires a large contribution of doing.
A wise early mentor in my career told me “it usually takes around three times to get it right”. I’ve always taken that as “get failing” and “be willing to burn the disk packs” [https://wiki.c2.com/?BurnTheDiskpacks]
While it's true that "figuring out what exactly needs to be programmed" was always the hard part, it's not the part that the most money was spent on. Actually programming the thing always took up the most time and money.
True enough, but I think that a lot of "actually programming the thing" turned out to be "figuring out what exactly needs to be programmed". Afterwards, people did not want to admit that this was the case, perhaps even to themselves, because it seemed like a failure to plan. However, in most (nearly all?) cases, spending more time prior to programming would not have led to a better outcome. Usually, the best way to figure out what needs to be programmed is to start doing it, and occasionally take a step back to evaluate what you've learned about the problem space and how that changes what you want to actually program.
In other words, "figuring out what needs to be programmed" and "actually programming the thing" look the same while they're happening. Afterwards, one could say that the first 90% was figuring out and only the last 10% was actually doing it. The reason the distinction matters is that if you do something that makes the programming happen faster but the figuring out happen slower, it can have the surprising effect of making the whole thing take longer.
> Usually, the best way to figure out what needs to be programmed is to start doing it, and occasionally take a step back to evaluate what you've learned about the problem space and how that changes what you want to actually program.
Replace the verb "program" with "do" or anything else, and you've got a profound universal philosophical insight right there
I'm curious how this would work with LLMs increasing the speed to prototype. Low stakes changes to try something out, learn from it, and pivot.
My company is fully remote, so all meetings are virtual and can be set to produce transcripts. Parsing through those for the changes needed and trying them out can be as simple as copy-paste, plan, verify, execute, and distribute.
> Actually programming the thing always took up the most time and money.
I'm curious whether any quantitative research has been done comparing time spent writing code versus time spent gathering and understanding requirements, documenting, coordinating efforts across developers, design and architecture, etc.
The claim is that most software teams do not consider the financial impact of their work. Is what they are doing producing value that can be measured in dollars and cents, and is it greater than their combined cost of employment?
The article suggests that there is a lot of programming being done without considering what exactly needs to be programmed.
> The article suggests that there is a lot of programming being done without considering what exactly needs to be programmed.
And the parent rightfully points out that you cannot know exactly what needs to be programmed until after you've done it and have measured the outcome. We literally call the process "development", and for good reason. Software is built on hunches, and necessarily so. There is an assumption that in the future the cost of the work will pay back in spades, but until you arrive in that future, who knows? Hence why businesses focus on metrics that try to observe progress toward finding out, rather than tracking immediate economic payoff.
The interesting takeaway from the article, if you haven't given this topic much thought already, is that the changing financial landscape means that businesses are going to be more hesitant to take those risks. Right now there still seems to be enough optimism in AI payoffs to keep things relatively alive, but if that runs out of steam...
Agreed, but are you also implying that the process of iteratively "programming something that's not it, and then replacing it" multiple times is not in the scope of what LLMs can/will do?
Most of the time taken during this process is spent getting feedback, processing it, and learning that it's not it. So even if LLMs drive the build time to zero, they won't speed up the process very much at all. Think 10% improvement not 10x improvement.
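The "10%, not 10x" intuition above is just Amdahl's law applied to the discovery/build split. A back-of-the-envelope sketch (the 90/10 split here is the thread's illustration, not measured data):

```python
# Amdahl's-law style estimate: if only the coding portion of the work
# accelerates, overall speedup is capped by the untouched discovery time.

def overall_speedup(coding_fraction, coding_speedup):
    """Total speedup when coding_fraction of the work gets coding_speedup."""
    remaining = (1 - coding_fraction) + coding_fraction / coding_speedup
    return 1 / remaining

# Coding driven effectively to zero time (infinite speedup):
print(round(overall_speedup(0.10, 1e9), 3))  # ~1.111
# A more modest 5x coding speedup:
print(round(overall_speedup(0.10, 5), 3))    # ~1.087
```

So even an infinitely fast build step yields only about an 11% end-to-end improvement if 90% of the elapsed time is spent figuring out what to build.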
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
Then I'd wager it's the same for the courses and workshops this guy is selling... an LLM can probably give me at least 75% of the financial insight for not even 0.1% of what this "agile coach" is asking for his workshops and courses.
Maybe the "agile coach LLM" can explain to the "coding LLM's" why they're too expensive, and then the "coding LLM's" can tell the "agile coach LLM" to take the next standby shift then, if he knows so much about code?
And then we actual humans can have a day off and relax at the pool.
Ceding the premise that AGI is gonna eat my job: my job involves reading the spec to be able to verify the code and output, so that there's a human to fire and sue. There are five layers of fluffy management and corporate BS before we get to that part, and the AGI is more competent at those fungible skills.
With the annoying process people out of the picture, even reviewing vibeslop full time sounds kinda nice… Feet up, warm coffee, just me and my agents so I can swear whenever I need to. No meetings, no problems.
There’s gonna be one guy in charge of you, and he’s going to expect you to be putting out 20x output while thanking him for the privilege of being employed, assuming all goes the way every management team seems to want
I don't think this will happen, because AI has become a straight-up cult, and things that are going well don't need so many people performatively telling each other how well things are going.
If a SWE could truly output 20x their effort, that person would probably be better off freelancing or teaming up with another SWE. If anything can be automated away to AI, it is project management. Also, there has to be a point where delivering more and faster code doesn't matter, because the choke points are somewhere else in the project life cycle, say waiting for legal, other vendors, budgets, suppliers, etc. So productivity could max out at, say, 3x, after which, unless you have a strong pipeline of work, your engineers will be sitting around waiting for the next phase of the project to start.
> There’s gonna be one guy in charge of you, and he’s going to expect you to be putting out 20x output while thanking him for the privilege of being employed, assuming all goes the way every management team seems to want
To add to this, I remember somebody here on HN pointing out a few months ago that they’ve never seen so much investment in businesses that are going “we don’t actually know what the billion dollar application is so we’re going to sell y’all some rough tools and bank on the rest of you figuring it out for us.”
That's the difference between programming and software engineering.
A software engineer should be able to talk directly to customers to capture requirements, turn that into a spec sheet, create an estimate and a bunch of work items, write the whole system (or involve other developers/engineers/programmers to work on their work items), and finally be able to verify and test the whole system.
That entire role is software engineering. Many in the industry suck at most of the parts and only like the programming part.
I think the hardest parts are requirements gathering (e.g. creating organized and detailed notes) and offloading planned work to other developers in a neat way, generally speaking, based on what I see. In other words, the human-friction areas.
> That entire role is software engineering. Many in the industry suck at most of the parts and only like the programming part.
I'm always amused when I read anecdotes from role-siloed, heavily staffed tech orgs with all these various roles.
I've never had a spec handed to me in my career. My job has always been end to end. Talk to users -> write the spec into a ticket -> do the ticket -> test the feature -> document the feature -> deploy the feature -> support the feature in production from the on-call rotation.
Often I have a few juniors or consultants working for me whom I oversee doing parts of the implementation, but that's about it.
The talking to users part is where a lot of people fall down. It is not simply stenography. Remember most users are not domain/technical experts in the same things as you, and it's all just a negotiation.
It's teasing out what people actually want (cars vs faster horses), thinking on your feet fast enough to express tradeoffs (lots of cargo space vs fuel efficiency vs seating capacity vs acceleration) and finding the right cost/benefit balance on requirements (you said the car needs to go 1000 miles per tank but your commute is 30 miles.. what if..).
> I've never had a spec handed to me in my career.
We call those places "feature factories".
I have been required to talk with many of them in my life, and I have never seen one add value to anything. (There are obvious reasons for that.) And yet the dominant schools in management and law insist they are the correct way to create software, so they remain the most common kind of employment position worldwide.
Careful with that though. The guy whose entire job is to "take requirements from the customers and bring them to the engineers" really does get awful tetchy if the engineers start presuming to fill his role. Ask me how I know.
QA long ago merged with programming into "unified engineering". So did SRE ("devops"), and now the trend is to merge with CSE and product management too ("product mindset", forward-deployed engineers). So yeah, pretty much, that's the trend. What would you trust more - an engineer doing project management too - or a project manager doing the engineering job?
The PMs and QAs I know would disagree with that assessment.
> What would you trust more - an engineer doing project management too - or a project manager doing the engineering job?
If one of the three, {PM, QA, coder}, was replaced by AI, as a customer I'd prefer to pick the team missing the coder. But for teams replacing two roles with AI, I'd rather keep the coder.
But a deeper problem now is, as a customer, perhaps I can skip the team entirely and do it all myself? That way, no game of telephone from me to the PM to the coder and QA and back to me saying "no" and having another expensive sprint.
If I'm managing a company of about 10 people to do something in the physical world, I'd probably skip the PM and QA, hire the engineer, and have the engineer task the LLM with QA given a clear set of requirements, then manage the projects given a clear set of deadlines. A good SE can do a "good enough" job at QA and PM in a small company, such that you won't notice the PM and QA are missing. But the PM and QA can always be added, or QA can be augmented with a specialist, assuming you're LLM-driven.
Of course if none of your software projects are business-critical to the degree that downtime costs money pretty directly then you can skip it all and just manage it yourself.
The other thing you should probably understand is that the feedback cycle for an LLM is so fast that you don't need to think of it in terms of sprints or "development cycles" since in many cases if you're iterating on something your work to acceptance test what you're getting is actually the long pole, especially if you're multitasking.
> If one of the three, {PM, QA, coder}, was replaced by AI, as a customer I'd prefer to pick the team missing the coder.
I am curious: why? In all my years of career, I've seen engineers take on extra responsibilities and do anywhere from a decent to a fantastic job at them, while people who start out much more specialized (like QA / sysadmins / managers) I have historically observed struggling more -- obviously there are many talented exceptions; they just never were the majority, in my anecdotal experience.
In many situations I'd bet on the engineer becoming a T-shaped employee (wide area of surface-to-decent level of skills + a few where deep expertise exists).
> The PMs and QAs I know would disagree with that assessment.
It just depends on the org structure and what the org calls different skills. In lots of places now PM (as in project, not product) is in no way a leadership role.
QA is still alive and well in many companies, including manual QA. I'm sure there's a wide range these days based on industry and scale, but you simply don't ship certain products without humans manually testing them against specs, especially if it's a highly regulated industry.
I also wouldn't be so sure that programming is the hardest of the three roles for someone to learn. Each role requires a different skill set, and plenty of people will naturally be better at or more drawn to only one of those.
From my experience with modern software and services, the actual practice of QA has plainly atrophied.
In my first gig (~30 years ago), QA could hold up a release even if our CTO and President were breathing down their necks, and every SDE bug-hunted hard throughout the programs.
Now QA (if they even exist) are forced to punt thousands of issues and live with inertial debt. Devs are hostile to QA and reject responsibility constantly.
Back to the OP, these things aren't calculable, but they'll kill businesses every time.
That's not the role of QA, though. QA isn't a gatekeeper; they give the CTO and President information on the bugs and the testing, but whether to ship or not is a business decision.
I’m not a native English speaker, but isn’t gatekeeping exactly that? Blocking suspicious entities unless they’re allowed through by someone higher in the hierarchy?
Maybe it's different where you live, but QA pretty much disappeared a few years ago, and project managers never had anything to do with the actual software.
True. And yet, as an organization, when you buy OP's training, you don't buy the material. You buy the feeling that you are making your organization more productive. You buy the signal to your boss that you are innovative and working to make your organization more productive. And you buy the time and headspace of your engineers, who are then thinking, if only for two hours, about making the organization more productive. The latter can be well worth the cost, and the former surely too.
They're buying a defensible (or laudable) justification when the training company's fee appears as a line item in the company budget.
This doesn't mean the training has to be good, useful, or original in the slightest, but the provider does need credentials that aren't just "some dev with a hot take", credentials a fellow executive would recognize.
> A messy codebase is still cheaper to send ten agents through than to staff a team around
People who say that haven't used today's agents enough or haven't looked closely at what they produce. The code they write isn't messy at all. It's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. The entire construction is just wrong, hidden behind a nice exterior. And when you need to add a couple more floors, the agents can't "get through it" and neither can people. The codebase is bricked.
Today's agents are simply not capable enough - without very close and labour-intensive human supervision - to produce code that can last through evolution over any substantial period of time.
They can work really well if you put sufficient upfront engineering into your architecture and its guardrails, such that neither agents nor humans can easily produce incorrect code in the codebase. If you just let them rip without that, they require very heavy babysitting. With that, they're a serious force multiplier.
They just make a lot of mistakes that compound and they don't identify. They currently need to be very closely supervised if you want the codebase to continue to evolve for any significant amount of time. They do work well when you detect their mistakes and tell them to revert.
Debugging would suffer as well, I assume. There's this old adage that if you write the cleverest code you can, you won't be clever enough to debug it.
There's nothing really stopping agents from writing the cleverest code they can. So my question is, when production goes down, who's debugging it? You don't have 10 days.
The problem is, the MBAs running the ship are convinced AI will solve all that with more datacenters. The fact that they talk about gigawatts of compute tells you how delusional they are. Further, the collateral damage this delusion will cause, as these models sigmoid their way into agents and harnesses and expert models and fine-tuned derivatives and cascading manifold intelligent-word-salad exercises, shouldn't be underestimated.
First, it's not "can occur" but does occur 100% of the time. Second, sure, it does mean something is missing, but how do you test for "this codebase can withstand at least two years of evolution"?
You have to fight to get agents to write tests, in my experience. It can be done, but by default they don't. I've yet to figure out how to get any agent to use TDD, that is, write a test and then verify it fails. Once in a while I can get it to write one test that way, but it then writes far more code to make it pass than the test justifies, and so it is still missing coverage of important edge cases.
I have TDD flow working as a part of my tasks structuring and then task completion.
There are separate tasks for making the tests and for implementing. The agent which implements is told to pick up only the first available task, which will be “write tests task”, it reliably does so. I just needed to add how it should mark tests as skipped because it’s been conflicting with quality gates.
You can spend a lot of time perfecting the test suite to meet your specific requirements and needs, but I think that would take quite a while, and at that point, why not just write the code yourself? I think the most viable approach with today's AI is still to let it code and steer it, as it goes along, whenever it makes a decision you don't like.
A lot of that can be overcome by including the need to be able to put more floors on top as part of the spec. Whether it be humans or agents, people rarely specify that one explicitly but treat it as an assumed bit of knowledge.
It goes the other way quite often with people. How often do you see K8s for small projects?
> A lot of that can be overcome by including the need to be able to put more floors on top as part of the spec
I wish it could, but in practice, today's agents just can't do that. About once a week I reach some architectural bifurcation where one path is stable and the other leads to an inevitable total-loss catastrophe from which the codebase will not recover. The agent's success rate (I mostly use Codex with gpt5.4) is about 50-50. No matter what you explain to them, they just make catastrophic mistakes far too often.
It isn't. Anthropic tried building a fairly simple piece of software (a C compiler) with a full spec, thousands of human-written tests, and a reference implementation - all of which were made available to the agent and the model trained on. It's hard to imagine a better tested, better-specified project, and we're talking about 20KLOC. Their agents worked for two weeks and produced a 100KLOC codebase that was unsalvageable - any fix to one thing broke another [1]. Again, their attempt was to write software that's smaller, better tested, and better specified than virtually any piece of real software and the agents still failed.
Today's agents are simply not capable enough to write evolvable software without close supervision to save them from the catastrophic mistakes they make on their own with alarming frequency.
Specifically, if you look at agent-generated code, it is typically highly defensive, even against bugs in its own code. It establishes an invariant and then writes a contingency in case the invariant doesn't hold. I once asked it to maintain some data structure so that it could avoid a costly loop. It did, but in the same round it added a contingency (that uses the expensive loop) in the code that consumes the data structure in case it maintained it incorrectly.
This makes it very hard for both humans and the agent to find later bugs and know what the invariants are. How do you test for that? You may think you can spec against that, but you can't, because these are code-level invariants, not behavioural invariants. The best you can do is ask the agent to document every code-level invariant it establishes and rely on it. That can work for a while, but after some time there's just too much, and the agent starts ignoring the instructions.
I think that people who believe that agents produce fine-but-messy code without close supervision either don't carefully review the code or abandon the project before it collapses. There's no way people who use agents a lot and supervise them closely believe they can just work on their own.
Lol I largely agree with my beloved dissenters, just not on the same magnitude. I understand complete specs are impossible and equivalent to source code via declaration. My disagreement is with this particular part:
"t's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. "
If your test/design of a BUILDING doesn't include at least simulations/approximations of such easy-to-catch structural flaws, it's just bad engineering. Which rhymes a lot with the people that hate AI: by and large, they just don't use it well.
"Incomplete specs" is the way of the world. Even highly engineered projects like buildings have "incomplete specs" because the world is unpredictable and you simply cannot anticipate everything that might come up.
And sometimes it can't even handle it then. I was recently porting ruby web code to python. Agents were simultaneously surprisingly good (converting ActiveRecord to sqlalchemy ORM) and shockingly, incapably bad.
For example, ruby uses blocks a lot. Ruby blocks are curious little thingies because they are arguably just syntax sugar for a HOF, but man, it's great syntax sugar. Python then has "yield", which is simultaneously the same keyword ruby uses for blocks but works fundamentally differently (instead of just a HOF, it's for generating an iterator/generator). There are decorators that can use yield's ability to "pause" execution in the function and send control flow back out of the function for a moment (@contextmanager), which feels _even more_ like ruby blocks, but it's a rather limited trick: it requires the decorator to adapt the generator to a context manager, and there's just no good way to generalize that.
Somehow this is the perfect storm to make LLMs completely incapable of converting ruby code that uses blocks for more than the basic iteration used in the stdlib. It will try to port to python code that is either nonsensical, or uses yield incorrectly and doesn't actually work (and in a way that type checkers can even spot). And furthermore, even if you can technically whack it with a hammer until it works with yield, it's often not at all the way to do it. Ruby devs use blocks not-uncommonly while python devs are not really going to be using yield often at all, perhaps outside of @contextmanager. So the right move is usually to just restructure control flow to not need to use blocks/HOFs (or double down and explicitly pass in a function). (Rubyists will cringe at this, and rightly so... Ruby is often extraordinarily expressive).
The fact that such a simple language feature trips them up so completely is pretty odd to me. I guess maybe their training data doesn't include a lot of ruby-to-python conversions. Maybe that's indicative of something, but I digress.
> A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today.
I’ve been on 2 failed projects that have been entirely AI generated, and it’s not that agents slow down and you can just send more agents to work on projects for longer; it’s that they become completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
Same here. I have now deleted 43k and counting lines of my codebase. There is no point in putting any AI code into production anymore, as it almost always uses either no abstractions or the wrong ones.
When you try to throw more agents at the problem, or add more verification layers, you just kill your agility, even if they would still be able to work.
>I’ve been on 2 failed projects that have been entirely AI generated, and it’s not that agents slow down and you can just send more agents to work on projects for longer; it’s that they become completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
This rhymes a lot with the Mythical Man Month. There's some corollary Mythical Machine Month thing going on with agent developed code at the moment.
The more I work with AIs (I build AI-harnessing tools), the more I see similarities with the common attention failures that humans make: I forgot this one thing and it fucks everything up; or, you just told me, but I have too much in my mind as context and I forget that piece. Or even, in the case of Claude last night: it kept attesting to me, while I was ordering it around, that it cannot SSH into another server, but around the fifth time I came back with a traceback, I found it SSHing into said server and just fixing things!
All of these things humans do, and I don't think we can attribute it directly to language itself; it's attention and context, and we both have the same issues.
Right, but when humans are writing the code, they have learned to focus on putting downward pressure on the complexity of the system to help mitigate this effect. I don't get the sense that agents have gotten there yet.
> Potentially, yes, but as with other software, you need to know AND have (automated) verifications on what it does, exactly.
Yes, but even here one needs some oversight.
My experiments with Codex (on Extra High, even) were that a non-zero percentage of the "tests" involved opening the source code (not running it, opening it) and regexing for a bunch of substrings.
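A hypothetical reconstruction of that kind of "test" (the module contents are invented for illustration) makes the failure mode obvious: grepping the source for promising substrings can pass while actually executing the code exposes the bug.

```python
import re

# Invented module under test, with a deliberate bug.
SOURCE = '''
def apply_discount(price):
    return price * 1.1   # bug: this is a surcharge, not a 10% discount
'''

# The regex "test": never runs the code, only checks that the source
# contains strings that look like the required behaviour.
def looks_tested():
    return bool(re.search(r"def apply_discount", SOURCE))

# A behavioural check: actually execute the function and inspect the result.
def behaves_correctly():
    namespace = {}
    exec(SOURCE, namespace)
    return namespace["apply_discount"](100.0) == 90.0
```

Here `looks_tested()` happily returns True while `behaves_correctly()` returns False, which is precisely why a suite of such regex tests provides no real verification.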
I'm wondering how much value there is in a rewrite once you factor in that no one understands the new implementation as well as the old one.
Not only is it difficult to verify, but also the knowledge your team had of your messy codebase is now mostly gone. I would argue there is value in knowing your codebase and that you can't have the same level of understanding with AI generated code vs yours.
The point of a rewrite is to safely delete most of that arcane knowledge required to operate the old system, by reducing the operational complexity of it.
I was involved in a big rewrite years ago. The boss finally put the old product on his desk with a sign "[boss's name]'s product owner" - that is, when people asked how something should work, the most common answer was "exactly like the old version". 10 years later the rewrite is a success, but it cost over a billion dollars. I have long suspected that billion dollars could have been better spent by just fixing technical debt.
> “Now we know what we were trying to build - let’s do it properly this time!”
I wonder if AI will avoid the inevitable pitfalls their human predecessors make in thinking "if I could just rewrite from scratch I'd make a much better version" (only to make a new set of poorly understood trade offs until the real world highlights them aggressively)
That's correct, the more I work with AI the more it's obvious that all the good practice for humans is also beneficial for AI.
More modular code, strong typing, good documentation... Humans are bad at keeping too much in the short-term memory, and AI is even worse with their limited context window.
> Software development is one of the most capital-intensive activities a modern company undertakes
The article is definitely written from a "high tech" industry lens. A mid-sized utility might spend $80-$150 million USD on IT capital projects in a year, but $2b on power pole maintenance. Utilities are a strong example, but any large enterprise manufacturing company is spending more on factory upgrades than on programming.
> [...] built a functional replica of approximately 95% of Slack’s core product in fourteen days using LLM agents.
IT and Finance leadership at asset-heavy companies are currently trying to wrap their heads around the current economics of their 100+ SaaS contracts, and whether they still make sense with LLM-powered developers. Can they hire developers in house to build the fraction of the tool they use from many of these companies, and save on total cost and Opex?
I work with these companies a lot, and won't weigh in on the right decision. Bottom line: "it depends" on many factors, some of which are not immediately obvious. The article still holds weight regardless of industry, but there is some nuance (talent availability, internal change cost, etc.) that also has to be considered.
Yeah, that line came across as a little out of touch. I work for US DOTs, and a yearly allotment from a STIP of a small DOT is still measured in billions. Software spend is negligible. In fact, I would say software was always costly in terms of labor, but hasn’t been capital intensive until recently.
But I would like to agree with what you said with respect to SaaS spending coming under scrutiny. Our technical experts are becoming aware that we spend 5 or 6-figure sums on software with barely any users that we can clone with a coding agent in an afternoon. Eventually management will find out too and we’re going to cut a lot of dead weight.
A modern pharmaceutical manufacturing plant costs two billion dollars just to build, and that doesn't include developing a drug to actually manufacture there, or a distribution network to sell what you make inside it.
I thought it was a good article, till I saw the Slack example.
The copy doesn’t even remotely grasp the scale of what the actual Slack software does in terms of scale, reliability, observability, monitorability, maintainability, and pretty surely functionality too.
The author only writes about the non-dev work as the difference, which makes it seem like he doesn’t know what he’s talking about at all, or what running an application at that scale actually means.
This "clone" doesn’t get you any closer to an actualy Slack copy than a white piece of paper
I had the same experience (though I agree with other comments that the numbers are a little optimistic in terms of variance; I think there's a huge amount of variance in product work, you can't know what's a good investment until it's too late, many companies fail because of this, and there's huge survivorship bias in the ones that get lucky and don't initially fail). Slack spent tons of money in terms of product and engineering hours finding out what works and what doesn't. It's easy to copy/paste the thing after all that effort. Copy/paste doesn't get you to the next Slack though--it can get you to Microsoft's Slack-killing Teams strategy, but we obviously don't want more of that. And, obviously I agree with you about all the infra/maintenance costs, costs in stewarding API usage and extensions, etc. LLMs won't do any of that for you.
Students in the 2010s were building twitter clones as part of third-year college courses.
And somehow twitter survived and thrived and didn't really get viable competitors until forces external to the code and product itself motivated other investment. And even then it still rolls on, challenged these days, but not by the ease of which a "clone" can be made.
Yeah, I can build a Slack "clone" in a couple of weeks with my own two hands, no AI required. But it's not going to actually be competitive with Slack.
Just to pick an incredibly, unbelievably basic enterprise feature, my two-week Slack clone is not going to properly support legal holds. This requires having a hard override for all deletion and expiration options anywhere in the product, that must work reliably, in order to avoid accidental destruction of evidence during litigation, which comes with potentially catastrophic penalties. If you don't get this right, you don't sell to large corporations.
And there are a hundred other features like this. Engineering wants an easy-to-use API for Slack bots. Users want reaction GIFs. You need mobile apps. You need Single Sign-On. And so on. These are all table stakes.
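To make the legal-hold point concrete, here is a minimal sketch (types and names invented) of what "hard override" means: every deletion path, whether user-initiated or retention-driven, has to consult the hold first, with no setting anywhere in the product able to bypass it.

```python
from dataclasses import dataclass, field

@dataclass
class Workspace:
    retention_days: int = 90
    legal_holds: set = field(default_factory=set)   # user ids under hold

@dataclass
class Message:
    author_id: int
    age_days: int
    deleted: bool = False

def try_delete(ws: Workspace, msg: Message, reason: str) -> bool:
    """Single choke point for all deletion/expiration paths."""
    # Hard override: legal holds win over user deletes AND retention expiry.
    if msg.author_id in ws.legal_holds:
        return False
    if reason == "retention" and msg.age_days < ws.retention_days:
        return False
    msg.deleted = True
    return True
```

The two-week clone fails not because this function is hard to write, but because *every* feature added later (message edits, exports, workspace deletion, retention policy changes) must be audited against it, reliably, forever.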
It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to do it, they actually invested in a huge number of features.
Believe me, I wish that "simple, clean" reimplementations were actually directly competitive with major products. That version of our industry would be more fun. But anyone who thinks that an LLM can quickly reimplement Slack is an utter fool who has never seriously tried to sell software to actual customers.
> It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to do it, they actually invested in a huge number of features.
The other issue is that yes, perhaps most users only use 20% of the features, but each user uses a different 20% of the features in products like Word. Trust me, it's super hard to get it right even at the end-user level, let alone the enterprise level like you say.
There are at most 5% of the features of word that are common to everyone. Things like spell check everyone uses. Actually I suspect it is more like 0.1% of the features are common, and most people use about 0.3% of the features and power users get up to 5% of the features - but I don't have data, just a guess.
Yeah but 98% of Word features were buried in like 2004. They were added when it was a selling point to use unicorn and gnome icons as your table border in under 100mb of RAM. So we’re talking about 20% of the limited set of features that remain not just for backwards compatibility.
And there's some company out there that has very important Word documents that will fail to open if you take away the unicorn and gnome icons table border feature.
When I look at the big non-tech industry companies that have a chill life and print money, it's usually the companies that are just the very best at what they do and have a quasi-monopoly, or so much competitive advantage that everybody is just using them.
I'm not commenting too much on the details of the article, but the premise does resonate with me. I would argue all the engineering teams I've been on do not spend enough time thinking about how much a piece of work will cost to execute, and whether it will generate a return.
I suspect this is most apparent in things like meeting culture. Something happens and all of a sudden there is another recurring meeting on the calendar, with 15 attendees, costing x dollars in wages, that produces no value for the customers because the lesson was already learned.
Or when reacting to an incident of some sort, it's so easy to have a long list of action items that may theoretically improve the situation, but in reality are incredibly expensive for the value they produce (or the risks they reduce). It's too easy to say, we'll totally redesign the system to avoid said problem. And what worries me, is often those very expansive actions, then cause you to overlook realistic but small investments that move the needle more than you would think.
And as a hot topic, I also think the costs are an input into taking on tech debt. I know we all hate tech debt with a passion, but honestly, I think of it as a tool that can be wielded responsibly or irresponsibly. But if we don't know what our attention costs, we're going to have difficulty making responsible choices about when and where to take on this debt. And if we're not conscious about the debt, when it comes due it stings so much harder to pay down.
Meetings aren't even the worst resource wasters. Wrong initiatives, features, apps/platforms/services are. They capture future resources in form of maintenance and complexity with them.
Agreed, and this is where I think some more nuanced and conscious use of tech debt can be used when applicable.
It might be OK to place some bets on an initiative or feature, but if we all understand we're placing a bet, this is an area to load up on debt and really minimize the investment. This also requires an org that is mature about cutting the feature if the bet doesn't materialize, and if the market signal is generated will reinvest in paying down the debt. And also has the mega-danger territory of a weak market signal, where it's not clear if there is market signal or not, so the company doubles down into the weak signal.
Also these bets shouldn't be done in isolation in my view, well executed product and market discovery should also provide lots of relevant context on the ROI.
When I see someone just throwing a lot of numbers and graphs at me, I see that they are in it to win an argument, not to propose an idea.
Of late, I've come across a lot of ideas from Rory Sutherland, and my conclusion from listening to them is that there are some people who are obsessed with numbers because, to them, it's a way to find certainty and win arguments. He calls them "Finance People" (him being a Marketing one). Here's an example:
"Finance people don’t really want to make the company money over time. They just thrive on certainty and predictability. They try to make the world resemble their fantasy of perfect certainty, perfect quantification, perfect measurement.
Here’s the problem. A cost is really quantifiable and really visible. And if you cut a cost, it delivers predictable gains almost instantaneously."
> Choosing to spend three weeks on a feature that serves 2% of users is a €60,000 decision.
I'd really want to hire the oracle of a PM/analyst who can give me that 2% accurately even 75% of the time, and promise that nothing non-linear can come from the exercise.
As with any attempt to become more precise (see software estimation, eg. Mythical Man Month), we've long argued that we are doing it for the side effects (like breaking problems down into smaller, incremental steps).
So when you know that you are spending €60k to directly benefit small number of your users, and understand that this potentially increases your maintenance burden with up to 10 customer issues a quarter requiring 1 bug fix a month, you will want to make sure you are extracting at least equal value in specified gains, and a lot more in unspecified gains (eg. the fact that this serves your 2% of customers might mean that you'll open up to a market where this was a critical need and suddenly you grow by 25% with 22% [27/125] of your users making use of it).
You can plan for some of this, but ultimately when measuring, a lot of it will be throwing things at the wall to see what sticks according to some half-defined version of "success".
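For what it's worth, the hypothetical numbers in that upside scenario are internally consistent (all figures are taken from the comment's own example, assuming a base of 100 users):

```python
users = 100
feature_users = users * 2 // 100      # the 2% of users the feature serves -> 2
cost_eur = 60_000                     # three engineer-weeks, per the article

# Upside case: the feature unlocks a market, the user base grows 25%,
# and every new user adopts the feature.
new_users = users * 25 // 100         # -> 25
total_users = users + new_users       # -> 125
adopters = feature_users + new_users  # -> 27
adoption_pct = round(100 * adopters / total_users)  # the "22% [27/125]" above
```

The point stands either way: the €60,000 is knowable up front, while the branch between "2% forever" and "22% of a bigger base" is exactly the non-linear part no analyst can promise.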
But really you conquer a market by having a deep understanding of a particular problem space, a grand vision of how to solve it, and then actually executing on both. Usually, it needs to be a problem you feel yourself to address it best!
None of his math really checks out. Building a piece of software is, or at least was, orders of magnitude more expensive than maintaining it. But how much money it can make is potentially unbounded (until it gets replaced).
So investing e.g. 10 million this year to build a product that produces maybe 2 million ARR will have amortized after 5 years if you can reduce engineering spend to zero. You can also use the same crew to build another product instead and repeat that process over and over again. That's why an engineering team is an asset.
It's also a gamble, if you invest 10 million this year and the product doesn't produce any revenue you lost the bet. You can decide to either bet again or lay everyone off.
It is incredibly hard or maybe even impossible to predict if a product or feature will be successful in driving revenue. So all his math is kinda pointless.
> Building a piece of software is or at least was orders of magnitudes more expensive than maintaining it
This feels ludicrously backwards to me, and also contrary to what I've always seen as established wisdom - that most programming is maintenance. (Type `most programming is maintenance` into Google to find page after page of people advancing this thesis.) I suspect we have different ideas of what constitutes "maintenance".
A strict definition would be "the software is shipping but customers have encountered a bug bad enough that we will fix it". Most work is not of this type.
Most work is "the software is shipping but customers really want some new feature". Let us be clear though, even though it often is counted as maintenance, this is adding more features. If you had decided up front to not ship until all these features were in place it wouldn't change the work at all in most cases (once in a while it would because the new feature doesn't fit cleanly into the original architecture in a way that if you had known in advance you would have used a different architecture)
As with most things, isn't the truth somewhere in the middle? True cost/value is very hard to calculate, but we could all benefit by trying a bit harder to get closer to it.
It's all too common to frame the tension as binary: bean counters vs pampered artistes. I've seen it many times and it doesn't lead anywhere useful.
Here I think the truth is pretty far to one side. Most engineering teams work at a level of abstraction where revenue attribution is too vague and approximate to produce meaningful numbers. The company shipped 10 major features last quarter and ARR went up $1m across 4 new contracts using all of them; what is the dollar value of Feature #7? Well, each team is going to internally attribute the entire new revenue to themselves, and I don’t know what any other answer could possibly look like.
Even if you could do attribution correctly (I think you can do this partially if you are really diligent about A/B testing), that is still only one input to the equation. The other fact worth considering is the scale factor - if a team develops a widget which has some ARR value today, that same widget has a future ARR value that scales with more product adoption - no additional capital required to capture more marginal value. How do you quantify this? Because it is hard and recursive (knowing how valuable a feature will be in the future means knowing how many users you have in the future which depends on how valuable your features are as well as 100 other factors), we just factor this out and don't attempt to quantify things in dollars and euros.
You’re illustrating one of the points of TFA - a team that is equipped with the right tools to measure feature usage (or reliably correlate it to overall userbase growth, or retention) and hold that against sane guardrail metrics (product and technical) is going to outperform the team that relies on a wizardly individual PM or analyst over the long term making promises over the wall to engineering.
But surely you have to have at least an hypothesis of how software features you develop will increase revenue or decrease costs if you want to have a sustainable company?
I think the only thing that matters is whether the people on the team care deeply about the product; whether they care more about the product than their own careers (in the short term). Without that, any metric or way of thinking can and will be gamed.
Unfortunately, even with all the management techniques in the world, there are just some projects that are impossible to care about. There’s simply a significantly lower cap on productivity on these projects.
The over-simplification rubs me the wrong way, for example:
Consider a team of eight engineers whose mission is to build and maintain an internal developer platform serving one hundred other engineers. This is a common organizational structure, and it is one where the financial logic is rarely examined carefully.
The team costs €87,000 per month. To justify that cost, the platform they build needs to generate at least €87,000 per month in value for the engineers who use it. The most direct way to measure that value is through time saved, since the platform’s purpose is to make other engineers more productive.
At a cost of €130,000 per year, one engineer costs approximately €10,800 per month, or around €65 per working hour. For the platform team to break even, their platform needs to save the hundred engineers they serve a combined total of 1,340 hours per month. That is 13.4 hours per engineer per month, or roughly three hours per week per person.
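Taken on its own terms, the quoted arithmetic is at least internally consistent; a quick check, using only the figures in the text:

```python
team_cost = 87_000          # EUR per month for the 8-engineer platform team
engineer_year = 130_000     # EUR fully loaded cost per served engineer
served = 100                # engineers using the platform

per_month = engineer_year / 12            # ~10,833, "around €10,800"
per_hour = 65                             # the text's rounded hourly rate
break_even_hours = team_cost / per_hour   # ~1,338, rounded up to "1,340 hours"
per_engineer = break_even_hours / served  # ~13.4 hours per month
per_week = per_engineer / 4.33            # roughly three hours per week
```
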
There's a fungibility assumption which is pervasive here. In most cases, a platform team is there not "to save time".
It's there to deal with cross concerns that would be not only time consuming but could be business threatening, and in some cases, you keep there more expensive engineers that ensure that certain critical things are done right.
Making it solely about the extraction of dollars is a great recipe to make something mediocre. See Hollywood or Microslop.
Its like min-maxing a Diablo build where you want the quality of the product to be _just_ above the "acceptable" threshold but no higher because that's wasting money. Then, you're free to use all remaining points to spec into revenue.
Exactly. In addition, sometimes good software "only" saves you 1% of your time, but that 1% was a terrible burden that induced mental fatigue, made you take bad decisions, etc. It can even make a great engineer stay who would have left under the previous version.
While reading the article I was thinking the same thing. I can think of problems I've solved that directly affected 0% of our customers, but overloaded our customer support team.
This is some aggressive consultant fluff. Few companies have such distinctive "profit" measures. If "the financial logic is rarely examined carefully" then maybe there's a reason, since analysis like this is mostly fantastical and brittle. This is the sort of argument that is both rational and implausible. A manager might use this logic to rationalize firing an engineering team (which is mostly why guys like this get hired), but they won't use it to manage an engineering team.
I feel like there is a lot of nuance around this topic that is getting lost in the noise.
The direct and indirect financial impact of technical decisions are indeed hard to measure. But some technical decisions definitely have greater financial impact than others. Even if it's hard to precisely quantify the financial costs/benefits of every decision. It is possible to order them relatively. X is likely to make more money than Y. So we do X first and Y later.
There is a significant amount of chance involved in whether a product/feature will even make money at all. So even good plans with measurably positive expected value could end up losing money.
Just because it's impossible to be 100% certain of the outcome of any decision doesn't mean we should throw the baby out with the bathwater.
This article is not bad overall, but it does over-index on the cost of making software development costs and tradeoffs legible. Of course leadership does need to make decisions, and so the quest for better data and better cost modeling will continue, and rightly so, Goodhart's law notwithstanding.
I do like this bit though:
> A large codebase also carries maintenance costs that grow over time as the system becomes more complex, more interconnected, and more difficult to change safely. Every engineer added to maintain it increases coordination costs, introduces new dependencies, and adds to the organizational weight that slows decision-making. The asset and the liability exist simultaneously, and for most of the past twenty years, the financial environment masked the liability side of that equation.
And the insight that LLMs are exposing this reality is absolutely true. The funny thing is they are exposing it by accelerating both good and bad engineering practices. Teams with good engineering judgement will move faster than ever with fewer people, and teams with bad engineering judgment will bury themselves in technical debt so fast the wheels will come off.
For me, running an engineering org is primarily about talent acquisition and empowering those ICs with judgment to move quickly. How well systems and teams scale depends on the domain, product, and how it allows you to decouple things. With the right talent and empowerment there are often creative ways to make product and system tradeoffs and iterate quickly to change the shape of ROI. Any mapping to financial metrics is a hugely lossy operation that can't account for such changes. It might work in mature companies that are ossified and in the second half of their lifecycle, but in growing companies I think it's fundamentally misguided and would amount to empowering the wrong people.
I still don't understand what regular people (like the author) gain from selling how wonderful AI is. I get that the folks at Anthropic and OpenAI shove AI down our throats every day, but why the nobodies?
> There is no cohort of senior product leaders who developed their judgment in conditions where their teams were expected to demonstrate financial return, because those conditions did not exist during the years when that cohort was learning the craft.
There totally is such a cohort. There are plenty of bootstrapped companies or startups that took only an angel round and did not benefit from the low rate environment, in fact they suffered because of the very high price of SWE labor. But those engineering managers exist and are out there right now still building efficiently, quietly growing, passionately serving customers, and keeping a close eye on the bottom line and risks because that’s their livelihood.
Look! A guy built 95% of slack in 2 weeks! Very skeptical of that btw, but also an organization that justifies every single team by exactly how much $ value they’re generating sounds like hell. How would you ever innovate or try out new ideas? It’s important to quantify what impact your team is generating but there are some cases (e.g. UX) which are really hard to quantify in $ but are still very important for the product
I think the thing that hits home for me here is that when you go back and do the after-action report on where the time was spent last year and what it cost, it's terrifying. Of course hindsight is 20/20 and predicting how difficult something will be is hard, but when you say "we spent $x million on this version update that does y, and $a hundred thousand to implement this feature," you think to yourself: we would never have made that cost/benefit decision if we had known.
But in reality, the real cost of engineering teams grows as the sub-organizations and teams continue to make short-term decisions, optimizing for the next immediate win.
This is more common in larger organizations than in smaller ones.
The points brought up are all great. I'm in a lower management position and I've wondered for a decade why the budget, cost, and return on work (i.e. revenue) were never divulged or connected to the work at hand. So kudos for facing that problem bluntly in language that's easy to follow. The place I'm at currently is much more about automating away processes and making back-office operations easier, so there's likely a lot of direct cost savings that we could measure, but don't.
Here's the problem I see with how this particular article is moving, though: the context of these projects is often highly technical while connecting back to the human problem space. Developers sit on the technical end, but they also usually have a mental model of how it connects back to the non-technical. A product manager is another addition, there to compensate for the user connection. Between all of these folks, they can only hold so much in their heads about the problem space on a day-to-day basis. And that headspace for the problem is what is critical. Management wants to try a new idea for sales? They need to take it to the team with that problem space to translate it into working code. Even with the assistance of agents, someone needs to hold the important patterns in their head. And my company certainly isn't going to vibe code its way through anything regulatory; mistakes there might cost us a ton in fees and bad PR. Hell, I've seen product managers sweat over the possibility of getting a few 1-star reviews on the app store.
Anyway, you still need people with context to break things down and get them out the door, the agents can just assist with the speed of the In Progress stage. And clever teams can figure out how to automate their validation (but they could already do that).
Rockstar developers often seem to be the ones who can parachute in, gain context, make changes, and leave to find another problem space. They get bogged down when they've visited 10 or more problem spaces and then they start getting called back into service. Again the agents don't change any of that, the human involved has a finite capacity for context.
Teams who structure around maintaining context might be best suited for the new world of code.
I do tech DD, exit readiness, and post-merger integration in tech companies, and this is my daily bread. The biggest lever I have is connecting initiatives to ROI/bottom-line impact. It's incredible how blind product/software teams run. There is so much to do, but most of it won't make any money and just feels productive. Connecting activities and work directly to revenue is very important.
If your company runs well, not doing this won't hurt you much. Otherwise it will be your end. And that really hurts, because you lose the economic impact of the product, and the jobs.
The argument against platform teams needs to be balanced with the compounding nature of technical debt.
The argument to always go for the biggest return works OK for the first few years of high growth (though the timeline is probably greatly compressed the more you use AI), but it turns into a kind of quicksand later.
This is a very reductionist way to calculate the value of a software team or any team within an organization. That’s because many times the value delivered by a team is not necessarily monetary but strategic.
How could they not? When I penciled this out ~18 years ago, I included the amortized cost of all the interviews it took to hire a given engineer as well. It's not rocket surgery, as they say.
One interesting factor that I rarely see discussed is this: Let's say a DevOps person does some improvement to internal tooling and a task that devs had to oversee manually now is automated. Every dev spent about 2 hours per week doing this task and now they don't have to anymore. Now, have we saved 2 hours of salary per dev per week?
Not sure, because it totally depends on what they do instead. Are they now using those two extra hours every week to do meaningful work? Or are they just taking things a bit easier? Very hard to determine, and it just makes it harder to reason about the costs and wins in these cases.
They have saved _more_ than two hours per dev per week. There's a compound factor: now code can be more reliable (fewer outages or emergency bug fixes), etc. Also, having a sane working environment helps engineers not quit, which is very expensive if they have to be replaced.
The freed-up time question is answerable when the work has clear metrics. A model test suite dropping from 6 minutes to 66 seconds saves developer time on every single run. Ten developers running tests five times a day, the math is straightforward.
The problem is that most engineering work lacks that kind of before/after measurement. Not because it is unmeasurable, but because nobody set up the baseline. Profile before you optimize and the return on investment calculates itself.
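To make that before/after arithmetic explicit, here is a minimal sketch. The suite times, team size, and run frequency are the hypothetical figures from the comment above, not measured data:

```python
# Back-of-the-envelope: developer time saved by a faster test suite.
before_s = 6 * 60        # old suite: 6 minutes
after_s = 66             # new suite: 66 seconds
devs = 10
runs_per_day = 5
workdays = 5

saved_per_run = before_s - after_s                 # 294 s per run
runs_per_week = devs * runs_per_day * workdays     # 250 runs per week
hours_saved = saved_per_run * runs_per_week / 3600
print(f"{hours_saved:.1f} developer-hours saved per week")  # 20.4
```

Whether those 20-odd hours turn into productive work is exactly the question raised upthread, but at least the gross figure is easy to compute.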
If a test suite runs for either 6 minutes or 66 seconds, I am not staring at it while it runs. I am doing something else. So that is not holding up my time.
Yes. You work 2 hours less, but what do you produce in those two extra hours? Can you say that your company now spends X dollars less or earns X dollars more? I don't think it can be that clear.
And what is your theory? That it’s better to not save those 2 hours since they will just go to waste anyway? Or that there is diminishing returns to saving work as people will tend to just spend longer on other things they were already doing? How can you be sure those 2 hours will not actually be used by most to do very productive things that in the end look like +4 hours in return??
I don't understand the urgency around quantifying every aspect of the software process. Surely we are in agreement that money in must at least equal money out if the company is to be viable? That's a simple QuickBooks report, is it not?
Why don't we instead focus our energies on the customer and then work our way backward into the technology. There are a lot of ways to solve problems these days. But first you want to make sure you are solving the right problem. Whether or not your solution represents a "liability" or an "asset" is irrelevant if the customer doesn't even care about it.
Why don't we instead focus our energies on the user. For some very important software applications the customer is not the user. Let the sales department focus on the customer.
The estimated cost number is for very large companies with massive overhead. Dump the management overhead, the HR machine, and other things smaller companies don't have, and this number comes down massively.
I’m a little surprised that fundamental concepts like burn rates are not expected to be understood in software. In other professional services, this is often top of mind, at least for managers.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. […] The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
Maybe there’s some new paradigm that makes this true. But it doesn’t seem obviously true to me.
Humans make the best code long term when everything orbits a vision of the underlying problem space.
LLMs seem to only consider the deeper problem space when I explicitly flag it for them, otherwise they write “good enough for this situation” type code. And that stack of patches type code is exactly how the code becomes messy and complicated in the first place.
There's a lot here, mixed with some marketing and some dubious LLM claims. That being said, I think there could be real benefit in pushing detail on how features affect finances down to individual teams. Right now I have two features on my desk that both seem reasonable; if I knew which one would generate more income (i.e. increase customer retention, lead to more sales, etc.), that would make this choice a lot easier.
The core reason most orgs are "flying blind" is that we still don't have a reliable metric for technical debt. Management only tracks shipped features and velocity because they are easy to measure. They completely ignore the hidden liability of a rushed, messy codebase until productivity eventually grinds to a halt. You can't measure the economics of a team if you ignore the balance sheet.
Time to ship, change failure, rework rates, mean time to resolve, code complexity, code churn, average age of dependencies - there's a ton of reliable metrics for technical debt, but they have to actually be looked at to do any good.
The problem is that technical debt is a more complex concept & thus requires more metrics to properly measure than a simple concept like velocity.
If you've ever been in a meeting with multiple L8s arguing over features, you should be able to estimate how much each hour of that meeting is costing the org.
The 3-5x return threshold is the part most eng leaders never internalize. I've seen teams spend entire quarters on internal tooling that saves maybe 20 minutes per developer per week — nowhere near break-even, let alone a healthy return. The uncomfortable truth is that most prioritization frameworks (RICE, WSJF, etc.) deliberately avoid dollar amounts because nobody wants to see the math on their pet project. Once you attach real costs to sprint decisions, half the roadmap becomes indefensible.
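As a rough sketch of why a 20-minutes-per-week tool rarely clears a 3-5x bar: the team size, fully loaded cost, and build effort below are entirely made-up assumptions for illustration, not figures from the article:

```python
# Rough ROI sketch for an internal tool, using made-up numbers.
loaded_cost_per_eng_year = 200_000             # hypothetical fully loaded cost, $
weeks_per_year = 48
hourly = loaded_cost_per_eng_year / (weeks_per_year * 40)  # ~$104/h

# Cost side: 3 engineers spend one quarter (13 weeks) building the tool.
build_cost = 3 * 13 * 40 * hourly

# Benefit side: 20 minutes saved per developer per week, 50-person org.
devs = 50
hours_saved_per_year = devs * (20 / 60) * weeks_per_year
annual_benefit = hours_saved_per_year * hourly

print(f"build cost:     ${build_cost:,.0f}")       # $162,500
print(f"annual benefit: ${annual_benefit:,.0f}")   # $83,333
# Reaching even a 3x return takes years, nowhere near a quarter's payback.
print(f"years to 3x return: {3 * build_cost / annual_benefit:.2f}")
```

Under these assumptions the tool takes roughly two years just to break even, which is the kind of math that rarely gets written down before the quarter is spent.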
On the other hand, I’ve also seen single developers create a tool or dashboard off-the-books that had widespread adoption. Things that would never have breached the top 100 features list since they are entirely internal. The irony is then they are expected to maintain it indefinitely without official effort allocation.
You’re absolutely right, but just to a point. It should be easy to clearly quantify the desired financial outcome of a sprint, but not of its components. I don’t want to spend a single minute figuring out the financial outcome of a single ticket.
> This does not mean that Slack’s engineering investment was wasted, because Slack also built enterprise sales infrastructure, compliance capabilities, data security practices, and organizational resilience that a fourteen-day prototype does not include.
The LLM-agent team argument also misses the core point that the engineering investment (which actually encompasses business decisions, design, and much more than just programming) is what actually got Slack (or any other software product) to where it is now and where it's going in the future. Creating a snapshot of the current status is, while maybe not absolutely trivial, still just a tiny fraction of the progress made over the years.
With a long time in the industry and seeing how so many big software companies work, this really really chimed with me. Many/most teams and projects and busy work are not actually moving the bottom line, at massive opportunity cost! And there's so little awareness that most people in squads and their managers will think they are the exception.
Whereas Whatsapp with its 30 software engineers was the exception etc.
A chat with friends showed how there are parallels between how LLMs will play out in the short-term future (say the next 5 years) and the whole MapReduce mess. Back when Hadoop came along, you built operators, and these operators communicated through disk. It took years, even after Spark was around, for the Hadoop user base as a whole to realise that it is orders of magnitude more efficient to communicate through disk only when two operators cannot be colocated on the same machine, and that most operators in most pipelines can be fused together.
So for a while LLMs will be in the Hadoop phase, acting like junior devs and making more islands that communicate in bigger, bloated codebases, and then there might be a realisation around 2030 that the LLMs could actually have been used to clean up, streamline, and fuse software and approach the WhatsApp style of business impact.
I've been a software engineer for more than ten years and never cared about these kinds of topics. But lately, I've found them genuinely interesting. Could someone recommend books on the economics of software businesses? I can't take this author's content seriously.
I have been interested in this topic for a long time and to be honest, there is no better book on the topic than The Mythical Man Month. Yes from the 70s I think, but still the best I have read.
Wow that article made a hard right turn about halfway through.
"Most organizations improperly account for engineering teams and incorrectly consider both code and team growth to be assets when in fact they increase complexity..... but LLMs can fix all of this"
Wtf?
Measuring things that actually matter is a great way to improve clarity on a team; you can probably just stop reading this article at the halfway point.
EDIT:
Specifically this paragraph is insane
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
Then let's disregard cost of running and maintaining a system for having exact financial feedback.
We do proxy measurements because having exact data is hard; there is more to any feature than just code.
A feature is not only code; it is also customer training, marketing, and so on. A feature might be perfectly viable from a code perspective but then utterly fail in adoption for reasons beyond the Product Owner's control.
From the comments, the author is selling his consultancy/coaching, and the people with any real-world experience are not buying it either.
The "author" used someone's vibecoded Slack clone to justify his conclusions. I think he believes that the majority of Slack's value lies in the slick CSS animations.
I do agree with his thesis in the middle, about how the ZIRP decade and the cultures that were born from that period were outrageous and cannot survive the current era. It's a brave new world, and it's not because of AI. It's because there's just not enough money flowing anymore, and what little is left is sucked up by the big boys (AI).
> A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
What experience is this guy basing this on? My guess is absolutely none at all.
Maybe this will be the case in the future, but as of right now, if I cut 10 agents loose for 10 days on one of our repos at work and tell them to clean it up but keep the tests passing, we'd be drowning in support tickets.
Tests don’t cover all observable behavior. Every single production bug we’ve had made it through the test suite.
Also, this guy has only a vague idea of how platform engineering teams work in large organizations.
Platform teams are the engineering org’s immune system. They’re how we fight back against the tech debt accumulated by the relentless march of features of the week.
If anything the extra code people are cranking out with AI make them more necessary.
If you want to understand the economics, I recommend watching some of Don Reinertsen's videos on Lean 2.0. He goes deeply into a few concepts that are quite intuitive.
Cost of delay: calculating the cost of delaying by a few weeks in terms of lost revenue (you aren't shipping whatever it is you are building), total life value of the product (your feature won't be delivering value forever), extra cost in staffing. You can slap a number on it. It doesn't have to be a very accurate number. But it will give you a handle on being mindful that you are delaying the moment where revenue is made and taking on team cost at the cost of other stuff on your backlog.
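A minimal cost-of-delay sketch in the spirit of the point above; every number here is made up for illustration, and as the comment says, the number doesn't need to be very accurate to be useful:

```python
# Cost-of-delay sketch with made-up numbers: a feature expected to bring
# in $30k/month over a 3-year useful life, delayed by 6 weeks.
monthly_revenue = 30_000
delay_weeks = 6
weeks_per_month = 4.33

delay_months = delay_weeks / weeks_per_month

# Simple model: the delay shifts the whole revenue stream right, so you
# lose the revenue from the tail of the product's life.
lost_revenue = monthly_revenue * delay_months

# Plus the staffing cost you carry while the delay plays out.
team_monthly_cost = 80_000
carrying_cost = team_monthly_cost * delay_months

print(f"cost of a {delay_weeks}-week delay: ~${lost_revenue + carrying_cost:,.0f}")  # ~$152,425
```

Even a crude figure like this makes "we'll just push it a sprint or two" a visible tradeoff instead of a free choice.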
Option value: calculating the payoff for some feature you add to your software as having a non linear payoff. It costs you n when it doesn't work out and might deliver 10*n in value if it does. Lean 1.0 would have you stay focused and toss out the option for that potential 10x payoff. But if you do a bit of math, there probably is a lot of low hanging fruit that you might want to think about picking because it has a low cost and a potential high payoff. In the same way variability is a good thing because it gives you the option to do something with it later. A little bit of overengineering can buy you a lot of option value. Whereas having tunnel vision and only doing what was asked might opt you out of all that extra value.
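The option-value point reduces to a simple expected-value calculation; the probability and payoff below are made-up illustrations of the n-vs-10n shape described above:

```python
# Option-value sketch: a feature costing n is worth 0 with probability
# 0.8 and pays 10*n with probability 0.2 (made-up numbers).
n = 10_000
p_success = 0.2

expected_payoff = p_success * 10 * n - n   # expected value net of cost
print(expected_payoff)  # 10000.0
```

Despite an 80% failure rate, the expected net payoff is positive, which is why tossing out every speculative option "to stay focused" can leave real value on the table.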
A bad estimation is better than no estimation: even if you are off by 3x, at least you'll have a number and you can learn and adapt over time. Getting wildly varying estimates from different people means you have very different ideas about what is being estimated. Do your estimates in time. Because that allows you to slap a dollar value on that time and do some cost calculations. How many product owners do you know that actually do that or even know how to do that?
Don't run teams at 100% capacity. Work piles up in queues and causes delays when teams are pushed hard. The more work you pile on the worse it gets. Worse, teams start cutting corners and take on technical debt in order to clear the queue faster. Any manufacturing plant manager knows not to plan for more than 90% capacity. It doesn't work. You just end up with a lot of unfinished work blocking other work. Most software managers will happily go to 110%. This causes more issues than it solves. Whenever you hear some manager talking about crunch time, they've messed up their planning.
Stretching a team like that will just cause cycle times to increase. Also, see cost of delay: queues aren't actually free. If you have a lot of work in progress with interdependencies, any issue will cause your plans to derail and cause costly delays. It's actually very risky if you think about it like that. If you've ever been on a team that seemingly doesn't get anything done anymore, this might be what is going on.
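The don't-run-at-100% point has standard queueing-theory backing: in a simple M/M/1 model, average queue delay scales with utilization / (1 - utilization), which explodes as you approach full capacity. A tiny sketch (the model is an idealization, but the shape of the curve is the point):

```python
# M/M/1 intuition: relative queue delay vs. utilization.
for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    relative_delay = utilization / (1 - utilization)
    print(f"{utilization:.0%} busy -> queue delay ~{relative_delay:.0f}x service time")
```

Going from 90% to 99% utilization makes waiting roughly eleven times worse, which is why plant managers stop at 90% and why "110%" software teams stall.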
I like this back of the envelope math; it's hard to argue with.
I used to be a salaried software engineer in a big multinational. None of us had any notion of cost. We were doing stuff that we were paid to do. It probably cost millions. Most decision making did not have $ values on them. I've since been in a few startups. One where we got funded and subsequently ran out of money without ever bringing in meaningful revenue. And another one that I helped bootstrap where I'm getting paid (a little) out of revenue we make. There's a very direct connection between stuff I do and money coming in.
Measuring a platform team's productivity in pure "hours saved" is missing a huge point: reliability. If your platform prevents even one outage every month, how much business value and capital are saved? That analysis is utterly absent from this article.
It also seems to focus on "LLMs make code cheap" which is a half truth: LLMs make (so far) easy or messy code cheap. I'd bet that there too the analysis on reliability/stability is missing from the author's perspective.
I think the article makes some great points; however, this part is not even wrong:
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
LLMs are not conscious, which means that left to their own devices they will drift. I think the single most important issue when working with LLMs is that they write text without a layer that is aware of what's actually being written. That state can be present in humans as well, for example in sleepwalking.
Everyone who's tried to complete a somewhat larger vibe-coded project knows that you only get to a certain level of complexity before the model stops being able to reason about the code effectively. It starts to guess why something is not working and cannot get out of that state until guided by a human.
That is not a new state of affairs in the field; I believe all programmers have at some point in their career come across code written by developers who needed to get past a hard deadline, with the result being a codebase that cannot effectively be modified.
I think a certain subset of programming projects could possibly be vibe coded, in the sense that code can be merged without human understanding. But they have to be very straightforward CRUD apps. In almost everything else you will get stopped by slop.
I suspect that the future of our profession will shift from writing code to reading code and applying continuous judgement on architecture, working together with LLMs. It's also worth keeping in mind that you cannot assign responsibility to an LLM, and most human organization requires that to work.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
I keep seeing this assumption that "unmanageable" caps out at "kinda hard to reason about", and anyone with experience in large codebases can tell you that's not so. There are software components I own today which require me to routinely explain to junior engineers (and indeed to my own instances of Claude) why their PR is unsound and I won't let them merge it no matter how many tests they add.
Yeah this really breaks down when you put the logic up against ANY sort of compliance testing. Ok you don’t meet compliance, your agents have spent weeks on it and they’re just adding more bugs. Now what are you going to do? You have to go into the code yourself. Uh oh.
> even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today
Citation needed. A human engineer can grok a lot in 10 days, and an agent can spend a lot of tokens in 10 days.
I expect engineering departments to be flattened and reduced in headcount. Corporate silos of responsibility around apps will probably disappear, because a senior developer with tools can be pretty effective across platforms and technologies, and architectural and design thinking becomes more valuable.
I see we're once again missing the existence of indirect impact. There's a reason organizations look at revenue/engineer overall instead of trying to attribute it directly to specific teams.
I guess his students get to relearn that on their own.
Also, any post talking about building software and then contains the suggestion that "cost per unit" is an efficiency metric needs to come to the red courtesy phone, Taylorism would like to have a chat about times gone by.
In many companies there are 3 to 5 other people per developer (QA, agile masters, PO, PM, BA, marketing, sales, customer support etc.). The costs aren't driven just by the developer salaries.
A CEO can cost as much as 10 developers, sometimes more.
Yet another essay completely missing the point, and an audience that misses it as well. All these organizations fly blind because nowhere in any technology or science education is there any emphasis on effective communications, conveying understanding, solving disagreements with analysis and the best of both perspectives... none of these critical communication skills are taught to the very people that most need them. It's a wonder our civilization functions at all.
"Flying blind" is a completely standard idiom originating from flying while blinded by e.g. cloud or darkness. Its meaning is a figurative transplant of a literal description.
I know it's an idiom. The point is that it still uses blindness as a stand-in for incompetence/unsafe guessing. Being common doesn't make it harmless; common just means we've normalized it. And you defending it shows that we've normalized it to the point where the double meaning is apparent only to blind people.
It absolutely does not use blindness as a stand-in for incompetence, that is your own outrage-seeking interpretation of it. A neutral interpretation would be that "flying blind" is to "operate without perfect information". It is a simple description of operating conditions, not a derogatory term in any way. Your reply is worded in such a way as to indicate that you think the person you're replying to deserves to be shamed for 'defending' it, but having a disability does not entitle you to browbeat the world into submission and regulate all usage of any words associated with your disability as you see fit. This is quite benign and people are perfectly well within their right to object to somebody trying to police plainly descriptive language.
Your reply would be much improved if it were just this part.
> A neutral interpretation would be that "flying blind" is to "operate without perfect information". It is a simple description of operating conditions, not a derogatory term in any way.
Entering it would also have put less wear and tear on the input device.
You are equivocating. Blindness as a personal chronic medical condition is not the same as a situational difficulty.
The pilot who is "flying blind" has perfectly normal eyeballs. They are not necessarily a member of any minority group, except for their chosen profession.
_____
As for "blind" being a word that appears more frequently in a negative rather than positive way... Well, I'm not sure what to tell you, that's just 10,000+ years of language from a species that evolved to prefer seeing.
To offer an example of the positive case, there's the idiom "justice is blind". Yes, there is a popular cultural mascot wearing a strip of fabric over her eyes, but again: the justice doesn't actually involve any (real) personal medical condition, and it's considered a positive feature for the job.
Well, flying blind is unsafe guessing (ignoring modern instruments); that's a fact. But only "flying" and "blind" together. No one thinks this gives the word "flying" a negative connotation here, and the same goes for "blind".
Like "drinking" and "driving". On their own, they're both neutral, but "drinking and driving" is really bad.
No, it means not being able to see what is going on. Which is literally what the word blind means. You can be blinded by many things (blindfold, clouds/fog, bright lights, darkness, accidents, genetics, etc), permanently and temporarily. Non-humans can be blind and blinded. YOU are making it about a specific situation and projecting value judgements on it.
The author specifically says FLYING blind. Not "stumbling around like a blind person" or some such. If you are offended, that is on you. It's your right to be offended of course, but don't expect people to join in your delusion.
All of this article, both the good (critique of the status quo ante) and the bad (entirely too believing of LLM boosterism), is missing (or not stressing enough) the most important point, which is that the actual programming is not the hard part. Figuring out what exactly needs programmed is the hard part.
For reasons which it would take a while to unpack, it is often the case that the best (or sometimes only) way to find out what programming actually needs to be done is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product; it is much more often the means of working through what is actually needed. This is very difficult for the people who ask for the software to understand, and it is quite often very difficult for the people doing the programming to understand.
Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like. Once you have arrived at that understanding, then there are a variety of ways to make what you need, but that is not the rate-limiting step.
> which is that the actual programming is not the hard part. Figuring out what exactly needs programmed is the hard part.
I’m growing tired of this aphorism because I’ve been in enough situations where it was not true.
Sometimes the programming part really is very hard even when it’s easy to know what needs to be built. I’ve worked on some projects where the business proposition was conceptually simple but the whole reason the business opportunity existed was that it was an extremely hard engineering problem.
I can see how one could go through a career where the programming itself is not that hard if you’re mostly connecting existing frameworks together and setting up all of the tests and CI infrastructure around it. I have also had jobs where none of the programming problems were all that complicated but we spent hundreds of hours dealing with all of the meetings, documents and debates surrounding every change. Those were not my favorite companies
> it was an extremely hard engineering problem
But that is not programming then? Doing voice recognition in the 90s, missile guidance systems, you name it, those are hard things, but it's not the "programming" that's hard. It's the figuring out how to do it. The algorithms, the strategy, etc.
I might be misunderstanding, but I cannot see how programming itself can be challenging in any way. It's not trivial per se or quickly over, but I fail to see how it can be anything but mechanical in and of itself. This feels like saying "writing", as in grammar and typing, is the hard part of writing a book.
I count "figuring out how to do it" as part of the work of programming, personally.
Yep, I think people who repeat this aphorism essentially equate programming with typing, or as you say just connecting existing bits together. Programming is the working out how to get a computer to perform some task, not just the typing, it's the algorithms, the performance balancing, the structuring, the integration etc.
Imagine telling workers at a construction company that the hard problem was never building stuff but figuring out what needs to be built.
The saying also ignores the fact that humans are not perfect programmers, and they all vary in skills and motives. Being a programmer is often not about simply writing new code but modifying existing code, and that can be incredibly challenging when that code is harebrained or overly clever and the people who wrote it are long gone. That involves programming, and it's really hard.
Isn't overly clever code a result of programmers doing simple things on hard mode?
Okay it's a spicy take, because juniors also tend to write too smart code.
Figuring out what to do and how to do it, is maybe not hard but it's effort. It's a hidden thing because it's not flat coding time, it requires planning, research, exploration and cooperation.
It's also true that some seemingly simple things are very hard. There are probably countless workarounds out there where the programmer wasn't even aware he was dodging an NP-hard bullet.
Both arguments are valid.
I think the weight leans on effort, because effort is harder to avoid. Work, complexity, cruft piles up, no matter what you do. But you can work around hard problems. Not always but often enough. Not every business is NASA and has to do everything right, a 90% solution still generates 90% returns, and no one dies.
> Imagine telling workers at a construction company that the hard problem was never building stuff but figuring out what needs to be built.
Isn't this kind of true, though? Housing construction, for instance, isn't bottlenecked by the technical difficulties of building, but by political and regulatory hurdles. Or look at large, capital-intensive projects such as the always-proposed, never-built new Hudson River train tubes. Actually building these will take billions of dollars and many years, but even they would be long built by now were it not for their constantly being blocked by political jockeying.
Building stuff _does_ often involve difficult technical challenges, but I still think that as a general aphorism the observation that this isn't the _hardest_ part holds true.
We might have different concepts of "hard", but if I were a construction worker I think I would agree. Hell, I'm a developer and I agree. Figuring out what to do definitely is the hard part. The rest is work and can be sweaty, but it's not hard as if it's full of impenetrable math or requiring undiscovered physics. It's just time-consuming and in the case of construction work physically tiring.
It might be that I have been doing this for too long and no longer see it.
This concept exists outside of engineering too. It's captured in the more negatively intentioned: "The best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer". In user research, it's a much better signal when people correct you than when they agree. Politeness is easy, especially under the circumstances (the power dynamic of you paying them, they only half care about your work, people generally want to be nice/agreeable, etc.), such that you should be wary of it. Similarly, trying to get real project goals or real requirements or real intentions from a PM or a boss, who may well be hiding that there isn't much vision underneath things, is the same. The problem is that as productive as it is for developing the team's thinking, it will (1) probably come off as unproductive and challenging because you're slowing "progress" and (2) saying dumb wrong things makes you seem dumb and wrong. But per the concept, even when you do have the foresight to question, you're not allowed to just ask.
+1 A huge amount of software - probably most - is not actually generating value and in many cases is actually reducing value.
I've seen teams build and re-build the same infrastructure over and over.
I saw a request that could have been met with a few SQL queries and a dashboard got turned into a huge endeavor that implements parts of an ETL, Configuration Management, CI/CD, and Ticketing system and is now in the critical path of all requests all because people didn't ask the right questions and the incentive in a large organization is to build a mini-empire to reign over.
That said, smart infrastructure investment absolutely can be a competitive advantage. Google's infrastructure is, IMO, a competitive advantage. The amount of vertical integration and scale is unparalleled.
One of the most confusing moments in my early career was when someone spent two whole quarters building a custom tool that did something a mature and well respected open source project did for us. There was no advantage to his tool and he would admit it when cornered by the question.
We all thought he would get reprimanded for wasting so much time, but by the time management figured out what was happening they decided they needed to sell it as a very important idea rather than admit they had just spent $100,000 of engineering time on something nobody needed. So it turned into something to celebrate and we were supposed to find ways to use it.
That company went down in flames about a year later. That’s how I learned one way to spot broken organizations and get out early rather than going down with the ship.
The incentive of undemocratic groups is to build mini-empires, yes, but if business decisions were led by workers instead of a group of tyrants, they'd most likely be better decisions. If we want lived examples of this, look at recorded history.
> the actual programming is not the hard part
We've all been hearing that a lot and it's made a lot of people forget that, although programming might not be the hardest part, it's still hard.
Hard perhaps but it feels a lot easier now than three years ago. Or so my backlog of personal projects outside of my most familiar stack would suggest.
What is hard about it? Young children seem to pick it up with ease. It cannot be that hard?
Determining what to program can be hard, but that was already considered earlier.
The only other place where I sometimes see it become hard for some people is where they treat programming as an art and are always going down crazy rabbit holes to chase their artistic vision. Although I would say that isn't so much that programming is hard, but rather art that is trying to push boundaries is hard. That is something that holds regardless of the artistic medium.
> What is hard about it? Young children seem to pick it up with ease. It cannot be that hard?
That's like saying "becoming a writer can't be that hard, since kids learn how to write in elementary school".
Given a set of requirements, there are many different ways to write a program to satisfy them. Some of those programs will be more efficient than others. Some will scale better. Some will end up having subtle bugs that are hard to reproduce.
> That's like saying "becoming a writer can't be that hard, since kids learn how to write in elementary school".
Is writing hard? I expect most can agree that determining what to write, especially if you have an objective (e.g. becoming a best-selling novelist), can be extremely hard — but writing itself?
> there are many different ways to write a program to satisfy them.
"What to program" being hard was accepted from the onset and so far we see no disagreement with that.
> Is writing hard? I expect most can agree that determining what to write, especially if you have an objective (e.g. becoming a best-selling novelist), can be extremely hard — but writing itself?
Being able to transcribe sentences in a certain language is the skill kids pick up in elementary schools. Being a writer requires a whole set of skills built on top of that.
The reason why I brought up that difference in the first place is because both of these are called "writing". When a fan says "I heard the author is writing the next book in the series" or when an author says "I haven't been able to focus on writing due to my health issues", they're not talking about the low-level transcription skill.
> "What to program" being hard was accepted from the onset and so far we see no disagreement with that.
Similar to your interpretation of "writing", you're choosing to interpret "programming" as a process of transcribing an algorithm into a certain programming language, and everything else ends up being defined as "what to program".
That's an overly reductive interpretation, given the original context:
> For reasons which it would take a while to unpack, it is often the case that the best (or sometimes only) way to find out what programming actually needs to be done is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product; it is much more often the means of working through what is actually needed.
> [...]
> Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like.
Notice that the original comment defines "determining what to program" as a process of refining your understanding of the problem itself.
In my reading of the original comment, understanding what your users need is "what to program". Writing code that solves your users' requirements is "programming".
> What is hard about it? Young children seem to pick it up with ease. It cannot be that hard?
They do? I've known plenty of kids and young adults who utterly failed to become even borderline competent at programming.
They don't? It is taught in schools in the early elementary level. I see no indication that most are failing.
I think we can agree that few of them would be economically useful due to not knowing what to program. There is no sign of competency on that front. Certainly, even the best programmer in the world could theoretically be economically useless. Programmers only become economically useful when they can bridge "what to program".
> They don't? It is taught in schools in the early elementary level. I see no indication that most are failing.
Programming in elementary schools typically involves moving a turtle around on the screen. (My mother taught 4th grade in New York for many years, and I believe her when she explained the computer instruction.)
Economically valuable programming is much more complex than what is taught in many schools through freshman college. (I taught programming at the college level from 1980 till I retired in 2020.)
Because economically valuable programming has to consider what to program, not simply follow a teacher's instructions about exactly where and how to move a turtle on the screen. But nobody disputes that "what to program" is hard. It was explicitly asserted as hard in the very first comment on this topic, and that has carried through the comments that followed.
For whatever reason 1/10 engineers seem to be able to bring an idea from start to finish by themselves. I don’t know that it’s technical skill, but something difficult is going on there.
This is true when fresh college grads are building stuff. Experienced engineers know how to build things much more efficiently.
Also, people like to fantasize that their project, their API, their little corner of the codebase is special and requires special treatment. And that you simply can't copy the design of someone much more experienced who already solved the problem 10 years ago. In fact many devs boast about how they solved (re-solved) that complex problem.
In other domains - Professional engineers (non-swe) know that there is no shame in simply copying the design for a bridge that is still standing after all those years.
> All of this article, both the good (critique of the status quo ante) and the bad (entirely too believing of LLM boosterism), is missing (or not stressing enough) the most important point, which is that the actual programming is not the hard part. Figuring out what exactly needs programmed is the hard part.
HARD AGREE. But…
Taken at face value, one might conclude that we should spend less time writing software and more time in design or planning or requirement gathering or spec generating.
What I’ve learned is that the painful process of discovery usually requires a large contribution of doing.
A wise early mentor in my career told me “it usually takes around three times to get it right”. I’ve always taken that as “get failing” and “be willing to burn the disk packs” [https://wiki.c2.com/?BurnTheDiskpacks]
While it's true that "figuring out what exactly needs to be programmed" was always the hard part, it's not the part that the most money was spent on. Actually programming the thing always took up the most time and money.
True enough, but I think that a lot of "actually programming the thing" turned out to be "figuring out what exactly needs to be programmed". Afterwards, people did not want to admit that this was the case, perhaps even to themselves, because it seemed like a failure to plan. However, in most (nearly all?) cases, spending more time prior to programming would not have produced a better result. Usually, the best way to figure out what needs to be programmed is to start doing it, and occasionally take a step back to evaluate what you've learned about the problem space and how that changes what you want to actually program.
In other words, "figuring out what needs to be programmed" and "actually programming the thing" look the same while they're happening. Afterwards, one could say that the first 90% was figuring out, and only the last 10% was actually doing it. The reason the distinction matters is that if you do something that makes programming happen faster, but figuring out happen slower, it can have the surprising effect of making it take longer to get the whole thing done.
> Usually, the best way to figure out what needs to be programmed, is to start doing it, and occasionally take a step back to evaluate what you've learned about the problem space and how that changes what you want to actually program.
Replace the verb "program" with "do" or anything else, and you've got a profound universal philosophical insight right there
I'm curious how this would work with LLMs increasing the speed to prototype. Low stakes changes to try something out, learn from it, and pivot.
My company is fully remote, so all meetings are virtual and can be set to have transcripts. Parsing through those for the changes needed and trying them out can be as simple as copy-paste, plan, verify, execute, and distribute.
> Actually programming the thing always took up the most time and money.
I'm curious if any quantitative research has been done comparing time spent writing code vs. time spent gathering and understanding requirements, documenting, coordinating efforts across developers, design and architecture, etc.
No, that's exactly the topic of the article.
The claim is that most software teams do not consider the financial impact of their work. Is what they are doing producing value that can be measured in dollars and cents and is greater than the cost of their combined cost of employment?
The article suggests that there is a lot of programming being done without considering what exactly needs to be programmed.
> The article suggests that there is a lot of programming being done without considering what exactly needs to be programmed.
And the parent rightfully points out that you cannot know exactly what needs to be programmed until after you've done it and measured the outcome. We literally call the process development, for good reason. Software is built on hunches, and necessarily so. There is an assumption that in the future the cost of the work will pay back in spades, but until you arrive in that future, who knows? Hence why businesses focus on metrics that try to observe progress towards finding out, rather than tracking immediate economic payoff.
The interesting takeaway from the article, if you haven't given this topic much thought already, is that the changing financial landscape means that businesses are going to be more hesitant to take those risks. Right now there still seems to be enough optimism in AI payoffs to keep things relatively alive, but if that runs out of steam...
Agreed, but are you also implying that the process of iteratively "programming something that's not it, and then replacing it" multiple times is not in the scope of what LLMs can/will do?
Most of the time taken during this process is spent getting feedback, processing it, and learning that it's not it. So even if LLMs drive the build time to zero, they won't speed up the process very much at all. Think 10% improvement not 10x improvement.
I'd even argue LLMs can speed up this iterative process.
> Figuring out what exactly needs programmed is the hard part.
Making good decisions is the hard part, whether it's about programming or about what needs to be programmed.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
Then I'd wager it's the same for the courses and workshops this guy is selling... an LLM can probably give me at least 75% of the financial insights for not even 0.1% of what this "agile coach" is asking for his workshops and courses.
Maybe the "agile coach LLM" can explain to the "coding LLM's" why they're too expensive, and then the "coding LLM's" can tell the "agile coach LLM" to take the next standby shift then, if he knows so much about code?
And then we actual humans can have a day off and relax at the pool.
Conceding the premise that the AGI is gonna eat my job: my job involves reading the spec to be able to verify the code and output, so that there’s a human to fire and sue. There are five layers of fluffy management and corporate BS before we get to that part, and the AGI is more competent at those fungible skills.
With the annoying process people out of the picture, even reviewing vibeslop full time sounds kinda nice… Feet up, warm coffee, just me and my agents so I can swear whenever I need to. No meetings, no problems.
https://de.wikipedia.org/wiki/Sitzredakteur
Amazing bit of history, thank you!
There’s gonna be one guy in charge of you, and he’s going to expect you to be putting out 20x output while thanking him for the privilege of being employed, assuming all goes the way every management team seems to want
I dont think this will happen because AI has become a straight up cult and things that are going well don’t need so many people performatively telling each other how well things are going.
If a SWE could truly output 20x their effort, that person would probably be better at freelancing or teaming up with another SWE. If anything can be automated away to AI, it’s project management. Also, there has to be a point where delivering more and faster code doesn’t matter, because the choke points are somewhere else in the project life cycle, say waiting for legal, other vendors, budgets, suppliers, etc., so productivity could max out at say 3x, after which, unless you have a strong pipeline of work, your engineers will be sitting around waiting for the next phase of the project to start.
> If a SWE could truly output 20x their effort, that person would probably be better at freelancing or teaming up with another SWE.
Yes but this requires the willingness to take on the additional stress and risk of managing your own sales, marketing, accounting, etc.
> There’s gonna be one guy in charge of you, and he’s going to expect you to be putting out 20x output while thanking him for the privilege of being employed, assuming all goes the way every management team seems to want
A perfect summation.
But that's really not the point of this particular article.
The point being made is, do you know what financial impact your work is having in terms of increasing revenues or decreasing costs?
If the company revenue is going down and costs increasing, developers will be laid off regardless of how many tickets they close.
To add to this, I remember somebody here on HN pointing out a few months ago that they’ve never seen so much investment in businesses that are going “we don’t actually know what the billion dollar application is so we’re going to sell y’all some rough tools and bank on the rest of you figuring it out for us.”
I got 99 problems but an agent ain’t one.
I think you missed the key capitalist part:
There needs to be someone to benefit from all your labor. No, no, it can't be you. You have conflicts of interest!
What if your work isn't benefitting anyone?
If it sells, you don't own it! No one said a product needs to benefit people.
> my job involves reading the spec to be able verify the code and output so the there’s a human to fire and sue.
So, you're the programmer (verify code) and the QA (verify output) and the project manager (read the spec)?
That's the difference between programming and software engineering.
A software engineer should be able to talk directly to customers to capture requirements, turn that into a spec sheet, create an estimate and a bunch of work items, write the whole system (or involve other developers/engineers/programmers to work on their work items), and finally be able to verify and test the whole system.
That entire role is software engineering. Many in the industry suck at most of the parts and only like the programming part.
I think the hardest parts are requirements gathering (e.g. creating organized and detailed notes) and offloading planned work to other developers in a neat way, generally speaking, based on what I see. In other words, the areas of human friction.
> That entire role is software engineering. Many in the industry suck at most of the parts and only like the programming part.
I'm always amused when I read anecdotes from role-siloed, heavily staffed tech orgs with all these various roles.
I've never had a spec handed to me in my career. My job has always been end to end: talk to users -> write the spec into a ticket -> do the ticket -> test the feature -> document the feature -> deploy the feature -> support the feature in production from the on-call rotation.
Often I have a few juniors or consultants working for me that I oversee doing parts of the implementation, but thats about it.
The talking to users part is where a lot of people fall down. It is not simply stenography. Remember most users are not domain/technical experts in the same things as you, and it's all just a negotiation.
It's teasing out what people actually want (cars vs faster horses), thinking on your feet fast enough to express tradeoffs (lots of cargo space vs fuel efficiency vs seating capacity vs acceleration) and finding the right cost/benefit balance on requirements (you said the car needs to go 1000 miles per tank but your commute is 30 miles.. what if..).
> I've never had a spec handed to me in my career.
We call those places "feature factories".
I have been required to talk with many such people in my life, and I have never seen one add value to anything. (There are obvious reasons for that.) And yet the dominant schools in management and law insist they are the correct way to create software, so theirs is the most common kind of employment position worldwide.
Careful with that though. The guy whose entire job is to "take requirements from the customers and bring them to the engineers" really does get awful tetchy if the engineers start presuming to fill his role. Ask me how I know.
Please tell more.
I have the same impression. But that is where it is going - roles merging and being able to do the full spectrum will be valuable.
How do you know?
QA long ago merged with programming in "unified engineering". Also with SRE ("devops"), and now the trend is to merge with CSE and product management too ("product mindset", forward-deployed engineers). So yeah, pretty much, that's the trend. What would you trust more: an engineer doing project management too, or a project manager doing the engineering job?
The PMs and QAs I know would disagree with that assessment.
> What would you trust more - an engineer doing project management too - or a project manager doing the engineering job?
If one of the three, {PM, QA, coder}, was replaced by AI, as a customer I'd prefer to pick the team missing the coder. But for teams replacing two roles with AI, I'd rather keep the coder.
But a deeper problem now is, as a customer, perhaps I can skip the team entirely and do it all myself? That way, no game of telephone from me to the PM to the coder and QA and back to me saying "no" and having another expensive sprint.
If I'm managing a company of about 10 people to do something in the physical world, I'd probably skip the PM & QA, hire the engineer, and have the engineer task the LLM with QA given a clear set of requirements and then manage the projects given a clear set of deadlines. A good SE can do a "good enough" job at QA and PM in a small company, such that you won't notice the PM & QA are missing. But PM & QA can always be added, or QA can be augmented with a specialist, assuming you're LLM-driven.
Of course if none of your software projects are business-critical to the degree that downtime costs money pretty directly then you can skip it all and just manage it yourself.
The other thing you should probably understand is that the feedback cycle for an LLM is so fast that you don't need to think of it in terms of sprints or "development cycles" since in many cases if you're iterating on something your work to acceptance test what you're getting is actually the long pole, especially if you're multitasking.
> If one of the three, {PM, QA, coder}, was replaced by AI, as a customer I'd prefer to pick the team missing the coder.
I am curious: why? In all my years of work I've seen engineers take on extra responsibilities and do anywhere from a decent to a fantastic job at it, while people who start out much more specialized (like QA / sysadmins / managers) I have historically observed struggling more. Obviously there are many talented exceptions; they just never were the majority, is my anecdotal evidence.
In many situations I'd bet on the engineer becoming a T-shaped employee (wide area of surface-to-decent level of skills + a few where deep expertise exists).
> The PMs and QAs I know would disagree with that assessment.
It just depends on the org structure and what the org calls different skills. In lots of places now PM (as in project, not product) is in no way a leadership role.
QA is still alive and well in many companies, including manual QA. I'm sure there's a wide range these days based on industry and scale, but you simply don't ship certain products without humans manually testing it against specs, especially if its a highly regulated industry.
I also wouldn't be so sure that programming is the hardest of the three roles for someone to learn. Each role requires a different skill set, and plenty of people will naturally be better at or more drawn to only one of those.
From my experience with modern software and services, the actual practice of QA has plainly atrophied.
In my first gig (~30 years ago), QA could hold up a release even if our CTO and President were breathing down their necks, and every SDE bug-hunted hard throughout the programs.
Now QA (if they even exist) are forced to punt thousands of issues and live with inertial debt. Devs are hostile to QA and reject responsibility constantly.
Back to the OP, these things aren't calculable, but they'll kill businesses every time.
Continuous delivery really killed QA.
that's not the role of QA to be a gatekeeper, they give the CTO and President information on the bugs and testing but it's a business decision to ship or not
I’m not a native English speaker, but isn’t gatekeeping exactly that? Blocking suspicious entities unless they’re allowed through by someone higher in the hierarchy?
QA merged originally out of programming.
emerged?
I mean, yes?
Maybe it's different where you live but QA pretty much disappeared a few years ago and project managers never had anything to do with the actual software
Exactly. I think it's been a while since I've read an LLM hot take which couldn't have been written by an LLM, and this one is no exception.
There's a 99% chance that the training materials on sale are equally replaceable with a prompt.
True. And yet, as an organization, when you buy OP's training, you don't buy the material. You buy the feeling that you are making your organization more productive. You buy the signal to your boss that you are innovative and working to make your organization more productive. And you buy the time and headspace of your engineers, who are thinking, if only for 2 hours, about making the organization more productive. The latter can be well worth the cost, and the former surely too.
They're buying a defensible (or laudable) justification when the training company's fee appears as a line item in the company budget.
This doesn't mean the training has to be good, useful or original in the slightest, but the provider does need to have credentials that a fellow executive would recognize, not just "some dev with a hot take".
In general, there’s very little info that costs much to learn nowadays. The human standing in the front is a disciplinarian to force you to learn it.
Or, more likely, a snake oil seller dedicating more to marketing than to the product.
> A messy codebase is still cheaper to send ten agents through than to staff a team around
People who say that haven't used today's agents enough or haven't looked closely at what they produce. The code they write isn't messy at all. It's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. The entire construction is just wrong, hidden behind a nice exterior. And when you need to add a couple more floors, the agents can't "get through it" and neither can people. The codebase is bricked.
Today's agents are simply not capable enough - without very close and labour-intensive human supervision - to produce code that can last through evolution over any substantial period of time.
They can work really well if you put sufficient upfront engineering into your architecture and its guardrails, such that agents (and humans) basically can't produce incorrect code in the codebase. If you just let them rip without that, they require very heavy baby-sitting. With that, they're a serious force multiplier.
They don't work really well even on relatively small things and even with a virtually impractical upfront engineering: https://news.ycombinator.com/item?id=47752626
They just make a lot of mistakes that compound and they don't identify. They currently need to be very closely supervised if you want the codebase to continue to evolve for any significant amount of time. They do work well when you detect their mistakes and tell them to revert.
Debugging would suffer as well, I assume. There's this old adage that if you write the cleverest code you can, you won't be clever enough to debug it.
There's nothing really stopping agents from writing the cleverest code they can. So my question is, when production goes down, who's debugging it? You don't have 10 days.
> the art is load-bearing
This is beautiful
The problem is, the MBAs running the ship are convinced AI will solve all that with more datacenters. The fact that they talk about gigawatts of compute tells you how delusional they are. Further, the collateral damage this delusion will incur, as these models sigmoid their way into agents, harnesses, expert models, fine-tuned derivatives, and cascading manifold intelligent-word-salad exercises, shouldn't be underestimated.
Something is missing in the common test suite if this can occur, right?
First, it's not "can occur" but does occur 100% of the time. Second, sure, it does mean something is missing, but how do you test for "this codebase can withstand at least two years of evolution"?
You have to fight to get agents to write tests in my experience. It can be done, but they don't. I've yet to figure out how to get any agent to use TDD - that is, write a test and then verify it fails. Once in a while I can get it to write one test that way, but it then writes far more code to make it pass than the test justifies, and so is still missing coverage of important edge cases.
I have TDD flow working as a part of my tasks structuring and then task completion. There are separate tasks for making the tests and for implementing. The agent which implements is told to pick up only the first available task, which will be “write tests task”, it reliably does so. I just needed to add how it should mark tests as skipped because it’s been conflicting with quality gates.
You can spend a lot of time perfecting the test suite to meet your specific requirements and needs, but I think that would take quite a while, and at that point, why not just write the code yourself? I think the most viable approach of today's AI is still to let it code and steer it when it makes a decision you don't like, as it goes along.
A lot of that can be overcome by including the need to be able to put more floors on top as part of the spec. Whether it be humans or agents, people rarely specify that one explicitly but treat it as an assumed bit of knowledge.
It goes the other way quite often with people. How often do you see K8s for small projects?
> A lot of that can be overcome by including the need to be able to put more floors on top as part of the spec
I wish it could, but in practice, today's agents just can't do that. About once a week I reach some architectural bifurcation where one path is stable and the other leads to an inevitable total-loss catastrophe from which the codebase will not recover. The agent's success rate (I mostly use Codex with gpt5.4) is about 50-50. No matter what you explain to them, they just make catastrophic mistakes far too often.
This just sounds like incomplete specs to me. And poor testing.
It isn't. Anthropic tried building a fairly simple piece of software (a C compiler) with a full spec, thousands of human-written tests, and a reference implementation - all of which were made available to the agent and which the model was trained on. It's hard to imagine a better-tested, better-specified project, and we're talking about 20KLOC. Their agents worked for two weeks and produced a 100KLOC codebase that was unsalvageable - any fix to one thing broke another [1]. Again, their attempt was to write software that's smaller, better tested, and better specified than virtually any piece of real software, and the agents still failed.
Today's agents are simply not capable enough to write evolvable software without close supervision to save them from the catastrophic mistakes they make on their own with alarming frequency.
Specifically, if you look at agent-generated code, it is typically highly defensive, even against bugs in its own code. It establishes an invariant and then writes a contingency in case the invariant doesn't hold. I once asked it to maintain some data structure so that it could avoid a costly loop. It did, but in the same round it added a contingency (that uses the expensive loop) in the code that consumes the data structure in case it maintained it incorrectly.
This makes it very hard for both humans and the agent to find later bugs and know what the invariants are. How do you test for that? You may think you can spec against that, but you can't, because these are code-level invariants, not behavioural invariants. The best you can do is ask the agent to document every code-level invariant it establishes and rely on it. That can work for a while, but after some time there's just too much, and the agent starts ignoring the instructions.
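For illustration, the defensive pattern described above might look something like this (a hypothetical sketch with invented names, not any actual agent output):

```python
# Hypothetical sketch of agent-style defensive redundancy: an index is
# maintained so lookups can skip a full scan, yet the consumer also gets
# a "just in case" fallback that re-does the expensive scan anyway,
# hiding whether the invariant (index kept in sync) actually holds.

class Inventory:
    def __init__(self):
        self.items = []   # list of (sku, qty)
        self.index = {}   # invariant: index[sku] == position of sku in items

    def add(self, sku, qty):
        self.index[sku] = len(self.items)  # index is kept in sync here
        self.items.append((sku, qty))

    def qty_for(self, sku):
        pos = self.index.get(sku)
        if pos is not None and self.items[pos][0] == sku:
            return self.items[pos][1]
        # Contingency in case the index was maintained incorrectly:
        # the very O(n) loop the index was supposed to eliminate.
        for s, q in self.items:
            if s == sku:
                return q
        return 0
```

The fallback silently masks any bug that breaks the invariant, so neither a human reviewer nor the agent can later tell whether the index is trustworthy.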
I think that people who believe that agents produce fine-but-messy code without close supervision either don't carefully review the code or abandon the project before it collapses. There's no way people who use agents a lot and supervise them closely believe they can just work on their own.
[1]: https://www.anthropic.com/engineering/building-c-compiler
Lol I largely agree with my beloved dissenters, just not on the same magnitude. I understand complete specs are impossible and equivalent to source code via declaration. My disagreement is with this particular part:
"t's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. "
If your test/design of a BUILDING doesn't include at least simulations/approximations of such easy-to-catch structural flaws, it's just bad engineering. Which rhymes a lot with the people that hate AI: by and large, they just don't use it well.
"Incomplete specs" is the way of the world. Even highly engineered projects like buildings have "incomplete specs" because the world is unpredictable and you simply cannot anticipate everything that might come up.
A sufficiently complete spec is indistinguishable from source code.
And sometimes it can't even handle it then. I was recently porting ruby web code to python. Agents were simultaneously surprisingly good (converting ActiveRecord to sqlalchemy ORM) and shockingly, incapably bad.
For example, ruby uses blocks a lot. Ruby blocks are curious little thingies because they are arguably just syntax sugar for a HOF, but man it's great syntax sugar. Python then has "yield" which is simultaneously the same keyword ruby uses for blocks, but works fundamentally differently (instead of just a HOF, it's for generating an iterator/generator) and while there are some decorators that can use yield's ability to "pause" execution in the function to send control flow back out of the function for a moment (@contextmanager) which feels _even more_ like ruby blocks, it's a rather limited trick and requires the decorator to adapt the Generator to a context manager and there's just no good way to generalize that.
Somehow this is the perfect storm to make LLMs completely incapable of converting ruby code that uses blocks for more than the basic iteration used in the stdlib. It will try to port to python code that is either nonsensical, or uses yield incorrectly and doesn't actually work (and in a way that type checkers can even spot). And furthermore, even if you can technically whack it with a hammer until it works with yield, it's often not at all the way to do it. Ruby devs use blocks not-uncommonly while python devs are not really going to be using yield often at all, perhaps outside of @contextmanager. So the right move is usually to just restructure control flow to not need to use blocks/HOFs (or double down and explicitly pass in a function). (Rubyists will cringe at this, and rightly so... Ruby is often extraordinarily expressive).
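The mismatch can be seen in a small sketch (illustrative names, not the actual code being ported): a Ruby-style block API like `with_connection { |conn| ... }` can be rendered in Python either as a plain higher-order function or, for the setup/teardown special case only, via `@contextmanager`.

```python
from contextlib import contextmanager

def with_connection_hof(body):
    """General port: the Ruby block becomes an explicit function argument."""
    conn = {"open": True}        # stand-in for a real connection object
    try:
        return body(conn)        # invoke the "block"
    finally:
        conn["open"] = False     # teardown always runs

@contextmanager
def with_connection_cm():
    """Special-case port: only works for setup/yield/teardown shapes."""
    conn = {"open": True}
    try:
        yield conn               # pauses here while the caller's body runs
    finally:
        conn["open"] = False

# Usage:
result = with_connection_hof(lambda conn: conn["open"])  # True: body saw an open conn
with with_connection_cm() as conn:
    assert conn["open"]          # open inside the with-block, closed after
```

The HOF version generalizes to any block usage but reads unidiomatically in Python; the `@contextmanager` version reads idiomatically but only covers the resource-management case, which is the limitation described above.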
The fact that such a simple language feature trips them up so completely is pretty odd to me. I guess maybe their training data doesn't include a lot of ruby-to-python conversions. Maybe that's indicative of something, but I digress.
We call the complete specs "source code".
... and it still doesn't work. In the Anthropic experiment, the model was trained on a reference implementation and the agents still failed.
> A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today.
I’ve been on 2 failed projects that have been entirely AI generated, and it’s not that agents slow down and you can just send more agents to work on projects for longer; it’s that they become completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
Same here. I have now deleted 43k lines (and counting) of my codebase. There is no point in putting any AI code into production anymore, as it almost always uses either no abstractions or the wrong ones.
When you try to throw more agents at the problem, or even more verification layers, you just kill your agility, even if they would still be able to work.
>I’ve been on 2 failed projects that have been entirely AI generated, and it’s not that agents slow down and you can just send more agents to work on projects for longer; it’s that they become completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
This rhymes a lot with the Mythical Man Month. There's some corollary Mythical Machine Month thing going on with agent developed code at the moment.
The more I work with AIs (I build AI-harnessing tools), the more I see similarities with the common attention failures that humans make: I forgot this one thing and it fucks everything up; or you just told me, but I have too much in my mind as context and I forget that piece; or, as with Claude last night, it attests to me while I'm ordering it around that it cannot SSH into another server, and then about the 5th time I come back with a traceback I find it SSHing into said server and just fixing it!
All of these things humans do too, and I don't think we can attribute it directly to language itself; it's attention and context, and we both have the same issues.
Right, but when humans are writing the code, they have learned to focus on putting downward pressure on the complexity of the system to help mitigate this effect. I don't get the sense that agents have gotten there yet.
Big business LLMs even have the opposite incentive, to churn as many tokens as possible.
this is the part of the article that did not sit well with me either. Code may be agent-generated, and an agent can debug it, but it will always be human-owned.
unless anthropic comes in tomorrow and takes ownership of all the code claude generates, that is not changing..
Very much like humans when they drown in technical debt. I think the idea that a messy codebase can be magically fixed is laughable.
What I might believe though is that agents might make rewrites a lot more easy.
“Now we know what we were trying to build - let’s do it properly this time!”
Potentially, yes, but as with other software, you need to know AND have (automated) verifications on what it does, exactly.
And of course, make the case that it actually needs a rewrite, instead of maintenance. See also second-system effect.
> Potentially, yes, but as with other software, you need to know AND have (automated) verifications on what it does, exactly.
Yes, but even here one needs some oversight.
My experiments with Codex (on Extra High, even) showed that a non-zero percentage of the "tests" involved opening the source code (not running it, opening it) and regexing for a bunch of substrings.
>And of course, make the case that it actually needs a rewrite, instead of maintenance.
"The AI said so ..."
I'm wondering how much value there is in a rewrite once you factor in that no one understands the new implementation as well as the old one.
Not only is it difficult to verify, but also the knowledge your team had of your messy codebase is now mostly gone. I would argue there is value in knowing your codebase and that you can't have the same level of understanding with AI generated code vs yours.
The point of a rewrite is to safely delete most of that arcane knowledge required to operate the old system, by reducing the operational complexity of it.
It will make the rewrite quicker, not "easier".
By the time management recognizes the tech debt, it is often too late: nobody understands the full requirements or knows how things are supposed to work.
The AI agent will just make the same mistakes a human would make: writing half-assed code that almost works but misses all sorts of edge cases.
I was involved in a big rewrite years ago. The boss finally put the old product on his desk with a sign, "[boss's name]'s product owner": when people asked how something should work, the most common answer was "exactly like the old version". 10 years later the rewrite is a success, but it cost over a billion dollars. I have long suspected that billion dollars could have been better spent just fixing technical debt.
> “Now we know what we were trying to build - let’s do it properly this time!”
I wonder if AI will avoid the inevitable pitfalls their human predecessors make in thinking "if I could just rewrite from scratch I'd make a much better version" (only to make a new set of poorly understood trade offs until the real world highlights them aggressively)
That's correct, the more I work with AI the more it's obvious that all the good practice for humans is also beneficial for AI.
More modular code, strong typing, good documentation... Humans are bad at keeping too much in short-term memory, and AI is even worse with its limited context window.
Is there a case for having more encapsulation? So a class and tests are defined and the LLM only works on that.
Agents run fast. Not always in the right direction. They benefit from a steady hand.
> Software development is one of the most capital-intensive activities a modern company undertakes
The article is definitely written from a "high tech" industry lens. A mid-sized utility might spend $80-$150 million USD on IT capital projects in a year, but $2b on power pole maintenance. Utilities are a strong example, but any large enterprise manufacturing company is spending more on factory upgrades than on programming.
> [...] built a functional replica of approximately 95% of Slack’s core product in fourteen days using LLM agents.
IT and Finance leadership at asset-heavy companies are currently trying to wrap their heads around the current economics of their 100+ SaaS contracts, and whether they still make sense with LLM-powered developers. Can they hire developers in house to build the fraction of the tool they use from many of these companies, save on total cost and Opex?
I work with these companies a lot, and won't weigh in on the right decision. Bottom line, "it depends" on many factors, some of which are not immediately obvious. The article still holds weight regardless of industry, but there is some nuance (talent availability, internal change cost, etc.) that also has to be considered.
Yeah, that line came across as a little out of touch. I work for US DOTs, and a yearly allotment from a STIP of a small DOT is still measured in billions. Software spend is negligible. In fact, I would say software was always costly in terms of labor, but hasn’t been capital intensive until recently.
But I would like to agree with what you said with respect to SaaS spending coming under scrutiny. Our technical experts are becoming aware that we spend 5 or 6-figure sums on software with barely any users that we can clone with a coding agent in an afternoon. Eventually management will find out too and we’re going to cut a lot of dead weight.
100%.
A modern pharmaceutical manufacturing plant costs two-billion dollars just to build, and that doesn't include developing a drug to actually manufacture there, or a distribution network to sell what you make inside it.
I thought it was a good article, till I saw the Slack example.
The copy doesn’t even remotely grasp the scale of what the actual Slack software does in terms of scale, reliability, observability, monitorability, maintainability, and quite likely also functionality.
The author only mentions the non-dev work as the difference, which suggests he doesn’t know what he’s talking about at all, or what running an application at that scale actually means.
This "clone" doesn’t get you any closer to an actualy Slack copy than a white piece of paper
I had the same experience (though I agree with other comments that the numbers are a little optimistic in terms of variance; I think there's a huge amount of variance in product work, you can't know what's a good investment until it's too late, many companies fail because of this, and there's huge survivorship bias in the ones that get lucky and don't initially fail). Slack spent tons of money in terms of product and engineering hours finding out what works and what doesn't. It's easy to copy/paste the thing after all that effort. Copy/paste doesn't get you to the next Slack though--it can get you to Microsoft's Slack-killing Teams strategy, but we obviously don't want more of that. And, obviously I agree with you about all the infra/maintenance costs, costs in stewarding API usage and extensions, etc. LLMs won't do any of that for you.
Absolutely, the moment I saw „95% of Slack core functionality” I stopped believing the author knows what he’s talking about
Students in the 2010s were building twitter clones as part of third-year college courses.
And somehow twitter survived and thrived and didn't really get viable competitors until forces external to the code and product itself motivated other investment. And even then it still rolls on, challenged these days, but not by the ease of which a "clone" can be made.
Yeah, I can build a Slack "clone" in a couple of weeks with my own two hands, no AI required. But it's not going to actually be competitive with Slack.
Just to pick an incredibly, unbelievably basic enterprise feature, my two-week Slack clone is not going to properly support legal holds. This requires having a hard override for all deletion and expiration options anywhere in the product, that must work reliably, in order to avoid accidental destruction of evidence during litigation, which comes with potentially catastrophic penalties. If you don't get this right, you don't sell to large corporations.
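A minimal sketch of what that hard override means, with invented names (this is illustrative, not Slack's actual implementation): every deletion and retention-expiry path must consult active holds before destroying anything.

```python
# Hypothetical legal-hold override: no deletion path may bypass the
# hold check, including automated retention expiry. Names are invented
# for illustration.

class LegalHoldError(Exception):
    """Raised when a deletion would destroy data under litigation hold."""

class MessageStore:
    def __init__(self):
        self.messages = {}   # msg_id -> (user_id, text)
        self.holds = set()   # user_ids under litigation hold

    def place_hold(self, user_id):
        self.holds.add(user_id)

    def delete(self, msg_id):
        user_id, _ = self.messages[msg_id]
        # Hard override: a user-initiated delete must fail under a hold.
        if user_id in self.holds:
            raise LegalHoldError(f"user {user_id} is under legal hold")
        del self.messages[msg_id]

    def expire_retention(self):
        # Automated expiry must honour the same override, silently
        # skipping held messages instead of destroying evidence.
        for msg_id in list(self.messages):
            user_id, _ = self.messages[msg_id]
            if user_id not in self.holds:
                del self.messages[msg_id]
```

The check itself is trivial; the hard part is guaranteeing that every deletion path in a large codebase (bulk deletes, workspace exports, account closures, retention jobs) goes through it, reliably, forever.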
And there are a hundred other features like this. Engineering wants an easy-to-use API for Slack bots. Users want reaction GIFs. You need mobile apps. You need Single Sign-On. And so on. These are all table stakes.
It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to do it, they actually invested in a huge number of features.
Believe me, I wish that "simple, clean" reimplementations were actually directly competitive with major products. That version of our industry would be more fun. But anyone who thinks that an LLM can quickly reimplement Slack is an utter fool who has never seriously tried to sell software to actual customers.
> It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to do it, they actually invested in a huge number of features.
The other issue is that yes, perhaps most users only use 20% of the features, but each user uses a different 20% of the features in products like Word. Trust me, it's super hard to get it right even at the end-user level, let alone the enterprise level like you say.
There are at most 5% of the features of word that are common to everyone. Things like spell check everyone uses. Actually I suspect it is more like 0.1% of the features are common, and most people use about 0.3% of the features and power users get up to 5% of the features - but I don't have data, just a guess.
Yeah but 98% of Word features were buried in like 2004. They were added when it was a selling point to use unicorn and gnome icons as your table border in under 100mb of RAM. So we’re talking about 20% of the limited set of features that remain not just for backwards compatibility.
And there's some company out there that has very important Word documents that will fail to open if you take away the unicorn and gnome icons table border feature.
When I look at the big non-tech-industry companies that have a chill life and print money, it's usually the companies that are simply the very best at what they do and have a quasi-monopoly, or so much competitive advantage that everybody just uses them.
That’s what’s needed in tech too.
A clone doesn’t get you closer to that.
Also, it's obviously faster to copy Slack 1-to-1 than to invent it from scratch. Making Slack was not just coding.
Human slop think-pieces.
I'm not commenting too much on the details of the article, but the premise does resonate with me. I would argue all the engineering teams I've been on do not spend enough time thinking about how much a piece of work will cost to execute, and whether it will generate a return.
I suspect this is most apparent in things like meeting culture. Something happens and all of a sudden there is another recurring meeting on the calendar, with 15 attendees, costing x dollars in wages, that produces no value for the customers because the lesson was already learned.
Or when reacting to an incident of some sort, it's so easy to have a long list of action items that may theoretically improve the situation, but in reality are incredibly expensive for the value they produce (or the risks they reduce). It's too easy to say we'll totally redesign the system to avoid said problem. And what worries me is that often those very expansive actions then cause you to overlook realistic but small investments that move the needle more than you would think.
And as a hot topic, I also think the costs are an input into taking on tech debt. I know we all hate tech debt with a passion, but honestly, I think of it as a tool that can be wielded responsibly or irresponsibly. But if we don't know what our attention costs, we're going to have difficulty making the responsible choices about when and where to take on this debt. And then, if we're not conscious about the debt, when it comes due it stings so much harder to pay down.
Meetings aren't even the worst resource wasters. Wrong initiatives, features, apps/platforms/services are. They capture future resources in form of maintenance and complexity with them.
Agreed, and this is where I think some more nuanced and conscious use of tech debt can be used when applicable.
It might be OK to place some bets on an initiative or feature, but if we all understand we're placing a bet, this is an area to load up on debt and really minimize the investment. This also requires an org that is mature about cutting the feature if the bet doesn't materialize, and that, if the market signal does appear, will reinvest in paying down the debt. And it also has the mega-danger territory of a weak market signal, where it's not clear whether there is market signal or not, so the company doubles down into the weak signal.
Also these bets shouldn't be done in isolation in my view, well executed product and market discovery should also provide lots of relevant context on the ROI.
When I see someone just throwing a lot of numbers and graphs at me, I see that they are in it to win an argument, not to propose an idea.
Of late, I've come across a lot of ideas from Rory Sutherland, and my conclusion from listening to them is that some people are obsessed with numbers because, to them, it's a way to find certainty and win arguments. He calls them "Finance People" (him being a Marketing one). Here's an example:
"Finance people don’t really want to make the company money over time. They just thrive on certainty and predictability. They try to make the world resemble their fantasy of perfect certainty, perfect quantification, perfect measurement.
Here’s the problem. A cost is really quantifiable and really visible. And if you cut a cost, it delivers predictable gains almost instantaneously."
> Choosing to spend three weeks on a feature that serves 2% of users is a €60,000 decision.
I'd really want to hire the oracle of a PM/analyst who can give me that 2% accurately even 75% of the time, and promise that nothing non-linear can come from the exercise.
As with any attempt to become more precise (see software estimation, eg. Mythical Man Month), we've long argued that we are doing it for the side effects (like breaking problems down into smaller, incremental steps).
So when you know that you are spending €60k to directly benefit small number of your users, and understand that this potentially increases your maintenance burden with up to 10 customer issues a quarter requiring 1 bug fix a month, you will want to make sure you are extracting at least equal value in specified gains, and a lot more in unspecified gains (eg. the fact that this serves your 2% of customers might mean that you'll open up to a market where this was a critical need and suddenly you grow by 25% with 22% [27/125] of your users making use of it).
You can plan for some of this, but ultimately when measuring, a lot of it will be throwing things at the wall to see what sticks according to some half-defined version of "success".
But really you conquer a market by having a deep understanding of a particular problem space, a grand vision of how to solve it, and then actually executing on both. Usually, it needs to be a problem you feel yourself to address it best!
None of his math really checks out. Building a piece of software is, or at least was, orders of magnitude more expensive than maintaining it. But how much money it can make is potentially unbounded (until it gets replaced).
So investing e.g. 10 million this year to build a product that produces maybe 2 million ARR will have amortized after 5 years if you can reduce engineering spend to zero. You can also use the same crew to build another product instead and repeat that process over and over again. That's why an engineering team is an asset.
It's also a gamble, if you invest 10 million this year and the product doesn't produce any revenue you lost the bet. You can decide to either bet again or lay everyone off.
It is incredibly hard or maybe even impossible to predict if a product or feature will be successful in driving revenue. So all his math is kinda pointless.
> Building a piece of software is or at least was orders of magnitudes more expensive than maintaining it
This feels ludicrously backwards to me, and also contrary to what I've always seen as established wisdom - that most programming is maintenance. (Type `most programming is maintenance` into Google to find page after page of people advancing this thesis.) I suspect we have different ideas of what constitutes "maintenance".
> that most programming is maintenance.
What do you mean by maintenance?
A strict definition would be "the software is shipping but customers have encountered a bug bad enough that we will fix it". Most work is not of this type.
Most work is "the software is shipping but customers really want some new feature". Let us be clear though, even though it often is counted as maintenance, this is adding more features. If you had decided up front to not ship until all these features were in place it wouldn't change the work at all in most cases (once in a while it would because the new feature doesn't fit cleanly into the original architecture in a way that if you had known in advance you would have used a different architecture)
I like the good ol' "80% of the work in a software project happens before you ship. The other 80% is maintaining what you shipped."
The longer software is sold the more you need to maintain it. In year one most of the cost is making it. Over time other costs start to add up.
As with most things, isn't the truth somewhere in the middle? True cost/value is very hard to calculate, but we could all benefit by trying a bit harder to get closer to it.
It's all too common to frame the tension as binary: bean counters vs pampered artistes. I've seen it many times and it doesn't lead anywhere useful.
Here I think the truth is pretty far to one side. Most engineering teams work at a level of abstraction where revenue attribution is too vague and approximate to produce meaningful numbers. The company shipped 10 major features last quarter and ARR went up $1m across 4 new contracts using all of them; what is the dollar value of Feature #7? Well, each team is going to internally attribute the entire new revenue to themselves, and I don’t know what any other answer could possibly look like.
Even if you could do attribution correctly (I think you can do this partially if you are really diligent about A/B testing), that is still only one input to the equation. The other fact worth considering is the scale factor - if a team develops a widget which has some ARR value today, that same widget has a future ARR value that scales with more product adoption - no additional capital required to capture more marginal value. How do you quantify this? Because it is hard and recursive (knowing how valuable a feature will be in the future means knowing how many users you have in the future which depends on how valuable your features are as well as 100 other factors), we just factor this out and don't attempt to quantify things in dollars and euros.
You’re illustrating one of the points of TFA - a team that is equipped with the right tools to measure feature usage (or reliably correlate it to overall userbase growth, or retention) and hold that against sane guardrail metrics (product and technical) is going to outperform the team that relies on a wizardly individual PM or analyst over the long term making promises over the wall to engineering.
Feature usage can't tell you that.
There's often a checklist of features management has, and meeting that list gets you in the door, but the features often never get used
But surely you have to have at least an hypothesis of how software features you develop will increase revenue or decrease costs if you want to have a sustainable company?
I think the only thing that matters is whether the people on the team care deeply about the product; whether they care more about the product than their own careers (in the short term). Without that, any metric or way of thinking can and will be gamed.
Unfortunately, even with all the management techniques in the world, there are just some projects that are impossible to care about. There’s simply a significantly lower cap on productivity on these projects.
The over-simplification rubs me the wrong way, for example:
There's a fungibility assumption which is pervasive here. In most cases, a platform team is there not "to save time". It's there to deal with cross-cutting concerns that would be not only time-consuming but could be business-threatening, and in some cases you keep more expensive engineers there to ensure that certain critical things are done right.
Too much snake oil for my taste.
Making it solely about the extraction of dollars is a great recipe to make something mediocre. See Hollywood or Microslop.
It's like min-maxing a Diablo build where you want the quality of the product to be _just_ above the "acceptable" threshold but no higher, because anything more is wasting money. Then you're free to use all remaining points to spec into revenue.
Exactly. In addition, sometimes good software "only" saves you 1% of your time, but that 1% was a terrible burden that induced mental fatigue, led to bad decisions, etc. It can even make a great engineer stay when they would have left with the previous version.
While reading the article I was thinking the same thing. I can think of problems I've solved that directly affected 0% of our customers, but overloaded our customer support team.
This is some aggressive consultant fluff. Few companies have such distinctive "profit" measures. If "the financial logic is rarely examined carefully" then maybe there's a reason, since analysis like this is mostly fantastical and brittle. This is the sort of argument that is both rational and implausible. A manager might use this logic to rationalize firing an engineering team (which is mostly why guys like this get hired) but they won't use it to manage an engineering team.
I feel like there is a lot of nuance around this topic that is getting lost in the noise.
The direct and indirect financial impact of technical decisions is indeed hard to measure. But some technical decisions definitely have greater financial impact than others, and even if it's hard to precisely quantify the financial costs/benefits of every decision, it is possible to order them relatively: X is likely to make more money than Y, so we do X first and Y later.
There is a significant amount of chance involved in whether a product/feature will even make money at all. So even good plans with measurably positive expected value could end up losing money.
Just because it's impossible to be 100% certain of the outcome of any decision doesn't mean we should throw the baby out with the bathwater.
This article is not bad overall, but it does over-index on how costly it is to make software development costs and tradeoffs legible. Of course leadership does need to make decisions, and so the quest for better data and better cost modeling will continue, and rightly so, Goodhart's law notwithstanding.
I do like this bit though:
> A large codebase also carries maintenance costs that grow over time as the system becomes more complex, more interconnected, and more difficult to change safely. Every engineer added to maintain it increases coordination costs, introduces new dependencies, and adds to the organizational weight that slows decision-making. The asset and the liability exist simultaneously, and for most of the past twenty years, the financial environment masked the liability side of that equation.
And the insight that LLMs are exposing this reality is absolutely true. The funny thing is they are exposing it by accelerating both good and bad engineering practices. Teams with good engineering judgement will move faster than ever with fewer people, and teams with bad engineering judgment will bury themselves in technical debt so fast the wheels will come off.
For me, running an engineering org is primarily about talent acquisition and empowering those ICs with judgment to move quickly. How well systems and teams scale depends on the domain, the product, and how it allows you to decouple things. With the right talent and empowerment there are often creative ways to make product and system tradeoffs and iterate quickly to change the shape of ROI. Any mapping to financial metrics is a hugely lossy operation that can't account for such changes. It might work in mature companies that are ossified and in the second half of their lifecycle, but in growing companies I think it's fundamentally misguided and would amount to empowering the wrong people.
Still don't understand what regular people (like the author) gain from selling how wonderful AI is. I get that the folks at Anthropic and OpenAI shove AI down our throats every day, but nobodies?
He is selling consulting around AI/LLM.
In other words, he's cutting the branch he's sitting on.
That would only be a problem if his saw could actually cut wood.
> There is no cohort of senior product leaders who developed their judgment in conditions where their teams were expected to demonstrate financial return, because those conditions did not exist during the years when that cohort was learning the craft.
There totally is such a cohort. There are plenty of bootstrapped companies or startups that took only an angel round and did not benefit from the low rate environment, in fact they suffered because of the very high price of SWE labor. But those engineering managers exist and are out there right now still building efficiently, quietly growing, passionately serving customers, and keeping a close eye on the bottom line and risks because that’s their livelihood.
Look! A guy built 95% of slack in 2 weeks! Very skeptical of that btw, but also an organization that justifies every single team by exactly how much $ value they’re generating sounds like hell. How would you ever innovate or try out new ideas? It’s important to quantify what impact your team is generating but there are some cases (e.g. UX) which are really hard to quantify in $ but are still very important for the product
I think the thing that hits home for me here is that when you go back and do the after-action report on where the time was spent last year and what it cost, it's terrifying. Of course hindsight is 20/20 and predicting how difficult something is going to be is hard, but when you say we spent $x million on this version update that does y, and $a hundred thousand to implement this feature, you think to yourself: we would never have made that cost/benefit decision if we had known.
Does anyone really believe that Slawk is a "replica of approximately 95% of Slack’s core product"??
But in reality, the real cost of engineering teams grows as the sub-organizations and teams continue to make short-term decisions, optimizing for the next immediate win.
This is more common in larger organizations than smaller ones.
The points brought up are all great. I'm in a lower management position and I've wondered for a decade why the budget, cost, and return on work (i.e. revenue) were never divulged or connected to the work at hand. So kudos for facing that problem bluntly in language that's easy to follow. The place I'm at currently is much more about automating away processes and making back office operations easier, so there's likely a lot of direct cost savings that we could measure, but don't.
Here's the problem I see with how this particular article is moving, though: the context of these projects is often highly technical, connecting back to the human problem space. Developers sit on the technical end, but they also usually have a mental model for how it connects back to the non-technical. A product manager is another addition to compensate for the user connection. Between all of these folks they can only hold so much in their heads about the problem space on a day-to-day basis. And that headspace for the problem is what is critical. Management wants to try a new idea for sales? They need to take it to the team with that problem space to translate it into working code. Even with the assistance of agents, one needs to hold the important patterns in one's head. And my company certainly isn't going to vibe code its way through anything regulatory; mistakes there might cost us a ton in fees and bad PR. Hell, I've seen product managers sweat over the possibility of getting a few 1-star reviews on the app store.
Anyway, you still need people with context to break things down and get them out the door, the agents can just assist with the speed of the In Progress stage. And clever teams can figure out how to automate their validation (but they could already do that).
Rockstar developers often seem to be the ones who can parachute in, gain context, make changes, and leave to find another problem space. They get bogged down when they've visited 10 or more problem spaces and then they start getting called back into service. Again the agents don't change any of that, the human involved has a finite capacity for context.
Teams who structure around maintaining context might be best suited for the new world of code.
I do tech due diligence, exit readiness and post-merger integration in tech companies, and this is my daily bread. The biggest lever I have: connecting initiatives to ROI/bottom-line impact. It's incredible how blind product/software teams run. So much to do, but most of it won't make any money and just feels productive. Connecting activities and work directly to revenue is very important.
If your company runs well, it won't hurt you much that you're not doing this. Otherwise it will be your end. And that really hurts, because you lose the economic impact of the product and the jobs.
The argument against platform teams needs to be balanced with the compounding nature of technical debt.
The argument to always go for the biggest return works OK for the first few years of high growth (though the timeline is probably greatly compressed the more you use AI), but it turns into a kind of quicksand later.
This is a very reductionist way to calculate the value of a software team or any team within an organization. That’s because many times the value delivered by a team is not necessarily monetary but strategic.
> Most engineers do not know this number.
How could they not? When I penciled this out ~18 years ago, I included the amortized cost of all the interviews it took to hire a given engineer as well. It's not rocket surgery, as they say.
Money can be exchanged for goods and services.
One interesting factor that I rarely see discussed is this: Let's say a DevOps person does some improvement to internal tooling and a task that devs had to oversee manually now is automated. Every dev spent about 2 hours per week doing this task and now they don't have to anymore. Now, have we saved 2 hours of salary per dev per week?
Not sure. Because it totally depends on what they do instead. Are they utilizing two hours more every week now doing meaningful work? Or are they just taking things a bit more easy? Very hard to determine and it just makes it harder to reason about the costs and wins in these cases.
They have saved _more_ than two hours per dev per week. There's a compound factor: now the code can be more reliable (fewer outages or emergencies fixing bugs), etc. Also, having a sane working environment helps engineers not quit, and replacing them is very expensive.
The freed-up time question is answerable when the work has clear metrics. A model test suite dropping from 6 minutes to 66 seconds saves developer time on every single run. Ten developers running tests five times a day, the math is straightforward.
The problem is that most engineering work lacks that kind of before/after measurement. Not because it is unmeasurable, but because nobody set up the baseline. Profile before you optimize and the return on investment calculates itself.
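The test-suite case above is one of the few where the math really is straightforward. A sketch, with the number of working days per year as an added assumption:

```python
def annual_hours_saved(old_sec: float, new_sec: float,
                       runs_per_day: float, devs: int,
                       workdays: int = 230) -> float:
    """Developer-hours saved per year by a faster test suite."""
    saved_hours_per_run = (old_sec - new_sec) / 3600
    return saved_hours_per_run * runs_per_day * devs * workdays

# 6 minutes down to 66 seconds, 10 devs, 5 runs per day each:
print(round(annual_hours_saved(360, 66, 5, 10)))  # 939
```

Roughly 939 developer-hours a year before counting any reduction in context-switching; whether those hours convert into output is the separate question raised elsewhere in the thread.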
If a test suite runs for either 6 minutes or 66 seconds I am not staring at it while it runs. I am doing something else. So that is not holding up my time
If you have no feedback for 6 minutes, it will hold up your time.
In such a clear-cut example, I think we have saved the two hours.
Yes. You work 2 hours less, but what do you produce in those two extra hours? Can you say that your company now spends X dollars less or earns X dollars more? I don't think it can be that clear.
And what is your theory? That it’s better to not save those 2 hours since they will just go to waste anyway? Or that there is diminishing returns to saving work as people will tend to just spend longer on other things they were already doing? How can you be sure those 2 hours will not actually be used by most to do very productive things that in the end look like +4 hours in return??
No. I am not saying that it is a bad idea to do this.
I am saying:
Given you have saved two hours per person per week
Then the value for the company is _not_ equal to two hourly salaries per week. The consequences are just not that simple.
I don't understand the urgency around quantifying every aspect of the software process. Surely, we are in agreement that money in must at least equal money out if the company is to be viable? This is a simple quickbooks report, is it not?
Why don't we instead focus our energies on the customer and then work our way backward into the technology. There are a lot of ways to solve problems these days. But first you want to make sure you are solving the right problem. Whether or not your solution represents a "liability" or an "asset" is irrelevant if the customer doesn't even care about it.
Why don't we instead focus our energies on the user. For some very important software applications the customer is not the user. Let the sales department focus on the customer.
The estimate cost number is for very large companies with massive overhead bulk. Dump the management overhead, the HR machine and other things smaller companies do not have and this number comes down massively.
I’m a little surprised that fundamental concepts like burn rates are not expected to be understood in software. In other professional services, this is often top of mind, at least for managers.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. […] The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
Maybe there’s some new paradigm that makes this true. But it doesn’t seem obviously true to me.
Humans make the best code long term when everything orbits a vision of the underlying problem space.
LLMs seem to only consider the deeper problem space when I explicitly flag it for them; otherwise they write “good enough for this situation” code. And that stack-of-patches style of code is exactly how code becomes messy and complicated in the first place.
There's a lot here, mixed with some marketing and some dubious LLM claims. That being said, I think there could be real benefit in pushing detail about how features affect finances down to individual teams. Right now I have two features on my desk that both seem reasonable; if I knew which one would generate more income (i.e. increase customer retention, lead to more sales, etc.), that would make this choice a lot easier.
The core reason most orgs are "flying blind" is that we still don't have a reliable metric for technical debt. Management only tracks shipped features and velocity because they are easy to measure. They completely ignore the hidden liability of a rushed, messy codebase until productivity eventually grinds to a halt. You can't measure the economics of a team if you ignore the balance sheet.
Time to ship, change failure rate, rework rates, mean time to resolve, code complexity, code churn, average age of dependencies - there are a ton of reliable metrics for technical debt, but they have to actually be looked at to do any good.
The problem is that technical debt is a more complex concept & thus requires more metrics to properly measure than a simple concept like velocity.
Most orgs don't understand this.
If you've ever been in a meeting with multiple L8's arguing over features, you should be able to estimate how much each hour of that meeting is costing the org.
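That estimate is simple arithmetic. A sketch with made-up loaded costs (real senior-engineer compensation varies widely):

```python
def meeting_cost_per_hour(attendees: int, loaded_annual_cost: float,
                          work_hours_per_year: int = 2000) -> float:
    """Rough burn rate of a meeting, in dollars per hour."""
    return attendees * loaded_annual_cost / work_hours_per_year

# Six very senior engineers at an assumed $600k/yr loaded cost each:
print(meeting_cost_per_hour(6, 600_000))  # 1800.0
```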
Thank you so much for putting into words what I have been saying for years to leadership teams.
Everyone wanted to copy Meta/Google/Oracle and have internal teams, and to me internal teams have been accountability vacuums.
People want an internal team so they can say "well, if we had better tooling!" when instead they should make the best of what they have.
The 3-5x return threshold is the part most eng leaders never internalize. I've seen teams spend entire quarters on internal tooling that saves maybe 20 minutes per developer per week — nowhere near break-even, let alone a healthy return. The uncomfortable truth is that most prioritization frameworks (RICE, WSJF, etc.) deliberately avoid dollar amounts because nobody wants to see the math on their pet project. Once you attach real costs to sprint decisions, half the roadmap becomes indefensible.
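The break-even claim above is easy to check with rough numbers. Everything here (team size, weekly loaded cost, number of devs helped) is an illustrative assumption:

```python
def tooling_roi(builders: int, build_weeks: float,
                devs_helped: int, minutes_saved_per_week: float,
                weekly_cost_per_eng: float = 4_000,
                horizon_weeks: int = 52) -> float:
    """Ratio of time-savings value to build cost over the horizon."""
    cost = builders * build_weeks * weekly_cost_per_eng
    hourly = weekly_cost_per_eng / 40  # assumed 40-hour week
    benefit = devs_helped * (minutes_saved_per_week / 60) * hourly * horizon_weeks
    return benefit / cost

# 3 engineers spend a 13-week quarter; 40 devs each save 20 min/week:
print(round(tooling_roi(3, 13, 40, 20), 2))  # 0.44
```

Under these assumptions the tool returns about 44 cents on the dollar in its first year, nowhere near break-even, let alone the 3-5x threshold the comment mentions.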
On the other hand, I’ve also seen single developers create a tool or dashboard off-the-books that had widespread adoption. Things that would never have breached the top 100 features list since they are entirely internal. The irony is then they are expected to maintain it indefinitely without official effort allocation.
You’re absolutely right, but just to a point. It should be easy to clearly quantify the desired financial outcome of a sprint, but not of its components. I don’t want to spend a single minute figuring out the financial outcome of a single ticket.
> This does not mean that Slack’s engineering investment was wasted, because Slack also built enterprise sales infrastructure, compliance capabilities, data security practices, and organizational resilience that a fourteen-day prototype does not include.
The LLM-agent team argument also misses the core point that the engineering investment (which actually encompasses business decisions, design and much more than just programming) is what got Slack (or any other software product) to where it is now and where it's going in the future. Creating a snapshot of the current status is, while maybe not absolutely trivial, still just a tiny fraction of the progress made over the years.
With a long time in the industry and seeing how so many big software companies work, this really really chimed with me. Many/most teams and projects and busy work are not actually moving the bottom line, at massive opportunity cost! And there's so little awareness that most people in squads and their managers will think they are the exception.
Whereas Whatsapp with its 30 software engineers was the exception etc.
A chat with friends showed parallels between how LLMs will play out in the short-term future - say the next 5 years - and the whole MapReduce mess. Back when Hadoop came along you built operators, and these operators communicated through disk. Even after Spark was around, it took years for the Hadoop userbase as a whole to realise that it is orders of magnitude more efficient to communicate through disk only when two operators cannot be colocated on the same machine, and that most operators in most pipelines can be fused together.
So for a while LLMs will be in the Hadoop phase where they are acting like junior devs and making more islands that communicate in bigger bloated codebases and then there might be a realisation in about 2030 that actually the LLMs could have been used to clean up and streamline and fuse software and approach the Whatsapp style of business impact.
I've been a software engineer for more than ten years and never cared about these kinds of topics. But lately, I've found them genuinely interesting. Could someone recommend books on the economics of software businesses? I can't take this author's content seriously.
I have been interested in this topic for a long time and, to be honest, there is no better book on the topic than The Mythical Man-Month. Yes, from the 70s I think, but still the best I have read.
Wow that article made a hard right turn about halfway through.
"Most organizations improperly account for engineering teams and incorrectly consider both code and team growth to be assets when in fact they increase complexity..... but LLMs can fix all of this"
Wtf?
Measuring things that actually matter is a great way to improve clarity on a team, you can probably just stop reading this article at the halfway point.
EDIT:
Specifically this paragraph is insane
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
Then let's disregard the cost of running and maintaining a system for having exact financial feedback.
We do proxy measurements because having exact data is hard because there is more to any feature than just code.
A feature is not only code; it is also customer training, marketing - a feature might be perfectly viable from a code perspective but then utterly fail in adoption for reasons beyond the Product Owner's control.
From the comments: the author is selling his consultancy/coaching, and people who have any real-world experience are not buying it.
The "author" used someone's vibecoded Slack clone to justify his conclusions. I think he believes that the majority of Slack's value lies in the slick CSS animations.
I do agree with his thesis in the middle, about how the ZIRP decade and the cultures that were born from that period were outrageous and cannot survive the current era. It's a brave new world, and it's not because of AI. It's because there's just not enough money flowing anymore, and what little is left is sucked up by the big boys (AI).
> A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
What experience is this guy basing this on? My guess is absolutely none at all.
Maybe this will be the case in the future, but as of right now, if I cut 10 agents loose for 10 days on one of our repos at work and tell them to clean it up but keep the tests passing, we'd be drowning in support tickets.
Tests don’t cover all observable behavior. Every single production bug we’ve had made it through the test suite.
Also, this guy has only a vague idea of how platform engineering teams work in large organizations.
Platform teams are the engineering org’s immune system. They’re how we fight back against the tech debt accumulated by the relentless march of features of the week.
If anything, the extra code people are cranking out with AI makes them more necessary.
If you want to understand economics, I recommend watching some of Don Reinertsen's videos on Lean 2.0. He goes into a few concepts quite deeply that are quite intuitive.
Cost of delay: calculating the cost of delaying by a few weeks in terms of lost revenue (you aren't shipping whatever it is you are building), total life value of the product (your feature won't be delivering value forever), extra cost in staffing. You can slap a number on it. It doesn't have to be a very accurate number. But it will give you a handle on being mindful that you are delaying the moment where revenue is made and taking on team cost at the cost of other stuff on your backlog.
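A toy cost-of-delay number in that "slap a number on it" spirit; all inputs are made up:

```python
def cost_of_delay(weekly_revenue_at_launch: float, delay_weeks: float,
                  team_weekly_cost: float) -> float:
    """Crude cost of shipping late: forgone revenue plus extra staffing."""
    lost_revenue = weekly_revenue_at_launch * delay_weeks
    extra_staffing = team_weekly_cost * delay_weeks
    return lost_revenue + extra_staffing

# Feature expected to earn ~$20k/week, delayed 4 weeks, team burn $30k/week:
print(cost_of_delay(20_000, 4, 30_000))  # 200000
```

Deliberately crude, but as the comment says, even an inaccurate number makes the tradeoff visible and comparable against other items on the backlog.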
Option value: calculating the payoff for some feature you add to your software as having a non linear payoff. It costs you n when it doesn't work out and might deliver 10*n in value if it does. Lean 1.0 would have you stay focused and toss out the option for that potential 10x payoff. But if you do a bit of math, there probably is a lot of low hanging fruit that you might want to think about picking because it has a low cost and a potential high payoff. In the same way variability is a good thing because it gives you the option to do something with it later. A little bit of overengineering can buy you a lot of option value. Whereas having tunnel vision and only doing what was asked might opt you out of all that extra value.
A bad estimation is better than no estimation: even if you are off by 3x, at least you'll have a number and you can learn and adapt over time. Getting wildly varying estimates from different people means you have very different ideas about what is being estimated. Do your estimates in time. Because that allows you to slap a dollar value on that time and do some cost calculations. How many product owners do you know that actually do that or even know how to do that?
Don't run teams at 100% capacity. Work piles up in queues and causes delays when teams are pushed hard. The more work you pile on the worse it gets. Worse, teams start cutting corners and take on technical debt in order to clear the queue faster. Any manufacturing plant manager knows not to plan for more than 90% capacity. It doesn't work. You just end up with a lot of unfinished work blocking other work. Most software managers will happily go to 110%. This causes more issues than it solves. Whenever you hear some manager talking about crunch time, they've messed up their planning.
Stretching a team like that will just cause cycle times to increase. Also, see cost of delay: queues aren't actually free. If you have a lot of work in progress with interdependencies, any issue will cause your plans to derail and cause costly delays. It's actually very risky if you think about it like that. If you've ever been on a team that seemingly doesn't get anything done anymore, this might be what is going on.
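The queueing intuition here has a standard sketch: in an M/M/1 model, the average number of items waiting grows nonlinearly as utilization approaches 1. A team is obviously not a single-server queue with Poisson arrivals, so treat this only as an illustration of the shape of the curve:

```python
def avg_items_waiting(utilization: float) -> float:
    """Expected queue length Lq for an M/M/1 queue at the given utilization."""
    assert 0 <= utilization < 1, "only defined below 100% utilization"
    return utilization ** 2 / (1 - utilization)

for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.0%} utilization -> {avg_items_waiting(rho):.1f} items waiting")
```

Going from 90% to 95% utilization more than doubles the queue, which matches the "don't plan past 90% capacity" rule of thumb in the comment above.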
I like this back of the envelope math; it's hard to argue with.
I used to be a salaried software engineer in a big multinational. None of us had any notion of cost. We were doing stuff that we were paid to do. It probably cost millions. Most decision making did not have $ values on them. I've since been in a few startups. One where we got funded and subsequently ran out of money without ever bringing in meaningful revenue. And another one that I helped bootstrap where I'm getting paid (a little) out of revenue we make. There's a very direct connection between stuff I do and money coming in.
Do you have any recommendations? I find his book Principles of Product Development Flow very interesting.
Measuring a platform team's productivity in pure "hours saved" is missing a huge point: reliability. If your platform prevents even one outage every month, how much business value and capital are saved? That analysis is utterly absent from this article. It also seems to focus on "LLMs make code cheap" which is a half truth: LLMs make (so far) easy or messy code cheap. I'd bet that there too the analysis on reliability/stability is missing from the author's perspective.
I think the article makes some great points; however, this part is not even wrong:
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
LLMs are not conscious, that means left on their own devices they will drift. I think the single most important issue when working with LLMs is that they write text without a layer that are aware what's actually being written. That state can be present in humans as well, like for example in sleepwalking.
Everyone who's tried to to complete vibe coding a somewhat larger project knows that you only get to a certain level of complexity until the model stops being able to reason about the code effectively. It starts to guess why something is not working and cannot get out of that state until guided by a human.
That is not a new state of affairs in the field; I believe all programmers have at some point in their careers come across code written by developers scrambling to meet a hard deadline, with the result being a codebase that cannot effectively be modified.
I think a certain subset of programming projects could possibly be vibe coded, in the sense that the code can be merged without human understanding. But they have to be very straightforward CRUD apps. In almost everything else you will get stopped by slop.
I suspect that the future of our profession will shift from writing code to reading code and applying continuous judgement on architecture while working together with LLMs. It's also worth keeping in mind that you cannot assign responsibility to an LLM, and most human organizations require that to work.
I get "This site can’t be reached"
You haven't missed much.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
I keep seeing this assumption that "unmanageable" caps out at "kinda hard to reason about", and anyone with experience in large codebases can tell you that's not so. There are software components I own today which require me to routinely explain to junior engineers (and indeed to my own instances of Claude) why their PR is unsound and I won't let them merge it no matter how many tests they add.
Yeah, this really breaks down when you put the logic up against ANY sort of compliance testing. OK, you don't meet compliance, your agents have spent weeks on it, and they're just adding more bugs. Now what are you going to do? You have to go into the code yourself. Uh oh.
> even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today
Citation needed. A human engineer can grok a lot in 10 days, and an agent can spend a lot of tokens in 10 days.
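The comparison above is easy to sanity-check with a back-of-envelope sketch. Every figure below is an illustrative assumption (token throughput, blended price per million tokens, loaded engineer cost), not data from the article or the thread; under different assumptions the conclusion flips:

```python
# Hypothetical cost comparison: ten agents vs. one engineer over ten days.
# All constants are assumptions for illustration only.

AGENT_TOKENS_PER_DAY = 50_000_000      # assumed sustained tokens per agent per day
COST_PER_M_TOKENS = 10.0               # assumed blended price, USD per 1M tokens
ENGINEER_LOADED_COST_PER_DAY = 1_000   # assumed fully loaded daily cost, USD

days = 10
agents = 10

agent_cost = agents * days * (AGENT_TOKENS_PER_DAY / 1_000_000) * COST_PER_M_TOKENS
engineer_cost = days * ENGINEER_LOADED_COST_PER_DAY

print(f"{agents} agents x {days} days: ${agent_cost:,.0f}")
print(f"1 engineer x {days} days: ${engineer_cost:,.0f}")
```

Under these particular (made-up) numbers the agents come out five times more expensive, which is exactly why the claim needs a citation rather than an assertion.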
I expect engineering departments to be flattened and reduced in headcount. Corporate silos of responsibility around apps will probably disappear, as a senior developer with tools can be pretty effective across platforms and technologies, and architectural and design thinking becomes more valuable.
I see we're once again missing the existence of indirect impact. There's a reason organizations look at revenue/engineer overall instead of trying to attribute it directly to specific teams.
I guess his students get to relearn that on their own.
Also, any post talking about building software that then suggests "cost per unit" as an efficiency metric needs to come to the red courtesy phone: Taylorism would like to have a chat about times gone by.
>Given that software teams are expensive
In many companies there are 3 to 5 other people per developer (QA, agile masters, PO, PM, BA, marketing, sales, customer support etc.). The costs aren't driven just by the developer salaries.
A CEO can cost as much as 10 developers, sometimes more.
That is why I respect Zuckerberg: he did not participate in Google's and Apple's salary fixing and he is willing to pay new tech hires insane money.
There is something different about CEOs that came from tech.
Yet another essay completely missing the point, and an audience that misses it as well. All these organizations fly blind because nowhere in any technology or science education is there any emphasis on effective communications, conveying understanding, solving disagreements with analysis and the best of both perspectives... none of these critical communication skills are taught to the very people that most need them. It's a wonder our civilization functions at all.
Using ‘blind’ to mean ‘ignorant’ is like using any disability label as a synonym for ‘bad’—it turns a real condition into an insult.
"Flying blind" is a completely standard idiom originating from flying while blinded by e.g. cloud or darkness. Its meaning is a figurative transplant of a literal description.
I know it’s an idiom. The point is that it still uses blindness as a stand-in for incompetence/unsafe guessing. Being common doesn’t make it harmless; common just means we’ve normalized it. And your defending it shows that we’ve normalized it to the point where the double meaning is apparent only to blind people.
It absolutely does not use blindness as a stand-in for incompetence, that is your own outrage-seeking interpretation of it. A neutral interpretation would be that "flying blind" is to "operate without perfect information". It is a simple description of operating conditions, not a derogatory term in any way. Your reply is worded in such a way as to indicate that you think the person you're replying to deserves to be shamed for 'defending' it, but having a disability does not entitle you to browbeat the world into submission and regulate all usage of any words associated with your disability as you see fit. This is quite benign and people are perfectly well within their right to object to somebody trying to police plainly descriptive language.
Your reply would be much improved if it were just this part.
> A neutral interpretation would be that "flying blind" is to "operate without perfect information". It is a simple description of operating conditions, not a derogatory term in any way.
Entering it would also have put less wear and tear on the input device.
You are equivocating. Blindness as a personal chronic medical condition is not the same as a situational difficulty.
The pilot who is "flying blind" has perfectly normal eyeballs. They are not necessarily a member of any minority group, except for their chosen profession.
_____
As for "blind" being a word that appears more frequently in a negative rather than positive way... Well, I'm not sure what to tell you, that's just 10,000+ years of language from a species that evolved to prefer seeing.
To offer an example of the positive case, consider the idiom "justice is blind". Yes, there is a popular cultural mascot wearing a strip of fabric over her eyes, but again: the justice involved doesn't have any (real) personal medical condition, and blindness is considered a positive feature for the job.
Well, flying blind is unsafe guessing (ignoring modern instruments); that's a fact. But only "flying" and "blind" together. No one thinks this gives the word "flying" a negative connotation here, and the same goes for "blind".
Like "drinking" and "driving". On their own, they're both neutral, but "drinking and driving" is really bad.
No, it means not being able to see what is going on. Which is literally what the word blind means. You can be blinded by many things (blindfold, clouds/fog, bright lights, darkness, accidents, genetics, etc), permanently and temporarily. Non-humans can be blind and blinded. YOU are making it about a specific situation and projecting value judgements on it.
The author specifically says FLYING blind. Not "stumbling around like a blind person" or some such. If you are offended, that is on you. It's your right to be offended of course, but don't expect people to join in your delusion.
Why is "ignorant" a synonym for "bad" (as a moral judgement, like "bad person")?
It just means you don't know something, which is usually a relatively bad situation for you, but it doesn't make you a bad person.
If you think otherwise, that's on you.