I see an objective difference in speed between my Team account and my personal Max 20, and a subjective difference in quality.
On my personal account, projects get to 90% and then get stuck on a bug, and I have to hand the project over to Codex to fix. Then Codex usually says something like “this feature was stubbed out and not connected at all”. Like, wtf is going on with Claude? It gets so much complicated stuff right and then completely whiffs the main details.
On my Teams premium account, I can complete things fine and fast.
I have a similar issue where it ignores step-by-step instructions. I have a detailed step-by-step playbook and QA checklist for it to follow, but it will make up its own checklist with fewer items on it and say it's finished the job! I think about half my time is spent getting the very clever code it has written out of large monolithic files and into the organised structure that was specced from the beginning.
The distributed systems framing is right but it undersells the problem. In a traditional distributed system, each node runs code a human wrote and can reason about. When something fails, you trace the call chain and find the bug.
With multi-agent development, each agent generates code that no single human fully understands. The failure mode isn't a consensus problem or a network partition. It's a comprehension partition. Five agents each wrote part of the system. None of them hold a mental model of the whole. Neither does the human who orchestrated them. When it breaks, there is no call chain to trace because there was never a unified understanding of what the system was supposed to do at that level of specificity.
Yeah, just the other day I had asked it to do some work and then merge it into a develop branch.
"Done, merged to develop".
I test, feature not there.
"?" Claude: "Yeah, there's nothing for that feature in develop"
"I'm confused. You said above you merged it into develop." Claude: "I did say that but I didn't do it. Should I do it now?".
Me, thinking, "That depends, will you actually do it now?"
I've noticed recently that it uses way more tokens to explain things than it used to. I haven't measured it, but working with it every day, I can tell.
Weird thing is that for some people Opus 4.6 has been acting incredibly dumb, but for me there is no difference at all.
Not sure what is happening at Anthropic atm
Isn't it weird to run an analysis scoped to currently open issues? Of course more recent issues will be more likely to be open right now.
Quality has always had a component of subjective perception, but the percentage of outages is really undeniable. The code quality, though, is in my opinion improving, not decreasing. When I think about what I did with Claude 6 months ago and what I do with it now... Ask someone in the late 90s how their experience with Windows 95 changed, never mind whether it improved... We are seeing unimaginably fast-paced development compared to anything else ever before, imo.
I wonder how much of this is due to actual experienced subject matter experts refining Claude.
This article is mostly clickbait. Even if there’s an uptick in complaints, that’s likely just a function of more people using Claude, Claude Code, and similar tools.
People have been saying "the model got worse" after almost every major update since the early ChatGPT releases. Quality has always been somewhat variable and user-dependent, so individual experiences can fluctuate. But it's undeniable that state-of-the-art models have consistently improved with each generation.
What's really happening is that as models get more widely used, their weaknesses become more visible, and people tend to focus on those rather than the overall progress. And OK, maybe they had some outages lately, but that's not really news.
Using Claude as a benchmark for its own quality is pretty funny. If we think the quality has declined, wouldn't that also apply to the benchmarking process itself?
You'd think so. If its quality has gone down, then its ability to recognize that has also decreased.
Quality is highly subjective which means it will be very easy for these companies to dramatically drop their opex without users being aware. Think of it as an invisible rate limit.
Have fun with your Rube Goldberg machines.