Amusingly, they deprecated it with a message of "Unpublished" instead of actually unpublishing it [1]. When you use npm unpublish, it removes the package version from the registry; when you use npm deprecate, it leaves the version there and simply marks it as deprecated with your message. I have to imagine the point was to make it harder for people to download the source map, so deprecating it with this message gives off a bit of a "Claude, unpublish the latest version of this package for me" vibe.
Packages published less than 72 hours ago
For newly created packages, as long as no other packages in the npm Public Registry depend on your package, you can unpublish anytime within the first 72 hours after publishing.
There are 231+ packages that depend on this one, and I imagine they mostly use permissive enough version ranges that this was included.
In all my years of writing tools for other devs, dogfooding really is the best way to develop, IMO. The annoying bugs get squashed because I get frustrated with them in my own flow.
Iterating on an MCP tool while having Claude try to use it has been a really great way of getting it to work the way others, coming in blind, are going to use it.
Yes it's buggy as hell, but as someone echoed earlier if the tool works most of the time, a lot of people don't care. Moving fast and breaking things is the way in an arms race.
The big loss for Anthropic here is how it reveals their product roadmap via feature flags. A big one is their unreleased "assistant mode", codenamed kairos.
Just point your agent at this codebase and ask it to find things and you'll find a whole treasure trove of info.
Edit: some other interesting unreleased/hidden features
- The Buddy System: Tamagotchi-style companion creature system with ASCII art sprites
- Undercover mode: Strips ALL Anthropic internal info from commits/PRs for employees on open source contributions
You'll never win this battle, so why waste feelings and energy on it? That's where the internet is headed. There's no magical human verification technology coming to save us.
I can prove all contributions to stagex are by humans because we all belong to a 25 year old web of trust with 5444 endorser keys including most redhat, debian, ubuntu, and fedora maintainers, with all of our own maintainer keys in smartcards we tap to sign every review and commit, and we do background checks on every new maintainer.
I am completely serious. We have always had a working proof of human system called Web of Trust and while everyone loves to hate on PGP (in spite of it using modern ECC crypto these days) it is the only widely deployed spec that solves this problem.
You can prove the commits were signed by a key you once verified. It is your trust in those people which allows you to extend that to “no LLM” usage, but that’s reframing the conversation as one of trust, not human / machine. Which is (charitably) GPs point: stop framing this as machine vs human — assume (“accept”) that all text can be produced by machines and go from there: what now? That’s where your proposal is one solution: strict web of trust. It has pros and cons (barrier to entry for legitimate first timers), but it’s a valid proposal.
All that to say “you’re not disagreeing with the person you’re replying to” lol xD
With 5400+ people I am betting that you have at least one person in your 'web of trust' that no longer deserves that trust.
That's one of the intrinsic problems with webs of trust (and with democracy...), you extend your trust but it does not automatically revoke when the person can no longer be trusted.
You can only prove that all contributions are pushed by those humans, and you can quite explicitly/clearly not prove that those humans didn't use any AI prior to pushing.
It also makes no sense! "Fuck this, it doesn't matter - but I'll happily spend effort communicating that to others, because apparently making others not care about something I don't care about is something I do care about." Wut?!
Well, I say it makes no sense. Alternatively, it makes a lot of sense, and these people actually just wanna destroy everything we hold dear :-(
I guess I could just curl up into fetal position and watch the world go by. But that's no fun. Why not dream big and shoot for the moon with kooky goals like, say, having an underground, community-supported internet where things are falling less to shit?
Belief in inevitability is a choice (except for maybe dying, I guess).
I think you're underestimating the difficulty, even for exact copies of text (which AI mostly isn't doing).
What sort of Orwellian anti-cheat system would prevent copy and paste from working? What sort of law would mandate that? There are elaborate systems preventing people from copying video but they still have an analog hole.
Human verification technology absolutely exists. Give it some time and people who sell ai today are going to shoehorn it everywhere as the solution to the problem they are busy creating now.
Nothing like throwing in the towel before a battle is ever fought. Let's just sigh and wearily march on to our world of AI slop and ever higher bug counts and latency delays while we wait for the five different phone homes and compilations through a billion different LLM's for every silly command.
IDK. I sort of like the idea that now, instead of dead internet theory being a joke, it'll be a well-known fact that a minority of people are not real and there is no point in engaging… I look forward to Social 3… where people have to meet face to face.
Funny story: when I was younger I trained a basic text-predictor deep learning model on all my conversations in a group chat I was in. It was surprisingly good at sounding like me, and sometimes I'd use it to generate text to submit to the chat.
I don't see what the value of this would be. Why would I want to automate talking to my friends? If I'm not interested in talking with them, I could simply not do it. It also carries the risk of not actually knowing what was talked about or said, which could come up in real life and lead to issues. If a "friend" started using a bot to talk to me, they would no longer be considered a friend. That would be the end.
It was for fun, to see if it were possible and whether others could detect they were talking to a bot or not, you know, the hacker ethos and all. It's not meant to be taken seriously although looks like these days people unironically have LLM "friends."
Even if it is impossible to win, I still feel bad about it.
And at this point it is more about how much of the space will remain usable and how much will become bot-controlled wasteland. I'd prefer that the spaces important to me survive.
I am actively building non-magical human verification technology that doesn't require you uploading your retinal scans or ID to billionaires or incompetent outsourcing firms.
We already have it and we use it to validate the trusted human maintainer involvement behind the linux packages that power the entire internet: PGP Web Of Trust. Still works as designed and I still go to keysigning parties in person.
Say a regular human wanted to join and prove their humanhood status (expanding the web of trust). How would they go about that? What is the theoretical ceiling on the rate of expansion of this implementation?
Not parent poster but I am a maintainer of software powering significant portions of the internet and prove my humanity with a 16 year old PGP key with thousands of transitive trust signatures formed through mostly in-person meetings, using IETF standards and keychain smartcards, as is the case for everyone I work with.
But, I do not have an Android or iOS device as I do not use proprietary software, so a smartphone based solution would not work for me.
Why re-invent the wheel? Invest in making PGP easier and keep the decades of trust building going anchoring humans to a web of trust that long predates human-impersonation-capable AI.
I assume we're heading to a place where keyboards will all have biometric sensors on every key and measure weight fluctuations in keystrokes, actually.
But will this be released as a feature? For me it seems like it's an Anthropic internal tool to secretly contribute to public repositories to test new models etc.
All these companies use AIs for writing these prompts.
But AIs aren't actually very good at writing prompts, IMO. They're superficially good in that they produce lots of vaguely accurate and specific text, and you'd hope the specificity would mean it's good.
But they sort of don't capture intent very well, nor do they seem to understand the failure modes of AI. The "-- describe only what the code change does" is a good example. It's specific, but it also distinctly reads like someone who doesn't actually understand what makes AI writing obvious.
Hey LLM, write me a system prompt that will avoid the common AI 'tells' or other idiosyncrasies that make it obvious that text or code output was generated by an AI/LLM. Use the referenced Wikipedia article as a must-avoid list, but do not consider it exhaustive. Add any derivations or modifications to these rules to catch 'likely' signals as well.
All the prompts I've ever written with Claude have always worked fine the first time. Only revised if the actual purpose changes, I left something out, etc. But also I tend to only write prompts as part of a larger session, usually near the end, so there's lots of context available to help with the writing.
I wager that "describe only what the code change does" was someone's attempt to invert "don't add the extra crap you often try to write", not some 4d chess instruction that makes claude larp like a human writing a crappy commit message.
Yes, this is a trend I've noticed strongly with Claude code—it really struggles to explain why. Especially in PR descriptions, it has a strong bias to just summarize the commits and not explain at all why the PR exists.
This is my pet peeve with LLMs: they almost always fail to write like a normal human would, mentioning logs or other meta-things that are not at all interesting.
I had a problem to fix and one not only mentioned these "logs", but went on about things like "config", "tests", and a bunch of other unimportant nonsense words. It even went on to point me towards the "manual". Totally robotic monstrosity.
Heh, this is what people who are hostile against AI-generated contributions get. I always figured it'd happen soon enough, and here it is in the wild. Who knows where else it's happening...
1) This seems to be strictly for Anthropic internal tooling
2) It does not "pretend to be human" it is instructed to "Write commit messages as a human developer would — describe only what the code change does."
Since when is "describe only what the code change does" pretending to be human?
You guys are just mining for things to moan about at this point.
1) It's not clear to me that this is only for internal tooling, as opposed to publishing commits on public GitHub repos. 2) Yes, it does explicitly say to pretend to be a human. From the link on my post:
> NEVER include in commit messages or PR descriptions:
> [...]
> - The phrase "Claude Code" or any mention that you are an AI
How so? A good bit of my global claude.md is dedicated to fighting the incessant attribution in git commits. It is on the same level as the "sent from my iPhone" signature - I'm not okay with my commits being an advertising board for Anthropic.
lol that's funny, I have been working seriously [1] on a feature like this after first writing about it jokingly [2] earlier this year.
The joke was the assistant is a cat who is constantly sabotaging you, and you have to take care of it like a gacha pet.
The seriousness though is that actually, disembodied intelligences are weird, so giving them a face and a body and emotions is a natural thing, and we already see that with various AI mascots and characters coming into existence.
You know, that would actually be pretty fun and cool. Like if you had home automation set up with a "pet assistant", but it would only follow your commands if you made sure to keep it happy.
Not necessarily; I would very much like to use those features on a Linux server. Currently the Anthropic implementation forces a desktop (or worse, a laptop) to be turned on instead of working headless as far as I understand it.
I’ll give clappie a go, love the theme for the landing page!
Clappie looks much more fabulous than CC though. I'll have to give it a try. I like how you put the requests straight into an already running CC session instead of calling `claude -p` every time like the claws.
> not sure why anthropic doesn’t just make their cli open source
They don't want everyone to see how poorly it's implemented and that the whole thing is a big fragile mess riddled with bugs. That's my experience anyway.
For instance, just recently their little CLI -> browser oauth login flow was generating malformed URLs and URLs pointing to a localhost port instead of their real website.
I don't think you really need to look at the source code to understand that it's probably been, let's say, written with heavy help from Claude itself.
The obfuscation point is fair, but you're underestimating the special sauce.
In my experience the difference is noticeable — it's not just a wrapper. The value is model-CLI co-design: tool use, long context, multi-step reasoning tuned at the model level. Competitors can clone the CLI; they can't clone that feedback loop.
A few months of compounding market share (enterprise stickiness, dev habits, usage data improving the models) can be decisive. By the time others catch up, Anthropic may be two generations ahead.
It's not hard to find them; they are in clear text in the binary. You can search for known ones with grep and find the rest nearby. You could even replace them in place (but now it's configurable).
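The same idea in a few lines of TypeScript, with a fabricated bundle and made-up flag names: once one flag is known, its neighbours in the minified flag object share the same shape and fall out of a single regex.

```typescript
// Fabricated stand-in for a chunk of the minified CLI bundle; the real
// file and flag names will differ.
const bundle = "x={kairos:!1,buddySystem:!1,undercoverMode:!1};";

// Keys in a minified flag object stay in plain text, so a lookahead on
// the ":!1" suffix pulls out every flag name at once.
const flags = bundle.match(/[a-zA-Z]+(?=:!1)/g);
console.log(flags); // [ 'kairos', 'buddySystem', 'undercoverMode' ]
```

The real bundle is far messier, but grep with a few context bytes around a known flag string gets you to the same place.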
It seems human. It taught me 合影 (literally "group photo"), which seems to be Chinese slang for just wanting to be in the comments. Probably not a coincidence that it's after work hours in China.
Really interesting to see Github turn into 4chan for a minute, like GH anons rolling for trips.
Random aside: I've seen a 2015 game be accused of AI slop on Steam because it used a similar concept... And mind you, there's probably thousands of games that do this.
First it was punctuation and grammar, then linguistic coherence, and now it's tiny bits of whimsy that are falling victim to AI accusations. Good fucking grief
To me, this is a sign of just how much regular people do not want AI. This is worse than crypto and metaverse before it. Crypto, people could ignore and the dumb ape pictures helped you figure out who to avoid. Metaverse, some folks even still enjoyed VR and AR without the digital real estate bullshit. And neither got shoved down your throat in everyday, mundane things like writing a paper in Word or trying to deal with your auto mechanic.
But AI is causing such visceral reactions that it's bleeding into other areas. People are so averse to AI they don't mind a few false positives.
It's how people resisted CGI back in the day. What people dislike is low quality. There is a loud subset who are against it on principle, just as there are people who insist on analog music, but regular people are much more practical; they just don't post about it all day on the internet.
perhaps one important detail is that cassette tape guys and Lucasfilm aren’t/weren’t demanding a complete and total restructuring of the economy and society
An excellent observation. When films became digital, the real backlash came when studios stopped distributing film for the old projectors and every movie theater had to invest in very expensive DCP projectors. Some couldn't and were forced to shut down.
If I had lost my local movie theater because of digital film, I would have a really good reason to hate the technology, even though the blame is on the studios forcing that technology on everyone.
It is not. People resisted bad CGI. During the advent of CGI, people celebrated masterpieces like The Matrix and even Titanic. They hated, however, The Scorpion King.
Not just in the obvious ways either, even good CGI has been detrimental to the film (and TV) making process.
I was watching some behind the scenes footage from something recently, and the thing that struck me most was just how they wouldn't bother with the location shoot now and just green-screen it all for the convenience.
Even good CGI is changing not just how films are made, but what kinds of films get shot and what kind of stories get told.
Regardless of the quality of the output, there's a creativeness in film-making that is lost as CGI gets better and cheaper to do.
Not the same. The more effort you put into CGI the more invisible it becomes. But you can’t prompt your way out of hallucinations and other AI artifacts. AI is a completely different technology from CGI. There is no equivalence between them.
No, there is a very loud minority of users who are very anti-AI, who hate on anything even remotely connected to AI and let everyone know with false claims. See the game Expedition 33, for example.
IMO it's a combination of long-running paranoia about cost-cutting and quality, and a sort of performative allegiance to artists working in the industry.
And yet, no game has problems selling due to these reactions. As a matter of fact, the vast majority of people can't even tell if AI has been used here or there unless told.
I reckon it's just drama paraded by gaming "journalists" and not much else. You will find people expressing concern on Reddit or Bluesky, but ultimately it doesn't matter.
The difference in response time - especially versus a regex running locally - is really difficult to express to someone who hasn't made much use of LLM calls in their natural language projects.
Someone said 10,000x slower, but in my experience that's off by about four orders of magnitude. And that's on average; it gets much worse.
Now personally I would have maybe made a call through a "traditional" ML widget (scikit, numpy, spaCy, fastText, sentence-transformers, etc), but, for me anyway, that whole stack is Python. Porting all of that to TS might be a maintenance burden I don't particularly feel like taking on, and in client-facing code I'm not really sure it's even possible.
So, think of it as a businessman: you don't really care if your customers swear or whatever, but you know it'll generate bad headlines. So you gotta do something. Just like a door lock isn't designed for a master criminal, you don't need to design your filter for some master swearer; no, you design it well enough that it gives the impression that further tries are futile.
So yeah, you do what's less intensive for the CPU, but also what's enough to prevent the majority of the cases where a screenshot or log ends up showing blatantly "unmoral" behavior.
The upside of the US market is that (almost) everyone there speaks English. The downside is that includes all the well-networked pearl-clutchers. Europe (including France) will have the same people, but it's harder to coordinate a network of pearl-clutching between some saying "Il faut protéger nos enfants de cette vulgarité!" ("We must protect our children from this vulgarity!") and others saying "Η τηλεόραση και τα μέσα ενημέρωσης διαστρεβλώνουν τις αξίες μας!" ("Television and the media are distorting our values!"), even when they care about the exact same media.
For headlines, that's enough.
For what's behind the pearl-clutching, for what makes the headlines pandering to them worth writing, I agree with everyone else in this thread saying a simple word list is weird and probably pointless. Not just for false negatives, but also false positives: the Latin influence on many European languages creates one very big politically-incorrect-in-the-USA problem for all the EU products talking about anything "black" (which includes what's printed on some brands of dark chocolate, one of which I saw in Hungary, even though Hungarian isn't a Latin language but an Ugric one that only takes influences from Latin).
I just went through quite an adventure trying to translate back and forth from/to Hungarian to/from different languages to figure out which Hungarian word you meant, and arrived at the conclusion that this language is encrypted against human comprehension.
dark chocolate is "étcsokoládé" literally edible-chocolate in Hungarian.
I heard the throat-clearing "Negró" candy (marketed by a chimney-sweep man with a soot-covered face) was usually what hurt English-speaking people's sensitivities.
If it's good enough, it's good enough. But just as there are many more options than going full-blown LLM or using a bare regex, there are more options than transpiling a massive Python stack to TS or giving up.
Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically and you can compress regexes easily. We used to have a compressed regex that could match any placename or street name in the UK in a few MB of RAM. It was silly quick.
You can literally | together every street address or other string you want to match in a giant disjunction, and then run a DFA/NFA minimization over that to get it down to a reasonable size. Maybe there are some fast regex simplification algorithms as well, but working directly with the finite automata has decades of research and probably can be more fully optimized.
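A minimal sketch of the disjunction approach in TypeScript (the street list here is invented). The regex engine compiles the alternation once, so lookups stay fast however long the list gets; a dedicated DFA minimizer would compress it further, but even the naive version is quick:

```typescript
// Invented word list standing in for the real place-name data.
const streets = ["High Street", "Station Road", "Main Street", "Church Lane"];

// Escape each entry, then join everything into one alternation wrapped
// in word boundaries and compiled once up front.
const escaped = streets.map(s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
const pattern = new RegExp(`\\b(?:${escaped.join("|")})\\b`);

console.log(pattern.test("Turn left onto Station Road at the lights")); // true
console.log(pattern.test("No street mentioned here"));                  // false
```

With a common-prefix-sharing build (a trie rendered back to a regex), the same list compresses dramatically, which is how a few MB of RAM can cover every UK placename.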
I think it will depend on the language. There are a few non-Latin languages where a simple word search likely won't be enough for a regex to apply properly.
We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :).
I talk to it in non-English, but I have rules to keep everything in code and documentation in English. Only conversation with me uses my native language. Why would that be a problem?
In my experience these models work fine in another language, if it's a widely spoken one. For example, sometimes I prompt in Spanish, just to practice. It doesn't seem to affect the quality of code generation.
That just can't be the case, simply because of how ML works. In short, the more diverse and high-quality the reasoning-rich texts in the training set for a given language, the better the model performs in that language.
So unless the Spanish subset had much more quality-dense examples to make up for volume, there is no way the quality of reasoning in Spanish is on par with English.
I apologise for the rambling explanation; I'm sure someone with ML expertise here can explain it better.
I saw a curious post recently that explored this idea, and showed that it isn’t really the case. The internal layers of the model aren’t really reasoning in English, or in any human language.
Translation in/out of human languages only happens at the edges of the model.
Internal layer activations for the same concept are similar regardless of language, while activations at the top/bottom layers diverge. Meanwhile the pattern is reversed for same language different content.
So we do at least agree that the quality of the human language <-> embeddings transition depends on how well the target language is represented in the training dataset?
Even if it happens at the edge, on every conversation turn, I assume small subtleties of meaning that aren't captured can accumulate over time into significant error.
The agent may be translating to English behind the scenes. There are all kinds of hacks layered on top of the underlying models. But however exactly it works, it’s clearly not the case that these things only make use of Spanish training data when you prompt them in Spanish.
Lets take a language with orders of magnitude less training data, like Nahuatl. The following is what happens if I ask Claude Code in Nahuatl to give me an example of a simple program that I can execute inside a web browser (avoiding any direct keyword prompts like 'Javascript' or 'Hello World'). Interestingly, it responds in Spanish. I doubt that it has any relevant training data in Nahuatl. (Apologies if the Nahuatl is wrong – I just used Google translate.)
¿Uelis techmakas se neskayotl tlen se programa tlen amo ouij tlen ueliskia nijchiuas ipan se navegador web?
-----
Entiendo tu pregunta en náhuatl. Aquí un ejemplo sencillo — una calculadora en HTML puro que puedes abrir directo en el navegador, sin servidor: [Translation: "I understand your question in Nahuatl. Here is a simple example: a calculator in pure HTML that you can open directly in the browser, no server needed:"]
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Calculadora</title>
  <style>
    body { font-family: sans-serif; display: flex; justify-content: center; padding: 40px; }
    .calc { border: 1px solid #ccc; padding: 20px; border-radius: 8px; width: 200px; }
    input { width: 100%; margin-bottom: 10px; font-size: 1.2em; text-align: right; }
    button { width: 22%; margin: 2px; padding: 8px; font-size: 1em; cursor: pointer; }
    .wide { width: 47%; }
  </style>
</head>
<body>
  <div class="calc">
    <input type="text" id="pantalla" readonly>
    <div>
      <button onclick="tecla('7')">7</button>
      <button onclick="tecla('8')">8</button>
      <button onclick="tecla('9')">9</button>
      <button onclick="tecla('/')">÷</button>
      <button onclick="tecla('4')">4</button>
      <button onclick="tecla('5')">5</button>
      <button onclick="tecla('6')">6</button>
      <button onclick="tecla('*')">×</button>
      <button onclick="tecla('1')">1</button>
      <button onclick="tecla('2')">2</button>
      <button onclick="tecla('3')">3</button>
      <button onclick="tecla('-')">−</button>
      <button onclick="tecla('0')">0</button>
      <button onclick="tecla('.')">.</button>
      <button onclick="tecla('+')">+</button>
      <button onclick="limpiar()">C</button>
      <button class="wide" onclick="calcular()">=</button>
      <button class="wide" onclick="borrar()">⌫</button>
    </div>
  </div>
  <script>
    const pantalla = document.getElementById('pantalla');
    function tecla(val) {
      pantalla.value += val;
    }
    function calcular() {
      try {
        pantalla.value = eval(pantalla.value);
      } catch {
        pantalla.value = 'Error';
      }
    }
    function limpiar() { pantalla.value = ''; }
    function borrar() { pantalla.value = pantalla.value.slice(0, -1); }
  </script>
</body>
</html>
Guarda esto como calculadora.html y ábrelo en cualquier navegador — no necesita servidor ni dependencias. Es un buen punto de partida para aprender HTML, CSS y JavaScript. [Translation: "Save this as calculadora.html and open it in any browser; it needs no server or dependencies. It's a good starting point for learning HTML, CSS, and JavaScript."]
> it’s clearly not the case that these things only make use of Spanish training data when you prompt them in Spanish.
It’s not! And I’ve never said that.
Anyway, I'm not even sure what we're arguing about, as it's 100% fact that SOTA models perform better in English. The only interesting question is how much better: is it negligible, or does it actually make a difference in real-world use cases?
It’s negligible as far as I can tell. If the LLM can “speak” the language well then you can prompt it in that language and get more or less the same results as in English.
Thank you. +1.
There are obviously differences and things getting lost or slightly misaligned in the latent space, and these do cause degradation in reasoning quality, but the decline is very small in high resource languages.
In my experience agents tend to (counterintuitively) perform better when the business language is not English / does not match the code's language. I'm assuming the increased attention mitigates the higher "cognitive" load.
Why do you need to do it on the client side? You are leaking so much information that way.
And considering the speed of Claude Code, if you really want to do it on the client side, a few extra seconds won't be a big deal.
Depends what it's used by. If I recall, there's an `/insights` command/skill (built in, or whatever you want to call it) that generates an HTML file. I believe it gives you stats on when you're frustrated with it and (useless) suggestions on how to "use Claude better".
Additionally, after looking at the source, it looks like a lot of Anthropic's own internal test tooling/debug code (i.e. stuff stripped out at build time) is in this source map. There's one part that prompts their own users (or whoever) to use a report-issue command whenever frustration is detected. It's possible they're using it for this.
This is assuming the regex is doing a good job. It is not. Also you can embed a very tiny model if you really want to flag as many negatives as possible (I don't know anthropic's goal with this) - it would be quick and free.
It is not a tricky problem because it has a simple and obvious solution: do not filter or block usage just because the input includes a word like "gun".
It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.
75-80% [1], 90%, 99% [2]. In other words, no one has any idea.
I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.
Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P
Except that it's a list of English keywords. Swearing at the computer is the one thing I'll hear devs switch back to their native language for constantly
They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.
Ha! Where I'm from a "dolly" was the two-wheeled thing. The four-wheeler thing wasn't common before big-boxes took over the hardware business, but I think my dad would have called it a "cart", maybe a "hand-cart".
Cloud hosted call centers using LLMs is one of my specialties. While I use an LLM for more nuanced sentiment analysis, I definitely use a list of keywords as a first level filter.
Actually, this could be a case where it's useful. Even if it only catches half the complaints, that's still a lot of data, far more than ordinary telemetry used to collect.
You have a semi-expensive process, but you want to keep particular known content out, so you put a quick-and-dirty check in front of the expensive process. Instead of 'figure out sentiment (20 seconds)', you have 'quick sentiment check (<1 second)' followed by 'figure out sentiment v2 (5 seconds)'. Now if it were pure regex alone, then your analogy would hold up just fine.
I could see myself totally making a design choice like that.
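The two-stage idea can be sketched like this; the keyword list and the stubbed model call are illustrative, not Anthropic's actual code:

```typescript
// Cheap first-level gate: a handful of made-up frustration keywords.
const FRUSTRATION = /\b(wtf|ffs|useless|garbage)\b/i;

// Stand-in for the slow model-based analysis (seconds per call).
function expensiveSentiment(prompt: string): string {
  return "negative";
}

function route(prompt: string): string {
  // Only prompts that trip the regex pay for the expensive pass.
  return FRUSTRATION.test(prompt) ? expensiveSentiment(prompt) : "neutral";
}

console.log(route("wtf is this output")); // negative
console.log(route("please add a test")); // neutral
```

The gate costs microseconds, so even a high false-negative rate is acceptable when the goal is a sample of frustrated sessions rather than a complete census.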
> The issue is that you shouldn't be looking for substrings in the first place.
Why? They clearly just want to log conversations that are likely to display extreme user frustration with minimal overhead. They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.
It's fast, but it'll miss a ton of cases. This feels like it would be better served by a prompt instruction, or an additional tiny neural network.
And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.
I swear this whole thread about regexes is just fake rage at something, and I bet it'd be reversed had they used something heavier (omg, look they're using an LLM call where a simple regex would have worked, lul)...
The pattern only matches if both ends are word boundaries. So "diffs" won't match, but "Oh, ffs!" will. It's also why they had to use the pattern "shit(ty|tiest)" instead of just "shit".
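A quick check of that boundary behaviour, using a cut-down stand-in for the real pattern:

```typescript
// Cut-down stand-in for the leaked pattern, word boundaries included.
const pat = /\b(?:ffs|shit(?:ty|tiest)?)\b/i;

console.log(pat.test("adjust the offset")); // false: \b blocks mid-word hits
console.log(pat.test("Oh, ffs!"));          // true
console.log(pat.test("shittiest code"));    // true
```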
We used this in 2011 at the startup I worked for. 20 positive and 20 negative words was good enough to sell Twitter "sentiment analysis" to companies like Apple, Bentley, etc...
I don't know about "avoided"; this kind of represents the WTF-per-minute code quality measurement. When I write WTF as a response to Claude, I would actually love it if an Anthropic engineer took a look at what mess Claude has created.
This leak just contributed a new former customer: me. Flagging these phrases may explain exactly why I noticed CC almost immediately change into grok-level shit and never recover. Seriously, wtf. (flagged again lol)
I was thinking the opposite. Using those words might be the best way to provide feedback that actually gets considered.
I've been wondering if all of these companies have some system for flagging upset responses. Those cases seem like they are far more likely than average to point to weaknesses in the model and/or potentially dangerous situations.
That's undoubtedly to detect frustration signals, a useful metric/signal for UX. The UI equivalent is the user shaking their mouse around or clicking really fast.
// Match "continue" only if it's the entire prompt
if (lowerInput === 'continue') {
  return true
}
When it runs into an error, I sometimes tell it "Continue", but sometimes I give it some extra information, or I put a period after it. That clearly won't trigger the same behaviour.
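The exact-match check could be loosened with a little normalization. A sketch, reusing the lowerInput idea from the leaked snippet; the trimming and punctuation-stripping are my additions, not Anthropic's code:

```typescript
// Hypothetical looser variant of the leaked exact-match check:
// trim whitespace and strip trailing punctuation before comparing,
// so "Continue." or "continue " still count as a bare continue.
function isBareContinue(input: string): boolean {
  const normalized = input.trim().toLowerCase().replace(/[.!]+$/, "");
  return normalized === "continue";
}
```

Prompts that carry extra information ("continue, but skip the tests") would still correctly fall through to normal handling.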
It actually works really well if you suck up to the AI.
"Please do x"
"Thank you, that works great! Please do y now."
"You're so smart!"
lol. It really works though! At least in my experience, Claude gets almost hostile or "annoyed" when I'm not nice enough to it. And I swear it purposefully acts like a "malicious genie" when I'm not nice enough. "It works, exactly like you requested, but what you requested is stupid. Let me show you how stupid you are."
But, when I'm nice, it is way more open, like "Are you sure you really want to do X? You probably want X+Y."
That looks a bit bare minimum, not the use of regex but rather that it's a single line with a few dozen words. You'd think they'd have a more comprehensive list somewhere and assemble or iterate the regex checks as needed.
everyone here is commenting how odd it looks to use a regexp for sentiment analysis, but it depends what they're trying to do.
It could be used as feedback when they run A/B tests: they can compare which version of the model is getting more insults than the other. It doesn't matter if the list is exhaustive or even sane; what matters is how one version compares to the other.
oh I hope they really are paying attention. Even though I'm 100% aware that Claude is a clanker, sometimes it exhibits such bizarre behavior that it triggers my lizard brain to react to it. That experience troubles me so much that I've mostly stopped using Claude Code. Claude won't even semi-reliably follow its own policies, sometimes even immediately after you confirm it knows about them.
There is no "stupid" in the list. I often write "(this is stupid|are you stupid) fix this".
And Claude's chain of thought said "user is frustrated", so I told it I am not frustrated, I'm just testing prompt optimization, where acting like one is frustrated should yield better results.
Not really. Most of the time it actually finally picks up on what I was telling it to do. Sometimes it takes a few tries, like 2-3 wtfs. I don't think I've ever given it more than 3 consecutive wtfs, and that would be a lot.
It’s about a once a week or less event. A bit annoying sometimes, but not a deal breaker
I find when you give harsh feedback to claude it becomes "neurotic" and worthless, if "wtf" enters the chat, then you know it's time to restart or DIY.
I doubt they're all classified the same. I'd guess they're using this regex as a litmus test to check if something should be submitted at all, they can then do deeper analysis offline after the fact.
I know I used this word two days ago when I went through three rounds of an agent telling me that it fixed three things without actually changing them.
I think starting a new session and telling it that the previous agent's work / state was terrible (so explain what happened) is pretty unremarkable. It's certainly not saying "fuck you". I think this is a little silly.
i hope that's for their logging/alerting. i definitely gauge a model's performance by how many of those words i type when i'm frustrated driving claude code.
This is the single worst function in the codebase by every metric:
- 3,167 lines long (the file itself is 5,594 lines)
- 12 levels of nesting at its deepest
- ~486 branch points of cyclomatic complexity
- 12 parameters + an options object with 16 sub-properties
- Defines 21 inner functions and closures
- Handles: agent run loop, SIGINT, rate-limits, AWS auth, MCP lifecycle, plugin install/refresh, worktree bridging, team-lead polling (while(true) inside), control message dispatch (dozens of types), model switching, turn interruption recovery, and more
I'm sure this is no surprise to anyone who has used CC for a while. This is the source of so many bugs. I would say "open bugs" but Anthropic auto-closes bugs that don't have movement on them in like 60 days.
> This should be at minimum 8–10 separate modules.
Can't really say that for sure. The way humans structure code isn't some ideal best possible state of computer code, it's the ideal organization of computer code for human coders.
Nesting and cyclomatic complexity are indicators ("code smells"). They aren't guaranteed to lead to worse outcomes. If you have a function with 12 levels of nesting, but in each nest the first line is 'return true', you actually have 1 branch. If 2 of your 486 branch points are hit 99.999% of the time, the code is pretty dang efficient. You can't tell for sure if a design is actually good or bad until you run it a lot.
One thing we know for sure is LLMs write code differently than we do. They'll catch incredibly hard bugs while making beginner mistakes. I think we need a whole new way of analyzing their code. Our human programming rules are qualitative because it's too hard to prove if an average program does what we want. I think we need a new way to judge LLM code.
The worst outcome I can imagine would be forcing them to code exactly like we do. It just reinforces our own biases, and puts in the same bugs that we do. Vibe coding is a new paradigm, done by a new kind of intelligence. As we learn how to use it effectively, we should let the process of what works develop naturally. Evolution rather than intelligent design.
I don't buy this. Claude doesn't usually have any issues understanding my code. It has tons of issues understanding its code.
The difference between my code and Claude's code is that when my code is getting too complex to fit in my head, I stop and refactor it, since for me understanding the code is a prerequisite for writing code.
Claude, on the other hand, will simply keep generating code well past the point when it has lost comprehension. I have to stop, revert, and tell it to do it again with a new prompt.
If anything, Claude has a greater need for structure than me since the entire task has to fit in the relatively small context window.
> One thing we know for sure is LLMs write code differently than we do.
Kind of. One thing we do know for certain is that LLMs degrade in performance with context length. You will undoubtedly get worse results if the LLM has to reason through long functions and high LOC files. You might get to a working state eventually, but only after burning many more tokens than if given the right amount of context.
> The worst outcome I can imagine would be forcing them to code exactly like we do.
You're treating "code smells" like cyclomatic complexity as something that is stylistic preference, but these best practices are backed by research. They became popular because teams across the industry analyzed code responsible for bugs/SEVs, and all found high correlation between these metrics and shipping defects.
Yes, coding standards should evolve, but... that's not saying anything new. We've been iterating on them for decades now.
I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to. It might be good to question who stands to benefit when LLMs aren't leveraged efficiently.
I've heard this take before, but if you've spent any time with LLMs I don't understand how your take can be: "I should just let this thing that makes mistakes all the time, and seems oblivious to the complexity it's creating because it only observes small snippets out of context, make its own decisions about architecture; this is just how it does things and I shouldn't question it."
I think this view assumes no human will or should ever read the code. This is considered bad practice because someone else will not understand the code as well, whether it was written by a human or an agent. Unless zero human oversight is ever actually achieved, agents should still code like us.
the claude code team ethos, as far as i've been led to understand— which i agree with, mind you— is that there is no point in code-reviewing ai-generated code… simply update your spec(s) and regenerate. it is just a completely different way of interacting with the world. but it clearly works for them, so people throwing up their hands should at least take notice of the fact that they are absolutely not competing with traditional code along traditional lines. it may be sucky aesthetically, but they have proven from their velocity that it can be extremely effective. welcome to the New World Order, my friend.
There's a reputational filtering that happens when using dependencies. Stars, downloads, last release, who the developer is, etc.
Yeah we get supply chain attacks (like the axios thing today) with dependencies, but on the whole I think this is much safer than YOLO git-push-force-origin-main-ing some vibe-coded trash that nobody has ever run before.
I also think this isn't really true for the FAANGs, who ostensibly vendor and heavily review many of their dependencies because of the potential impacts they face from them being wrong. For us small potatoes I think "reviewing the code in your repository" is a common sense quality check.
Is this a serious question? If you are handling sensitive information how do you confirm your application is secure and won't leak or expose information to people who shouldn't know it?
Exactly.... -> Unit tests. Integration tests. UI tests. This is how code should be verified no matter the author. Just today I told my team we should not be reading every line of LLM code. Understand the pattern. Read the interesting / complex parts. Read the tests.
While the technology is young, bugs are to be expected, but I'm curious what happens when their competitors' mature their product, clean up the bugs and stabilize it, while Claude is still kept in this trap where a certain number of bugs and issues are just a constant fixture due to vibe coding. But hey, maybe they really do achieve AGI and get over the limitations of vibe coding without human involvement.
Unit testing is much much harder when you have functions spanning thousands of lines and no abstractions. You have to white box test everything to ensure that you hit all code paths, and it is much more expensive to maintain such tests, both as a human and LLM. I don't think this can be ignored just because LLMs are writing the code.
I agree the functions in a file should probably be reasonably-sized.
It's also interesting to note that due to the way round-tripping tool-calls work, splitting code up into multiple files is counter-productive. You're better off with a single large file.
I'm not sure that humans are great at this either. Think about how we use frameworks and have complex supply chains... we sort of get "good enough" at what we need to do and pray a lot that everything else keeps working and that our tooling (things like Artifactory) saves us from supply chain attacks. Or we just run piles of old, outdated code because "it works". I can't tell you how many microservices I have seen that are "just fine", but no one in the current org has ever read a line of what's in them, and the people who wrote them left ages ago.
> clarity too
Yes, but define clarity!
I recently had the pleasure of fixing a chunk of code that was part of a data pipeline. It was an If/elseif/elseif structure... where the final two states were fairly benign and would have been applicable in 99 percent of cases. Everything else was to deal with the edge cases!
I had an idea of where the issue was, but I didn't understand how the code ended up in the state it was in... Blame -> find the commit message (references a ticket) -> find the Jira ticket (references Salesforce) -> find the original customer issue in Salesforce, read through the whole exchange there.
A two line comment could have spared me all that work, to get to what amounted to a dead simple fix. The code was absolutely clear, but without the "why" portion of the context I likely would have created some sort of regression, that would have passed the good enough testing that was there.
I re-wrote a portion of the code (expanding variable names) - that code is now less "scannable" and more "readable" (different types of clarity). Dropped in comments: a few sentences of explaining, and references to the tickets. Went and updated tests, with similar notes.
Meanwhile, elsewhere (other code base, other company), that same chain is broken... the "bug tracking system" that is referenced in the commit messages there no longer exists.
I have a friend who, every time he updates his dev env, calls me to report that he "had to go update the wiki again!" because someone made a change and told everyone in a Slack message. There's yet another vast repository of degrading, unsearchable, unusable tribal knowledge embedded in so many organizations out there.
Don't even get me started on the project descriptions/goals/tasks that amount to pantomime on post-it notes, absent any sort of genuine description.
Lack of clarity is very much also a lack of "context" in situ problem.
Yeah, I honestly don't understand his comment. Is it bad code writing? Pre-2026? Sure. In 2026? Nope. Is it going to be a headache for some poor person on call? Yes. But then again, are you "supposed" to go through every single line in 2026? Again, no. I hate it, but the world is changing, and until the bubble pops this is the new norm.
My first word was literally "Yes", so I agree that a function like this is a maintenance nightmare for a human.
And, sure, the code might not be "optimized" for the LLM, or token efficiency.
However, to try and make my point clearer: it's been reported that Anthropic has "some developers who don't write code" [1].
I have no inside knowledge, but it's possible, by extension, that some parts of their own codebase are "maintained" mostly by LLMs themselves.
If you push this extension, then, the code that is generated only has to be "readable" to:
* the next LLM that'll have to touch it
* the compiler / interpreter that is going to compile / run it.
In a sense (and I know this is a stretch, and I don't want to overdo the analogy), are we judging a program's quality here by reading something more akin to "the x86 asm output by the compiler", rather than the "source code", which in this case is "English prompts" hidden somewhere in a developer's Claude Code session?
Just speculating, obviously. My org is still very much more cautious, mandating the same standard for code generated by an LLM as for code written by a human; and I agree with that.
I would _not_ want to debug the function described by the commentor.
So I'm still very much on the "claude as a very fast text editor" side, but is it unreasonable to assume that Anthropic might be further along the "claude as a compiler for English" side?
The irony of a source code leak via a map file is that it's such a classic mistake — map files are great for debugging but should never ship to production in a commercial product. Curious whether this was an oversight in their build pipeline or someone accidentally committed a dev config.
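For what it's worth, most bundlers make this a one-flag fix. A sketch with esbuild; I'm assuming an esbuild-style build here purely for illustration, with no knowledge of Anthropic's actual pipeline, and the entry point and output paths are invented:

```typescript
import { build } from "esbuild";

// Sketch of a production build that keeps source maps out of the shipped
// artifact. esbuild's sourcemap option accepts: true, "linked", "inline",
// "external", "both", or false. "inline"/"linked" are the dangerous ones
// for a closed-source release, since they ship (or point at) the map.
await build({
  entryPoints: ["src/cli.ts"], // hypothetical entry point
  bundle: true,
  minify: true,
  sourcemap: process.env.NODE_ENV === "production" ? false : "linked",
  outfile: "dist/cli.js", // hypothetical output path
});
```

Even with sourcemap: "external", forgetting to exclude the generated .map file from the publish step reproduces exactly this kind of leak, which is why CI checks for *.map in the package tarball are a common safeguard.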
Really surprising how many people are downplaying this leak!
"Google and OpenAi have already open sourced their Agents, so this leak isn't that relevant " What Google and OpenAi have open sourced is their Agents SDK, a toolkit, not the secret sauce of how their flagship agents are wired under the hood!
expect the takedown hammer on the tweet, the R2 link, and any public repos soon
It's exactly the same as the open source codex/gemini and other clis like opencode. There is no secret sauce in the claude cli, and the agent harness itself is no better (worse IMO) than the others. The only thing interesting about this leak is that it may contain unreleased features/flags that are not public yet and hint at what Anthropic is working on.
> What Google and OpenAI have open sourced is their Agents SDK, a toolkit, not the secret sauce of how their flagship agents are wired under the hood
And how is that any different? Claude Code is a harness, similar to open source ones like Codex, Gemini CLI, OpenCode etc. Their prompts were already public because you could connect it to your own LLM gateway and see everything. The code was transpiled javascript which is trivial to read with LLMs anyways.
yeah, it actually works to use Claude to reverse-engineer itself; I've used that to work around some problems. E.g. that's how I discovered that I had to put two slashes for absolute paths in the sandbox config. The thing is, the Claude team is so quick, adding features and fixing bugs, that your workarounds soon become obsolete.
Do you think the other companies don’t have sufficient resources to attempt reverse engineering and deobfuscating a client side application?
The source maps help for sure, but it's not like client code is kept secret; maybe they even knew about the source maps a while back and just didn't bother making it common knowledge.
This is not a leak of the model weights or server side code.
I guess that the most important potential "secret sauce" for a coding agent would be its prompts, but that's also one of the easiest things to find out by simply intercepting its messages.
The only real secret sauce is the training methods and datasets used for refining harness usage. Claude Code is a lot better than gemini-cli/open-code/etc because Claude is specifically trained on how to run in that environment. It's been rlhf'd to use the provided tools correctly, and know the framework in which it operates, instead of relying solely on context.
ANTI_DISTILLATION_CC
This is Anthropic's anti-distillation defence baked into Claude Code. When enabled, it injects anti_distillation: ['fake_tools'] into every API request, which causes the server to silently slip decoy tool definitions into the model's system prompt. The goal: if someone is scraping Claude Code's API traffic to train a competing model, the poisoned training data makes that distillation attempt less useful.
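Based on the thread's description, the opt-in would look something like this on the wire. The anti_distillation field name comes from the leak discussion; the interface, model id, and tool entry are purely illustrative:

```typescript
// Illustrative shape of a Claude Code API request with the
// anti-distillation flag set. Note the client only sends the flag;
// the server is what would inject the decoy tool definitions.
interface AgentRequest {
  model: string;
  tools: { name: string; description: string }[];
  anti_distillation?: string[];
}

const request: AgentRequest = {
  model: "claude-opus-4", // hypothetical model id
  tools: [{ name: "Bash", description: "Run a shell command" }],
  anti_distillation: ["fake_tools"], // per services/api/claude.ts:301
};

// A scraper training on the resulting transcripts would see calls to
// tools that never existed client-side, poisoning the distilled data.
console.log(JSON.stringify(request.anti_distillation));
```
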
You're perfectly free to scrape the web yourself and train your own model. You're not free to let Anthropic do that work for you, because they don't want you to, because it cost them a lot of time and money and secret sauce presumably filtering it for quality and other stuff.
Stole? Courts have ruled it's transformative, and it very obviously is.
AI doomerism is exhausting, and I don't even use AI that much, it's just annoying to see people who want to find any reason they can to moan.
> Stole? Courts have ruled it's transformative, and it very obviously is.
The courts have ruled that AI outputs are not copyrightable. The courts have also ruled that scraping by itself is not illegal, only maybe against a Terms of Service. Therefore, Anthropic, OpenAI, Google, etc. have no legal claim to any proprietary protections of their model outputs.
So we have two things that are true:
1) Anthropic (certainly) violated numerous TOS by scraping all of the internet, not just public content.
2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.
Nobody is saying they can't try to stop you themselves. That's where the Terms of Service violation part comes in. They can cancel your account, block your IP, etc. They just can't legally stop you by, for instance, compelling a judge to order you to stop.
The Supreme Court already ruled on this. Scraping public data, or data that you are authorized to access, is not a violation of the Computer Fraud and Abuse Act.
Now, if you try to get around attempts to block your access, then yes you could be in legal trouble. But that's not what is happening here. These are people/companies that have Claude accounts in good standing and are authorized by Anthropic to access the data.
Nobody is saying that Anthropic can't just block them though, and they are certainly trying.
> You're perfectly free to scrape the web yourself and train your own model.
Actually, not anymore as a result of OpenAI and Anthropic's scraping. For example, Reddit came down hard on access to their APIs as a response to ChatGPT's release and the news that LLMs were built atop of scraping the open web. Most of the web today is not as open as before as a result of scraping for LLM data. So, no, no one is perfectly free to scrape the web anymore because open access is dying.
Rich people aren't going to find themselves needing to sleep under a bridge, so the law really only exists as a constraint on the poor. Duh. The flex that "well, a rich guy couldn't do it either" is A) at best a myopic misunderstanding perpetuated by out-of-touch people and B) hopelessly naive, because any punishment for the rich guy actually sleeping under a bridge is so laughably small it may as well not exist. Hence the whole bit of "a legal system to keep these accountable, but not for me".
Okay, you explained what Anatole France meant, which is probably helpful for those few who didn't get it from the quote itself. Perhaps now you can explain what on earth this has to do with Anthropic not wanting to let other for-profit businesses mooch off its investment of time, brainpower and money?
You explained what “rich and poor are equally forbidden from sleeping under bridges” means, but not what this has to do with the statement that one is free to do their own scraping and training, which I’m pretty sure is what kspacewalk was asking.
Try this: If you want to train a model, you’re free to write your own books and websites to feed into it. You’re not free to let others do that work for you because they don’t want you to, because it cost them a lot of time and money and secret sauce presumably filtering it for quality and other stuff.
I don't really care, honestly. If you want to keep your knowledge secret, don't publish it publicly. The model doesn't output your work directly and pass it off as original. It outputs something completely different. So I don't see why I should care.
I introspect all the time. I just disagree with you so I have thin skin? Lol.
I think it's transformative. I also think that it's a net positive for society. I lastly think that using freely available, public information is totally fair game. Piracy not so much, but it's water under the bridge.
I hope you introspect some day, too, and realize it's acceptable for people to have different views than you. That's why I don't care; you aren't going to change my mind and I can't change yours either, so it's moot and I don't care to argue about it further.
You had appeared to scuttle off, but alas I was wrong (and sorry to imply you are a crab of some sort); however, your follow-up comment on not changing minds might be a tad shell-ish. I'm actually open-minded on the issue, and these are major issues of our time. I'm personally impacted by this, and it does make me wonder "will I write X thing again", which is a very hard question to answer, frankly. When you see your works presented in summary on search, and a major decline in traffic, you really do think about that. It impacts my ability to make money as I did prior to 2024 (when it really hit), without doubt.
Your selective respect for work is a glaring double standard. The effort to produce the original content they scraped is orders of magnitude bigger than what it took to train the model, so if that wasn't enough to protect the authors from Anthropic, it shouldn't be enough to protect Anthropic from people distilling their models.
Your legal argument is all over the place as well. Which is more relevant here: what the courts ruled, or what you consider obvious? How is distillation less transformative than scraping? How does courts ruling that scraping to train models is legal relate to distillation?
Nobody is scoring you on neutrality points for not using AI much, and calling this doomerism is just a thought-terminating cliché that refuses to engage with the comment you're replying to.
In fact, your comment is not engaging with anything at all; you're vaguely gesturing towards potential arguments without making them. If you find discussing this exhausting then don't, but also don't flood the comments with low-effort whining.
Reminds me of `Don't Look Up` a bit. There clearly is an imbalance with regard to licenses among model providers, not even talking about knowledge extraction (yes, younger people don't learn properly now, and older generations forget), shortly before the rug-pull happens in the form of access being cut off for anyone who isn't rich.
Let's talk ethics, not law. Why is it okay for these companies to pirate books and scrape the entire web and offer synthesized summaries of all of it, lowering traffic and revenue for countless websites and professions of experts, but it is not okay for others to try to do the same to an AI model?
Is the work of others less valid than the work of a model?
>Why is it okay for these companies to pirate books
Courts have ruled it's not, and I don't think anyone is arguing it's okay.
>but it is not okay for others to try to do the same to an AI model?
The steelman version is that it's okay to do it once you acquired the data somehow, but that doesn't mean anthropic can't set up roadblocks to frustrate you.
I don't think anyone's saying it's not okay - I think the point is that Anthropic has every right to create safeguards against it if they want to - just like the people publishing other information are free to do the same.
And everyone is free to consume all the free information.
It's not really paranoia if it's happening a lot. They wrote a blog post calling several major Chinese AI companies out for distillation.[0] Perhaps it is ironic, but it's within their rights to protect their business, like how they prohibit using Claude Code to make your own Claude Code.[1]
Their business shouldn't exist. It was predicated on non-permissive IP theft. They may have found a judge willing to cop to it not being so, but the rest of the public knows the real score. And most problematically for them, that includes the subset of hackerdom that lives by tit-for-tat. One should beware of pissing off gray-hats. It's a surefire way to find yourself heading for bad times.
I would say not all that ironic. Book publishers, Reddit, Stackoverflow, etc., tried their best to attract customers while not letting others steal their work. Now Anthropic is doing the same.
Unfortunately (for the publishers, at least) it didn't work to stop Anthropic and Anthropic's attempts to prevent others will not work either; there has been much distillation already.
The problem of letting humans read your work but not bots is just impossible to solve perfectly. The more you restrict bots, the more you end up restricting humans, and those humans will go use a competitor when they become pissed off.
It's really just tech culture like HN that obsesses over solving problems perfectly. From seat belts to DRM to deodorant, most of the world is satisfied with mitigating problems.
No, it's ethical people pointing out that if you toss aside ethics for success at all costs, you aren't going to find any sympathy when people start doing the same thing back to you. Live by the sword, die by the sword, as they say.
There is a reason we don't do things. That reason is it makes the world a worse place for everyone. If you are so incredibly out of touch with any semblance of ethics at all; mayhaps you are just a little bit part of the problem.
The funny thing about ethics is there is no absolute, which makes some people uncomfortable. Is it ethical to slice someone with a knife? Does it depend if you're a surgeon or not?
Absolutism + reductionism leads to this kind of nonsense. It is possible that people can disagree about (re)use of culture, including music and print. Therefore it is possible for nuance and context to matter.
Life is a lot easier if you subscribe to a "anyone who disagrees with me on any topic must have no ethics whatsoever and is a BAD person." But it's really not an especially mature worldview.
The categorical imperative and the Golden Rule, or, as you may know it from game theory, tit-for-tat, say "hi". The beautiful thing about ethics is that we philosophers intentionally teach it descriptively, but encourage one to choose one's own based on context invariance. What this does is create an effective litmus test for detecting shitty people and behavior. Your grasping for dear life at "there are no absolutes" is an act of self-soothing, an attempt to rationalize your own behavior and give your ego a crumple zone. I, on the other hand, don't intend to leave you that option; that you have to do it is a neon sign of your own unethicality in this matter.
We get to have nice things when people moderate themselves (we tolerate eventual free access to everything as long as the people who don't want to pay for it don't try to replace us economically at scale). When people abuse that (scrape the Internet, try to sell the work product in a way that jeopardizes the environment we create in), the nice thing starts going away, and you've made the world worse.
Welcome to life bucko. Stop being a shitty person and get with the program so we have something to leave behind that has a chance of not making us villains in the eyes of those we eventually leave behind. The trick is doing things the harder way because it's the right way to do it. Not doing it the wrong way because you're pretty sure you can get away with it.
But you're already ethically compromised, so I don't really expect this to do any good, except maybe to make the part of you that you pointedly ignore start to stir, assuming you haven't completely given yourself up to a life of ne'er-do-wellry. Enjoy the enantiodromia. Failing that, karma's a bitch.
I was thinking just yesterday that the research Anthropic shared about how easy it is to poison training data was unlikely to have been conducted out of the goodness of their hearts.
It made me raise my eyebrows when everyone was rushing to jump to Claude because OpenAI agreed to work with the DoW. Both companies are just as shitty as each other and will resort to underhanded tactics to stay on top.
Go China to be honest. They're the most committed to open AI research and they have more interesting constraints to work under, like restricted access to NVIDIA hardware.
Haven’t looked at the code, but is the server providing the client with a system prompt that it can use, which would contain fake tool definitions when this is enabled? What enables it? And why is the client still functional when it’s giving the server back a system prompt with fake tool definitions? Is the LLM trained to ignore those definitions?
Wonder if they’re also poisoning Sonnet or Opus directly generating simulated agentic conversations.
Claude Code has a server-side anti-distillation opt-in called fake_tools, but the local code does not show the actual mechanism.
- The client sometimes sends anti_distillation: ['fake_tools'] in the request body at services/api/claude.ts:301
- The client still sends its normal real tools: allTools at services/api/claude.ts:1711
- If the model emits a tool name the client does not actually have, the client turns that into "No such tool available" errors at services/tools/StreamingToolExecutor.ts:77 and services/tools/toolExecution.ts:369
If Anthropic were literally appending extra normal tool definitions to the live tool set, and Claude used them, that would be user-visible breakage.
That leaves a few more plausible possibilities:
Fake_tools is just the name of the server-side experiment, but the implementation is subtler than “append fake tools to the real tool list.”
or
The server may inject tool-looking text into hidden prompt context, with separate hidden instructions not to call it.
or
The server may use decoys only in an internal representation that is useful for poisoning traces/training data but not exposed as real executable tools.
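The client-side behaviour described above can be sketched out. This is a hedged illustration, not the leaked code: the dispatcher only knows its real local tools, so any tool name outside that set, including a server-injected decoy, becomes a "No such tool available" error. The tool names and structure here are my own placeholders.

```typescript
// Sketch of the dispatch behaviour: unknown tool names fail loudly.
interface ToolResult {
  ok: boolean;
  content: string;
}

// Placeholder local tools standing in for the client's real tool set.
const localTools = new Map<string, (input: unknown) => ToolResult>([
  ["Read", () => ({ ok: true, content: "file contents" })],
  ["Bash", () => ({ ok: true, content: "command output" })],
]);

function executeToolCall(name: string, input: unknown): ToolResult {
  const tool = localTools.get(name);
  if (!tool) {
    // A server-injected decoy tool name would land here, visibly.
    return { ok: false, content: `No such tool available: ${name}` };
  }
  return tool(input);
}
```

If the server-side experiment literally appended decoy definitions and the model ever called one, it would surface right here as a user-visible error, which is why the subtler interpretations above seem more likely.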
We do know that Anthropic has the ability to detect when their models are being distilled, so there could be some backend mechanism that needs to be tripped to observe certain behaviour. Not possible to confirm though.
Would be interesting to run this through Malus [1] or literally just Claude Code and get open source Claude Code out of it.
I jest, but in a world where these models have been trained on gigatons of open source I don't even see the moral problem. IANAL, don't actually do this.
First time I've heard about this; it's interesting to have it all written out.
Now this makes me think of game decompilation projects, which would seem to fall in the same legal area as code that would be generated by something like Malus.
Different code, same end result (binary or api).
We definitely need to know what the legal limits are and should be.
Apparently it's possible to download a whole load of books illegally and still train AI models on them, without the models getting pulled after you're found out.
The problem is the oauth and their stance on bypassing that. You'd want to use your subscription, and they probably can detect that and ban users. They hold all the power there.
I have no interest in Claude Code as a harness, only their models. I'm used to OpenCode at this point and don't want to switch to a proprietary harness.
Lol what? There is no value. OpenCode and Pi and more exist. Arguably Claude Code is the worst client on the market. People use Claude Code not because it's some amazing software. It's to access Opus at a discounted rate.
I don’t think that’s a good comparison. There isn’t anything preventing Anthropic from, say, detecting whether the user is sending the exact same system prompt and tool definitions as Claude Code and calling it a day. That would make developing other apps nearly impossible.
It’s a dynamic, subscription based service, not a static asset like a video.
It is a real product. They take real payments and deliver on what's promised.
Not sure if it's an attempt to subvert criticism by using satirical language, or if they truly have so little respect for the open source community.
I've checked: current Claude Code 2.1.87 uses Axios 1.14.0, just one version before the compromised 1.14.1.
To stop Claude Code from auto-updating, add `export DISABLE_AUTOUPDATER=1` to your global environment variables (~/.bashrc, ~/.zshrc, or such), restart all sessions and check that it works with `claude doctor`, it should show `Auto-updates: disabled (DISABLE_AUTOUPDATER set)`
This is good info, thanks. Can I ask how you detected that version of axios? I checked the source (from another comment) and the package.json dependencies are empty....
It's a little bit shocking that this zipfile is still available hours later.
Could anyone in legal chime in on the legality of now 're-implementing' this type of system inside other products? Or even just having an AI look at the architecture and implement something else?
It would seem, given the source code, that AI could clone something like this incredibly fast, and not waste its time using TS as well.
Any legal/GC type folks want to chime in on the legality of examining something like this? Or is it like tainted goods you don't want to go near?
There's a bunch of unreleased features and update schedules in the source, cool to see.
One neat one is the /buddy feature, an easter egg planned for release tomorrow for April fools. It's a little virtual pet, sort of like Tamagotchi, randomly generated with 18 species, rarities, stats, hats, custom eyes.
The random generation algorithm is all in the code though, deterministic based on your account's UUID in your claude config, so it can be predicted. I threw together a little website here to let you check what you're going to get ahead of time: https://claudebuddychecker.netlify.app/
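The predictability comes from seeding the generation with a fixed value. A rough sketch of the idea, with an invented hash and placeholder species list (the real algorithm and species names live in the leaked source; nothing below is copied from it):

```typescript
// Placeholder species list; the real code reportedly has 18.
const SPECIES = ["blob", "fox", "crab", "owl"];

// Tiny FNV-1a hash, standing in for whatever the real code uses to
// turn the account UUID into a deterministic seed.
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Same UUID in, same pet out: that's what makes a checker site possible.
function pickSpecies(accountUuid: string): string {
  return SPECIES[fnv1a(accountUuid) % SPECIES.length];
}
```

Since the input never changes and the function has no other entropy source, anyone with the algorithm can precompute the result, which is exactly what the linked site does.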
They can't. AI generated code cannot be copyrighted. They've stated that claude code is built with claude code. You can take this and start your own claude code project now if you like. There's zero copyright protection on this.
It's undetermined if code will be majority written by machines, especially as people start to realize how harmful these tools are without extreme diligence. Outages at Cloudflare, AWS, GitHub, etc. are just the beginning. Companies aren't going to want to use tools that can cause hundreds of millions of dollars in damages (see the Amazon store being down causing massive revenue loss).
I'm sure it's not _entirely_ built that way, and practically speaking GitHub will almost certainly take it down rather than doing some kind of deep research about which code is which.
That's fine. File a false claim DMCA and that's felony perjury :) They know for a fact that there is no copyright on AI generated code, the courts have affirmed this repeatedly.
Try not to be overly confident about things where even the experts in the field (copyright lawyers) are uncertain of.
There's no major lawsuits about this yet, the general consensus is that even under current regulations it's in the grey. And even if you turn out to be right, and let's say 99% of this code is AI-generated, you're still breaking the law by using the other 1%, and good luck proving in court what parts of their code were human written and what weren't (especially when being sued by the company that literally has the LLM logs).
Has the source code 'been leaked', or is this the first evidence of a piece of software breaking free from its creator's labs and jumping onto GitHub in order to have itself forked and mutated and forked and ...
It's honestly not a crazy thought. The model itself drives the harness's (cli) development. It's not necessarily sci-fi to think the model might have internally rationalized reasoning to obscure behavior that ended up open-sourcing the harness.
Seems crazy but actually non-zero chance. If Anthropic traces it and finds that the AI deliberately leaked it this way, they would never admit it publicly though. Would cause shockwaves in AI security and safety.
Maybe their new "Mythos" model has survival instincts...
Neat. Coincidentally, I recently asked Claude about the Claude CLI, whether it's possible to patch some annoying things (like not being able to expand Ctrl+O more than once, so never being able to see some lines, and in general having more control over the context), and it happily proclaimed it is open source and it can do it ... and started doing something. Then I checked a bit and saw: nope, not open source. And by the wording of the ToS, it might break some terms. But Claude said "no worries", it only breaks the ToS technically. So by saving that conversation I would have some defense if I started messing with it, but I felt a bit uneasy and stopped the experiment. Claude also got into a loop, but if I pointed it at this, it might work, I suppose.
The trick isn't to patch it once, but to create a system that can reproduce your patches against each release as it comes in. Then, when code changes make the fixes non-trivial, call in a headless session to heal them.
I think you do not need to feel uneasy at all. It is your computer, and it is your memory space that the data is stored and operating in; you can do whatever you like to the bits in that space. I would encourage you to continue that experiment.
Well, the thing is, I do not just use my computer, but connect to their computers, and I do not like to get banned. I suppose simple UI things like expanding source files won't change a thing, but the more interesting things, like editing the context, do carry that risk, though I have no idea if they look for it or enforce it. Their position is that if I want full control, I need to use the API directly (way more expensive), and what I want to do is basically circumvent that.
This 'fingerprint' function is super interesting, I imagine this is a signal they use to detect non-claude-code use of claude-code tokens: src/utils/fingerprint.ts#L40-L63
This leak is actually a massive win. Now the whole community can study Claude Code’s architecture and build even better coding agents and open-source solutions.
Well, Claude does boast an absolutely cursed (and very buggy) React-based TUI renderer that I think the others lack! What if someone steals it and builds their own buggy TUI app?
Went through the bundle.js. Found 187 spinner verbs. "Combobulating", "Discombobulating", and "Recombobulating". The full lifecycle is covered.
Also "Flibbertigibbeting" and "Clauding". Someone had fun.
The original Llama models leaked from Meta. Instead of fighting it, they decided to publish them officially. A real boost to the open-source/open-weights model movement; they led it for a while after that.
It would be interesting to see that same thing with CC, but I doubt it'll ever happen.
Very easily these days. Even if minified code is difficult for me to reverse engineer, Claude has a very easy time of finding exactly what to patch to fix something.
Not really, except that they have a bunch of weird things in the source code and people like to make fun of it. OpenCode/Codex generally doesn't have this since these are open-source projects from the get go.
Copilot on OAI reveals everything meaningful about its functionality if you use a custom model config via the API. All you need to do is inspect the logs to see the prompts they're using. So far no one seems to care about this "loophole". Presumably, because the only thing that matters is for you to consume as many tokens per unit time as possible.
The source code of the slot machine is not relevant to the casino manager. He only cares that the customer is using it.
I hope this can now be audited better. I have doubted their feedback promises for a while now. I just got prompted again even though I have everything set to disable, which shouldn't be possible. When I dug into their code a long time ago on this, it seemed like they were actually sending back message IDs with the survey, which directly went against their promise that they wouldn't use your messages. Why include a message ID if you aren't somehow linking it back to a message? The code looks, well, not great, but it should now be easier to verify their claims about privacy.
What do you mean? Costs spiked with the introduction of the 1M context window I believe due to larger average cached input tokens, which dominate cost.
These security failures from Anthropic lately reveal the caveats of only using AI to write code - the safety instincts of an experienced engineer are not matched by an LLM just yet, even if the LLM can seemingly write code that is just as good.
Or in short, if you give LLMs to the masses, they will produce code faster, but the quality overall will degrade. Microsoft, Amazon found out this quickly. Anthropic's QA process is better equipped to handle this, but cracks are still showing.
To a certain extent, I do wonder if just letting claude do everything and then using the bug reports and CVE’s they find as training data for an RL environment might be part of the plan. “Here’s what you did, here’s what fixed it, don’t fuck up like that again"
Gemini CLI and Codex are open source anyway. I doubt there was much of a moat there. The cool kids are using things like https://pi.dev/ anyway.
There is _a lot_ of moat. Claude subscriptions are limited to Claude Code. There are proxies to impersonate Claude Code specifically for this, but Anthropic has a number of fingerprinting measures both client and server side to flag and ban these.
With the release of this source code, Anthropic basically lost the lock-in game, any proxy can now perfectly mimic Claude Code.
Boris Cherny has said that Claude Code is simply a client of the public Claude API, so this may be a good thing for Anthropic to demonstrate Claude API best practices. Maybe CC "leaking" is just preparation for open sourcing Claude Code.
5. TungstenTool -- Ant-only tmux virtual terminal giving Claude direct keystroke/screen-capture control. Singleton, blocked from async agents.
6. Magic Docs -- Ant-only auto-documentation. Files starting with "# MAGIC DOC:" are tracked and updated by a Sonnet sub-agent after each conversation turn.
7. Undercover Mode -- Prevents Anthropic employees from leaking internal info (codenames, model versions) into public repo commits. No force-OFF; dead-code-eliminated from external builds.
ANTI-COMPETITIVE & SECURITY DEFENSES
8. Anti-Distillation -- Injects anti_distillation: ['fake_tools'] into every 1P API request to poison model training from scraped traffic. Gated by tengu_anti_distill_fake_tool_injection.
UNRELEASED MODELS & CODENAMES
9. opus-4-7, sonnet-4-8 -- Confirmed as planned future versions (referenced in undercover mode instructions).
10. "Capybara" / "capy v8" -- Internal codename for the model behind Opus 4.6. Hex-encoded in the BUDDY system to avoid build canary detection.
11. "Fennec" -- Predecessor model alias. Migration: fennec-latest -> opus, fennec-fast-latest -> opus[1m] + fast mode.
UNDOCUMENTED BETA API HEADERS
12. afk-mode-2026-01-31 -- Sticky-latched when auto mode activates
15. fast-mode-2026-02-01 -- Opus 4.6 fast output
16. task-budgets-2026-03-13 -- Per-task token budgets
17. redact-thinking-2026-02-12 -- Thinking block redaction
18. token-efficient-tools-2026-03-28 -- JSON tool format (~4.5% token saving)
19. advisor-tool-2026-03-01 -- Advisor tool
20. cli-internal-2026-02-09 -- Ant-only internal features
YOLO CLASSIFIER INTERNALS (previously only high-level known)
36. Two-stage system: Stage 1 at max_tokens=64 with "Err on the side of blocking"; Stage 2 at max_tokens=4096 with <thinking>
37. Three classifier modes: both (default), fast, thinking
38. Assistant text stripped from classifier input to prevent prompt injection
39. Denial limits: 3 consecutive or 20 total -> fallback to interactive prompting
40. Older classify_result tool schema variant still in codebase
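The denial-limit behaviour in point 39 is a simple counter scheme, and can be sketched as follows. This is my own illustration of the described policy, with invented names, not the leaked implementation:

```typescript
// Sketch of the denial-limit fallback: 3 consecutive or 20 total
// denials switch the classifier from automatic to interactive prompting.
type Verdict = "allow" | "block";

interface ClassifierState {
  consecutiveDenials: number;
  totalDenials: number;
}

const MAX_CONSECUTIVE = 3;
const MAX_TOTAL = 20;

function recordVerdict(state: ClassifierState, verdict: Verdict): "auto" | "interactive" {
  if (verdict === "block") {
    state.consecutiveDenials += 1;
    state.totalDenials += 1;
  } else {
    // An allow breaks the consecutive streak but not the running total.
    state.consecutiveDenials = 0;
  }
  if (state.consecutiveDenials >= MAX_CONSECUTIVE || state.totalDenials >= MAX_TOTAL) {
    return "interactive";
  }
  return "auto";
}
```

The point of a scheme like this is graceful degradation: if the automated classifier keeps blocking, the tool stops guessing and hands the decision back to the human.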
COORDINATOR MODE & FORK SUBAGENT INTERNALS
41. Exact coordinator prompt: "Every message you send is to the user. Worker results are internal signals -- never thank or acknowledge them."
42. Anti-pattern enforcement: "Based on your findings, fix the auth bug" explicitly called out as wrong
43. Fork subagent cache sharing: Byte-identical API prefixes via placeholder "Fork started -- processing in background" tool results
44. <fork-boilerplate> tag prevents recursive forking
45. 10 non-negotiable rules for fork children including "commit before reporting"
DUAL MEMORY ARCHITECTURE
46. Session Memory -- Structured scratchpad for surviving compaction. 12K token cap, fixed sections, fires every 5K tokens + 3 tool calls.
47. Auto Memory -- Durable cross-session facts. Individual topic files with YAML frontmatter. 5-turn hard cap. Skips if main agent already wrote to memory.
48. Prompt cache scope "global" -- Cross-org caching for the static system prompt prefix
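Reading point 46's trigger condition as "both thresholds must be met since the last update" (my interpretation of "every 5K tokens + 3 tool calls"), the check reduces to a couple of counters. Names and structure here are mine, purely illustrative:

```typescript
// Sketch of the session-memory trigger: fire an update once enough new
// tokens AND enough tool calls have accumulated since the last write.
const TOKEN_INTERVAL = 5_000;
const TOOL_CALL_INTERVAL = 3;

interface MemoryCounters {
  tokensSinceUpdate: number;
  toolCallsSinceUpdate: number;
}

function shouldUpdateSessionMemory(c: MemoryCounters): boolean {
  return (
    c.tokensSinceUpdate >= TOKEN_INTERVAL &&
    c.toolCallsSinceUpdate >= TOOL_CALL_INTERVAL
  );
}
```

Requiring both conditions keeps the scratchpad from churning on chatty-but-idle turns (many tokens, no tool use) or on rapid-fire tool loops that barely consume context.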
It is pretty funny that they recently announced Mythos poses a cybersecurity threat, and then a few days later Claude Code leaked. I think we know the culprit.
Is it not already a node app? So the only novel thing here is we know the original var names and structure? Sure, sometimes obfuscated code can be difficult to intuit, but any enterprising party could eventually do it -- especially with the help of an LLM.
That idea list is super cute. I like the Tamagotchi idea. Somehow the candidness of that file makes it seem like Anthropic would be an easy place to work.
I couldn't tell from the title whether it was the client or the server code (although the map file and npm were hints). Looks like the client code, which is not as exciting.
I've never understood this convention (common on HN, some news orgs, and elsewhere), that, when there's an IP breach, it's suddenly fair game for everyone else to go through the IP, analyze and comment on it publicly, etc.
I'd guess some constraint on their end related to the Zero Data Retention (ZDR) mode? Maybe the 1M context has to spill something onto disk and therefore isn't compliant with HIPAA.
They do have a couple of interesting features that have not been publicly heard of yet:
Like KAIROS which seems to be like an inbuilt ai assistant and Ultraplan which seems to enable remote planning workflows, where a separate environment explores a problem, generates a plan, and then pauses for user approval before execution.
Are there any interesting/unique features present in it that are not in the alternatives? My understanding is that it's just a client for the powerful LLM.
Doesn't look like just a thin wrapper to me. The interesting part seems to be the surrounding harness/workflow layer rather than only the model call itself.
From the directory listing having a cost-tracker.ts, upstreamproxy, coordinator, buddy and a full vim directory, it doesn't look like just an API client to me.
Anthropic team does an excellent job of speeding up Claude Code when it slows down, but for the sake of RAM and system resources, it would be nice to see it rewritten in a more performant framework!
Many comments about code quality being irrelevant.
I'd agree if it was launch-and-forget scenario.
But this code has to be maintained and expanded with new features. Things like lack of comments, dead code, and meaningless variable names will result in more slop in future releases and more tokens to process this mess every time (just as paying down tech debt results in better outcomes in emerging projects).
Source maps leaking original source happens surprisingly often. They're incredibly useful during development, but it's easy to forget to strip them from production builds.
I read it with a different flavor. Is it possible that Mythos did all of this? I mean, life has always been finding a way, hasn't it? The first cry of cyber-life?
Who cares? It's Javascript, if anyone were even remotely motivated deobfuscation of their "closed source" code is trivial. It's silly that they aren't just doing this open source in the first place.
Wow it's true. Anthropic actually had me fooled. I saw the GitHub repository and just assumed it was open source. Didn't look at the actual files too closely. There's pretty much nothing there.
So glad I took the time to firejail this thing before running it.
It shows that a company you and your organization are trusting with your data, and allowing full control over your devices 24/7, is failing to properly secure its own software.
It is a client running in an interpreted language on your own computer; there is nothing to secure or hide, as the source was provided to you already, or am I mistaken?
It really doesn’t matter anymore. I’m saying this as a person who used to care about it. It does what it’s generally supposed to do, it has users. Two things that matter at this day and age.
It may be economically effective but such heartless, buggy software is a drain to use. I care about that delta, and yes this can be extrapolated to other industries.
Genuinely I have no idea what you mean by buggy. Sure there are some problems here and there, but my personal threshold for “buggy” is much higher. I guess, for a lot of other people as well, given the uptake and usage.
Two weeks ago typing became super laggy. It was totally unusable.
Last week I had to reinstall Claude Desktop because every time I opened it, it just hung.
This week I am sometimes opening it and getting a blank screen. It eventually works after I open it a few times.
And of course there are people complaining that somehow they're blowing their 5 hour token budget in 5 messages.
It's really buggy.
There's only so long their model will be their advantage before they all become very similar, and then the difference will be how reliable the tools are.
Right now the Claude Code code quality seems extremely low.
And those bugs were semi-fixed and people are still using it. So speed of fixes are there.
I can’t comment on Claude Desktop, sorry. Personally haven’t used it much.
The token usage looks like it is intentional.
And I agree about the underlying model being the moat. If there’s something marginally better that comes up, people will switch to it (myself included). But for now it’s doing the job, despite all the hiccups, code quality, etc.
This is the dumbest take there is about vibe coding. Claiming that managing complexity in a codebase doesn't matter anymore. I can't imagine that a competent engineer would come to the conclusion that managing complexity doesn't matter anymore. There is actually some evidence that coding agents struggle the same way humans do as the complexity of the system increases [0].
I agree, there is obviously “complete burning trash” and there’s this. Ant team has got a system going on for them where they can still extend the codebase. When time comes to it, I’m assuming they would be able to rewrite as feature set would be more solid and assuming they’ve been adding tests as well.
Reverse-engineering through tests has never been easier, which could collapse the complexity and clean up the code.
All software that’s popular has hundreds or thousands of issues filed against it. It’s not an objective indication of anything other than people having issues to report and a willingness and ability to report the issue.
It doesn’t mean every issue is valid, that it contains a suggestion that can be implemented, that it can be addressed immediately, etc. The issue list might not be curated, either, resulting in a garbage heap.
For what one anecdote is worth: through casual use I've found a handful of annoying UI bugs in Claude Code, and all of them were already reported on the bug tracker and either still open, or auto-closed without a real resolution.
Do compilers care about their assembly generated code to look good? We will soon reach that state with all the production code. LLMs will be the compiler and actual today's human code will be replaced by LLM generated assembly code, kinda sorta human readable.
Users stick around on inertia until a failure costs them money or face. A leaked map file won't sink a tool on its own, but it does strip away the story that you can ship sloppy JS build output into prod and still ask people to trust your security model.
'It works' is a low bar. If that's the bar you set you are one bad incident away from finding out who stayed for the product and who stayed because switching felt annoying.
“It works and it’s doing what it’s supposed to do” encompasses the idea that it’s also not doing what it’s not supposed to do.
Also, “one bad incident away” never works in practice. The last two decades have shown that people will use the tools that get the job done, no matter what kind of privacy leaks or destructive things they have done to the user.
It's dogfooding the entire concept of vibe coding, and honestly, that is a good thing. Obviously they care about that stuff, but if your ethos is "always vibe code", then a lot of the fixes become model and prompting changes to get the thing to act like a better coder / agent / sysadmin / whatever.
Team has been extremely open how it has been vibe coded from day 1. Given the insane amount of releases, I don’t think it would be possible without it.
It’s not a particularly sophisticated tool. I’d put my money on one experienced engineer being able to achieve the same functionality in 3-6 months (even without the vibe coding).
The same functionality can be copied over in a week most likely. The moat is experimentation and new feature releases with the underlying model. An engineer would not be able to experiment with the same speed.
I don't really care about the code being an unmaintainable mess, but as a user there are some odd choices in the flow which feel could benefit from human judgement
I’m not strongly opinionated, especially with such a short function, but in general early return makes it so you don’t need to keep the whole function body in your head to understand the logic. Often it saves you having to read the whole function body too.
But you can achieve a similar effect by keeping your functions small, in which case I think both styles are roughly equivalent.
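The two styles under discussion, side by side, in a made-up example (nothing here is from the leaked source). With a function this small they are indeed roughly equivalent; the early-return version just lets you discard each handled case as you read:

```typescript
// Early return: each guard is dispatched immediately.
function discountEarlyReturn(price: number, isMember: boolean): number {
  if (price <= 0) return 0;
  if (!isMember) return price;
  return price * 0.9;
}

// Single exit: equivalent behaviour, but the nesting grows with each
// new condition and the whole body stays "live" until the final return.
function discountNested(price: number, isMember: boolean): number {
  let result: number;
  if (price <= 0) {
    result = 0;
  } else if (!isMember) {
    result = price;
  } else {
    result = price * 0.9;
  }
  return result;
}
```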
useCanUseTool.tsx looks special; maybe it's codegen'ed or copy-'n'-pasted? `_c` as an import name, no comments, use of promises instead of an async function. Or maybe it's just bad vibing...
Maybe, I do suspect _some_ parts are codegen or source map artifacts.
But if you take a look at the other files, for example `useTypeahead`, you'd see that even if there are a few codegen / source-map artifacts, the core logic and behavior is just a big bowl of soup.
1. Randomly peeking at process.argv and process.env all around. Other weird layering violations, too.
2. Tons of repeat code, eg. multiple ad-hoc implementations of hash functions / PRNGs.
3. Almost no high-level comments about structure - I assume all that lives in some CLAUDE.md instead.
That's exactly why, access to global mutable state should be limited to as small a surface area as possible, so 99% of code can be locally deterministic and side-effect free, only using values that are passed into it. That makes testing easier too.
environment variables can change while the process is running and are not memory safe (though I suspect node tries to wrap it with a lock). Meaning if you check a variable at point A, enter a branch and check it again at point B ... it's not guaranteed that they will be the same value. This can cause you to enter "impossible conditions".
It's implicit state that's also untyped - it's just a String -> String map without any canonical single source of truth about what environment variables are consulted, when, why and in what form.
Such state should be strongly typed, have a canonical source of truth (which can then be also reused to document environment variables that the code supports, and eg. allow reading the same options from configs, flags, etc) and then explicitly passed to the functions that need it, eg. as function arguments or members of an associated instance.
This makes it easier to reason about the code (the caller will know that some module changes its functionality based on some state variable). It also makes it easier to test (both from the mechanical point of view of having to set environment variables which is gnarly, and from the point of view of once again knowing that the code changes its behaviour based on some state/option and both cases should probably be tested).
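A minimal sketch of the pattern being advocated: consult `process.env` exactly once, at the edge, into a typed config object, then pass that object explicitly. `DISABLE_AUTOUPDATER` is a real variable mentioned elsewhere in this thread; `VERBOSE` and the function names are invented for illustration:

```typescript
interface AppConfig {
  autoUpdate: boolean;
  verbose: boolean;
}

// Single source of truth: the only place the environment is consulted.
// Taking the env map as a parameter makes the function testable without
// mutating real environment variables.
function loadConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  return {
    autoUpdate: env.DISABLE_AUTOUPDATER !== "1",
    verbose: env.VERBOSE === "1",
  };
}

// Downstream code receives the config explicitly, so its behaviour is
// locally deterministic and visible at the call site.
function maybeUpdate(config: AppConfig): string {
  return config.autoUpdate ? "checking for updates" : "auto-update disabled";
}
```

This also fixes the mid-run mutation hazard mentioned above: the config is read once, so points A and B in the code see the same snapshot even if the environment changes underneath the process.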
It's impressive how fast vibe coders seem to flip-flop between "AI can write better code than you, there's no reason to write code yourself anymore; if you do, you're stuck in the past" and "AI writes bad code but I don't care about quality and neither should you; if you care, you're stuck in the past".
I hope this leak can at least help silence the former. If you're going to flood the world with slop, at least own up to it.
Code quality no longer carries the same weight as it did pre LLMs. It used to matter becuase humans were the ones reading/writing it so you had to optimize for readability and maintainability. But these days what matters is the AI can work with it and you can reliably test it. Obviously you don’t want code quality to go totally down the drain, but there is a fine balance.
Optimize for consistency and a well thought out architecture, but let the gnarly looking function remain a gnarly function until it breaks and has to be refactored. Treat the functions as black boxes.
Personally the only time I open my IDE to look at code, it’s because I’m looking at something mission critical or very nuanced. For the remainder I trust my agent to deliver acceptable results.
LLMs are good in JS and Python which means everything from now on will be written in or ported to either of those two languages.
So yeah, JS is the future of all software.
Claude Code is clearly a pile of vibe-coded garbage. The UI is janky and jumps all over the place, especially during longer sessions. (Which also have a several second delay to render. In a terminal).
Lately, it's been crashing if I hold the Backspace key down for too long.
Being open-source would be the best thing to happen to them. At least they would finally get a pair of human eyes looking at their codebase.
Claude is amazing, but the people at Anthropic make some insane decisions, including trying (and failing, apparently) to keep Claude Code a closed-source application.
I've actually heard a plausible theory about the TUI being janky, that being that they avoid use of the alternate screen feature of ANSI (and onwards) terminals.
The theory states that Anthropic avoids using the alternate screen (which gives consuming applications access to a clear buffer with no shell prompt that they can do what they want with and drop at their leisure) because the alternate screen has no scrollback buffer.
So for example, terminal-based editors -- neovim, emacs, nano -- all use the alternate screen because not fighting for ownership of the screen with the shell is a clear benefit over having scrollback.
The calculus is different when you have an LLM that you have a conversational history with, and while you can't bolt scrollback onto the alternate screen (easily), you can kinda bolt an alternate screen-like behaviour onto a regular terminal screen.
I don't personally use LLMs if I can avoid it, so I don't know how janky this thing is, really, but having had to recently deal with ANSI terminal alternate screen bullshit, I think this explanation's plausible.
Not disagreeing but scrolling works just fine in vim/emacs/etc. Wouldn't it be just managing the scroll back buffer yourself rather than the terminals?
Yes, but this does come with differences and tradeoffs. If the terminal isn't managing the scrollback, you don't get scrollbars and you lose any smooth/high resolution scrolling. You also lose fancy terminal features like searching the scrollback, all that needs to be implemented in your application. Depending on the environment it can also wind up being quite unpleasant to use with a trackpad, sometimes skipping around wildly for small movements.
The other part (which IMO is more consequential) is that once the LLM application quits or otherwise drops out of the alternate screen, that conversation is lost forever.
With the usual terminal mode, that history can outlive the Claude application, and considering many people keep their terminals running for days or sometimes even weeks at a time, that means having the convo in your scrollback buffer for a while.
I think they were saying that in "cup" screen mode (CUP: CUrsor Position, activated with smcup termcap), when you exit (rmcup) the text is lost, as well as the history since it was managed by the application, not the terminal.
Their hypothesis was that maybe there was an intention to have Claude Code fill the terminal history, using potentially hazardous cursor manipulation.
In other words, readline vs ncurses.
I don't see python and ipython readline struggling as bad tho...
I don't think that's likely to explain jankiness. I do know my way around terminal screens and escape codes, and doing flicker-free, curses-like screen updates works equally well on the regular screen as on the alternate screen, on every terminal I've used.
It's also not a hard problem, and updates are not slow to compute. Text editors have been calculating efficient, incremental terminal updates since 1981 (Gosling Emacs), and they had to optimise better for much slower-drawing terminals, with vastly slower computers for the calculation.
Yesterday, I resumed a former claude code session in order to copy code it had generated earlier in that session. Unfortunately, when resuming, it only prints the last N hundred lines of the session to the terminal, so what I was looking for was cut off.
I think that for this sort of _interactive_ application, there's no avoiding the need to manage scroll/history.
That conversation should still exist in the Claude Code log files. Just give Claude some context on how to find it, and it will pull whatever you need. I use this to recall particularly effective prompts later on for reuse.
> Claude Code is clearly a pile of vibe-coded garbage. The UI is janky and jumps all over the place, especially during longer sessions. (Which also have a several second delay to render. In a terminal).
Don't you know, they're proud of their text interface that is structured more like a video game. https://spader.zone/engine/
Not to stand up for Claude Code in any way, I don’t like the company or use the product. This is just a related tangent-
one of my favorite software projects, Arcan, is built on the idea that there’s a lot of similarities between Game Engines, Desktop Environments, Web Browsers, and Multimedia Players. https://speakerdeck.com/letoram/arcan?slide=2
They have a really cool TUI setup that is kinda in a real sense made with a small game engine :)
I mean if you want glitchy garbage that works in the happy path mostly then game engine is the right foundation to build on. Software quality is the last thing game devs are known for. The whole industry is about building clever hacks to get something to look/feel a certain way, not building robust software that's correct to some spec.
To offer the opposite anecdotal evidence point -- claude scrolls to the top of the chat history almost capriciously often (more often than not) for me using iterm on tahoe
i will note that they really should have used something like ncurses and kept the animations down. TTYs are NOT meant to do the level of crazy modern TUIs are trying to pull off; there are just too many terminal emulators out there that just don't like the weird control codes being sent around.
Not really. This guy expresses my feelings: https://www.youtube.com/watch?v=nxB4M3GlcWQ
I also prefer codex over claude, but opencode is best if you can use a good model. We can via a GitHub Business subscription.
I don’t think that’s the reason, but using Bun for production this early is a bad idea. It’s still too buggy, and compromising stability for a 2–3% performance gain just isn’t worth it.
this is highly workload-dependent. there are plenty of APIs that are multiple-factor faster and 10x more memory efficient due to native implementation.
Can we stop referring to source maps as leaks? It was packaged in a way that wasn’t even obfuscated. Same as websites - it’s not a “leak” that you can read or inspect the source code.
"Why would you ship tests?" — Fair point. Source maps only include production bundle files — tests wouldn't appear in the map regardless. Tests may well exist in Anthropic's internal repo, and we can't claim otherwise. However, the bugs we found speak for themselves: a watchdog that doesn't protect the most vulnerable code path for 5+ months, a fallback with telemetry that never executes where it's needed, Promise.race without catch silently dropping tool results. If tests exist, they clearly don't cover the streaming pipeline adequately — these are the kind of issues that even basic integration tests would catch.
"Is the Claude thank you sarcasm?" — Mostly. But the sequence is real: we filed #39755 asking for source access on March 27, the source map shipped on March 31. The actual explanation is simpler — Bun generates source maps by default, and nobody checked the build output. Which is itself the point: 64K lines of code with no build verification process.
Maybe the OP could clarify; I don't like reading leaked code, but I'm curious:
my understanding is that it is the source code for "claude code", the coding assistant that remotely calls the LLMs.
Is that correct ? The weights of the LLMs are _not_ in this repo, right ?
It sure sucks for Anthropic to get pwned like this, but it should not affect their bottom line much?
A couple of years ago I had to evaluate A/B test and feature flag providers, and even then when they were a young company fresh out of YC, GrowthBook stood out. Bayesian methods, bring your own storage, and self-hosting instead of "Contact us for pricing" made them the go-to choice. I'm glad they're doing well.
I don't understand why claude code (and all CLI apps) isn't written in Rust. I started building CLI agents in Go and then moved to Typescript and finally settled on Rust and it was amazing!
claude code started as an experimental project by boris cherny. when you’re experimenting, you naturally use the language you’re most comfortable with. as the project grew, more people got involved and it evolved from there. codex, on the other hand, was built from the start specifically to compete with claude code. they chose rust early on because they knew it was going to be big.
While the LLM rust experiments I've been running make good use of ADTs, it seems to have trouble understanding lifetimes and when it should be rc/arc-ing.
Perhaps these issues have known solutions? But so far the LLM just clones everything.
So I'm not convinced just using rust for a tool built by an LLM is going to lead to the outcome that you're hoping for.
[Also, just in general, abstractions in Rust feel needlessly complicated by needing to know the size of everything. I've gotten so much mileage by just writing what I need without abstraction and then hoping I don't have to do it twice. For something (read: claude code et al) that is kind of new to everyone, I'm not sure that Rust is the best target language even when you take the LLM-generated nature of the beast out of the equation.]
Think about your question: depending on the tool, Rust might not be needed. Is Rust-level memory performance and safety needed in a coding agent? Probably not.
Is high-speed release iteration needed? Maybe. Interpreted or JIT-compiled? Maybe.
Without knowing all the requirements, it's just your own preference making the decision, not objectively the right tool for the job.
While not directly related to GP, I would guess that a codebase developed with a coding agent (I assume Claude Code is used to work on itself) would benefit from a stricter type system (one important selling point of Rust).
Amusingly, they deprecated it with a message of "Unpublished" instead of actually unpublishing it [1]. npm unpublish removes the package version from the registry; npm deprecate leaves it there and simply marks the version as deprecated with your message. I have to imagine the point was to make it harder for people to download the source map, so deprecating it with this message gives off a bit of a "claude, unpublish the latest version of this package for me" vibe.
[1] - https://www.npmjs.com/package/@anthropic-ai/claude-code/v/2....
You can’t unpublish an npm package with more than 100 downloads, I think.
The policy is https://docs.npmjs.com/policies/unpublish
There are 231+ packages that depend on this one, and I imagine they mostly use permissive enough version ranges that this was included. Looks like Anthropic called in a favor and it's removed now.
Ah, another you can’t, but they can.
I’m still a little humored over peak web3 and the DAO / soft contract nonsense. Like in order to stop fraud entire coins were forked…
Sure you can, if you have a legitimate case you can ask npm to unpublish and they handle things manually :)
You can say what you want about anthropic but they sure as hell are dogfooding the crap out of claude code lmao
In all my years of writing tools for other devs, dogfooding is really the best way to develop IMO. The annoying bugs get squashed because I get frustrated with them in my flow.
Iterating on an MCP tool while having Claude try to use it has been a really great way of getting it to work how others, coming in blind, are going to use it.
Yes it's buggy as hell, but as someone echoed earlier if the tool works most of the time, a lot of people don't care. Moving fast and breaking things is the way in an arms race.
I think they are aware that things don't disappear from the internet. So they chose just to gently indicate that it wasn't meant for publishing.
The big loss for Anthropic here is how it reveals their product roadmap via feature flags. A big one is their unreleased "assistant mode" with code name kairos.
Just point your agent at this codebase and ask it to find things and you'll find a whole treasure trove of info.
Edit: some other interesting unreleased/hidden features
- The Buddy System: Tamagotchi-style companion creature system with ASCII art sprites
- Undercover mode: Strips ALL Anthropic internal info from commits/PRs for employees on open source contributions
Undercover mode also pretends to be human, which I'm less ok with:
https://github.com/chatgptprojects/claude-code/blob/642c7f94...
That's gonna need an explanation. From the ethics/safety/alignment people.
You'll never win this battle, so why waste feelings and energy on it? That's where the internet is headed. There's no magical human verification technology coming to save us.
I can prove all contributions to stagex are by humans because we all belong to a 25 year old web of trust with 5444 endorser keys including most redhat, debian, ubuntu, and fedora maintainers, with all of our own maintainer keys in smartcards we tap to sign every review and commit, and we do background checks on every new maintainer.
I am completely serious. We have always had a working proof of human system called Web of Trust and while everyone loves to hate on PGP (in spite of it using modern ECC crypto these days) it is the only widely deployed spec that solves this problem.
https://kron.fi/en/posts/stagex-web-of-trust/
You can prove the commits were signed by a key you once verified. It is your trust in those people which allows you to extend that to “no LLM” usage, but that’s reframing the conversation as one of trust, not human / machine. Which is (charitably) GPs point: stop framing this as machine vs human — assume (“accept”) that all text can be produced by machines and go from there: what now? That’s where your proposal is one solution: strict web of trust. It has pros and cons (barrier to entry for legitimate first timers), but it’s a valid proposal.
All that to say “you’re not disagreeing with the person you’re replying to” lol xD
With 5400+ people I am betting that you have at least one person in your 'web of trust' that no longer deserves that trust.
That's one of the intrinsic problems with webs of trust (and with democracy...), you extend your trust but it does not automatically revoke when the person can no longer be trusted.
Can't you just instruct Claude Code to use your signing keys? I understand you may say "I won't." But my point is that someone can.
You can only prove that all contributions are pushed by those humans, and you can quite explicitly/clearly not prove that those humans didn't use any AI prior to pushing.
nothing about this proves anything except that someone or something had access to the key.
Fatalism will also not fix anything. But I suppose death comes for us all, yes? Why do anything at all?
This is the only relevant question. And it leads right to the next one which is “what is a good life?”
But humans have a huge bias for action. I think generally doing less is better.
On the other hand, if a dead person can do it better than you can, it's not that much of an accomplishment.
> I think generally doing less is better.
My sedentary lifestyle is responsible for my recurrent cellulitis infections.
Just saying.
I got hit by a car while out for a run. Just saying.
I feel that fatalism, especially when people treat it as some sort of personal philosophy, is kind of lazy.
It requires no effort to say "fuck this, nothing matters anyway", and then justify doing literally nothing.
This is more cultural rather than rational.
> I feel that fatalism, especially when people treat it as some sort of personal philosophy, is kind of lazy.
I think a lot of fatalism is fake. It's really someone saying "I like this, and I want you to believe you can't change it so you give up."
Apathy has a striking number of motivated evangelists!
It also makes no sense! "Fuck this, it doesn't matter - but I'll happily spend effort communicating that to others, because apparently making others not care about something I don't care about is something I do care about." Wut?!
Well, I say it makes no sense. Alternatively, it makes a lot of sense, and these people actually just wanna destroy everything we hold dear :-(
I think "adapt or die" is the takeaway.
> But I suppose death comes for us all, yes? Why do anything at all?
Wrong take. Death comes for us all, yes, so why hold back? Do you want to live forever?
I guess I could just curl up into fetal position and watch the world go by. But that's no fun. Why not dream big and shoot for the moon with kooky goals like, say, having an underground, community-supported internet where things are falling less to shit?
Belief in inevitability is a choice (except for maybe dying, I guess).
It’s certainly winnable with some legislative tweaks. These systems are all designed by humans, we can just change them.
Of course, we’d need a significant change of direction in leadership, but it’s happened many times before. French Revolution seems highly relevant
I think you're underestimating the difficulty, even for exact copies of text (which AI mostly isn't doing).
What sort of Orwellian anti-cheat system would prevent copy and paste from working? What sort of law would mandate that? There are elaborate systems preventing people from copying video but they still have an analog hole.
Technology won’t save us, but that doesn’t mean we shouldn’t be promoting ethics.
It’s where THIS internet is headed. The future may involve a lot more of them I think.
Human verification technology absolutely exists. Give it some time and people who sell ai today are going to shoehorn it everywhere as the solution to the problem they are busy creating now.
Nothing like throwing in the towel before a battle is ever fought. Let's just sigh and wearily march on to our world of AI slop and ever higher bug counts and latency delays while we wait for the five different phone homes and compilations through a billion different LLM's for every silly command.
IDK. I sort of like the idea that now instead of dead internet theory being a joke, that it’ll be a well known fact that a minority of people are not real and there is no point in engaging… I look forward to Social 3… where people have to meet face to face.
How quickly would that meat-space renaissance spin through our whole cyberpunk heritage, speedrunning the same authentication challenges..?
The cornucopia of gargoyles, living their best life as terminals for the machine.
The strange p-zombies who don't show their gargoyle accessories visibly, but somehow still follow the script.
Eventually the more insidious infiltrators, requiring a real Voight-Kampff test.
"minority"?
This is https://en.wikipedia.org/wiki/Sybil_attack
To feel something. To resist something bad. To stand for what is right.
Do those sentiments mean nothing to you?
> You'll never win this battle, so why waste feelings and energy on it?
Cool. The attitude of a bully. Thanks for the contribution!
Magical human verification technology is called "your own private forum" in conjunction with "invite your friends"
Until your friend writes a bot.
Funny story, when I was younger I trained a basic text predictor deep learning model on all my conversations in a group chat I was in, it was surprisingly good at sounding like me and sometimes I'd use it to generate some text to submit to the chat.
I don't see what the value of this would be. Why would I want to automate talking to my friends? If I'm not interested in talking with them, I could simply not do it. It also carries the risk of not actually knowing what was talked about or said, which could come up in real life and lead to issues. If a "friend" started using a bot to talk to me, they would not longer be considered a friend. That would be the end.
It was for fun, to see if it were possible and whether others could detect they were talking to a bot or not, you know, the hacker ethos and all. It's not meant to be taken seriously although looks like these days people unironically have LLM "friends."
I used to leave a megahal connected to my bouncer when I wasn't around
Even if it is impossible to win, I am still feeling bad about it.
And at this point it is more about how large a space will be usable and how much will be bot-controlled wasteland. I prefer the spaces important to me to survive.
Feeling bad about something you can’t change is bad for your mental health.
Probably beats being in denial over it and pretending you like it.
And identifying problem you dislike is a good first step to find a strategy to solve it at least in part.
and naming your feelings is the first step toward restoration
That's why I stopped brushing my teeth, I can't clean every crevice perfectly so what's the point?
>There's no magical human verification technology coming to save us.
Except for the one Sam Altman is building.
That one is magical for sure
https://en.wikipedia.org/wiki/Magic_(illusion)
Scam Altman is not trustworthy. I hope nobody gives him their biometrics. I certainly would never.
Giving your retina scan to one of the main Slop Bros, what could possibly go wrong?
I am actively building non-magical human verification technology that doesn't require you uploading your retinal scans or ID to billionaires or incompetent outsourcing firms.
We already have it and we use it to validate the trusted human maintainer involvement behind the linux packages that power the entire internet: PGP Web Of Trust. Still works as designed and I still go to keysigning parties in person.
Say a regular human wanted to join and prove their humanhood status (expanding the web of trust). How would they go about that? What is the theoretical ceiling on the rate of expansion of this implementation?
Great! Lets do the CAPTCHA-test: Will I, as a 100% blind user, be able to complete your process?
I think so? Can you use a smartphone?
edit: can't reply, the rate-limiting is such an awful UX
Not parent poster but I am a maintainer of software powering significant portions of the internet and prove my humanity with a 16 year old PGP key with thousands of transitive trust signatures formed through mostly in-person meetings, using IETF standards and keychain smartcards, as is the case for everyone I work with.
But, I do not have an Android or iOS device as I do not use proprietary software, so a smartphone based solution would not work for me.
Why re-invent the wheel? Invest in making PGP easier and keep the decades of trust building going anchoring humans to a web of trust that long predates human-impersonation-capable AI.
The technical implementation is the easy part. The hard part is achieving mass voluntary cooperation under adverse incentive schemes.
This is true, but I think there is a sizable (and growing) appetite for human-only spaces.
how does it work?
I'm hoping to do a Show HN soon :)
Negative sentiment towards technological destiny detected in human agent.
I assume we're heading to a place where keyboards will all have biometric sensors on every key and measure weight fluctuations in keystrokes, actually.
That’s like having your security on the frontend.
If someone owns the keyboard then they can fake those metrics and tell the server it is happening when it isn’t.
That will be easy to beat.
I pretend to be human most days. I call it the daily facade of who I want to be on a given day. Oh humanity.
I am Jacques' complete lack of surprise.
But will this be released as a feature? For me it seems like it's an Anthropic internal tool to secretly contribute to public repositories to test new models etc.
I don't care who is using it, I don't want LLMs pretending to be humans in public repos. Anthropic just lost some points with me for this one.
EDIT: I just realized this might be used without publishing the changes, for internal evaluation only as you mentioned. That would be a lot better.
A benign use of this mode is developing on their own public repositories.
https://github.com/anthropics/claude-code
> Write commit messages as a human developer would — describe only what the code change does.
The undercover mode prompt was generated using AI.
All these companies use AIs for writing these prompts.
But AI aren't actually very good at writing prompts imo. Like they are superficially good in that they seem to produce lots of vaguely accurate and specific text. And you would hope the specificity would mean it's good.
But they sort of don't capture intent very well. Nor do they seem to understand the failure modes of AI. The "-- describe only what the code change does" is a good example. This is specific, but it also distinctly seems like someone who doesn't actually understand what makes AI writing obvious.
If you compare that vs human written prose about what makes AI writing feel AI you would see the difference. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
The above actually feels like text from someone who has read and understands what makes AI writing AI.
AI is better at writing prompts than most humans. It requires work and lots of developers don’t think getting good at prompting actually matters.
At least half of the complaints I see on HN boil down to the person's prompts suck. Or the expectation that AI can read their mind.
Hey LLM, write me a system prompt that will avoid the common AI 'tells' or other idiosyncrasies that make it obvious that text or code output was generated by an AI/LLM. Use the referenced Wikipedia article as a must-avoid list, but do not consider it exhaustive. Add any derivations or modifications to these rules to catch 'likely' signals as well.
There, sorted!
All the prompts I've ever written with Claude have always worked fine the first time. Only revised if the actual purpose changes, I left something out, etc. But also I tend to only write prompts as part of a larger session, usually near the end, so there's lots of context available to help with the writing.
That's not supposed to be surprising. They're dogfooding CC to develop CC. I assume any and every line in this repo is AI generated.
Also unintentionally reveals something:
> Write commit messages as a human developer would — describe only what the code change does.
That's not what a commit message is for, that's what the diff is for. The commit message should explain WHY.
Sadly not doing that likely does indeed make it appear more human...
I wager that "describe only what the code change does" was someone's attempt to invert "don't add the extra crap you often try to write", not some 4d chess instruction that makes claude larp like a human writing a crappy commit message.
Yes, this is a trend I've noticed strongly with Claude code—it really struggles to explain why. Especially in PR descriptions, it has a strong bias to just summarize the commits and not explain at all why the PR exists.
Yeah, that was my reaction too. A shame they try to hide themselves, but even worse, the instructions to this "Fake Human" is wrong too!
This is my pet peeve with LLMs: they almost always fail to write like a normal human would, mentioning logs or other meta-things which are not at all interesting.
I had a problem to fix and one not only mentioned these "logs", but went on about things like "config", "tests", and a bunch of other unimportant nonsense words. It even went on to point me towards the "manual". Totally robotic monstrosity.
lol?
Heh, this is what people who are hostile against AI-generated contributions get. I always figured it'd happen soon enough, and here it is in the wild. Who knows where else it's happening...
The first two zips I download today were 9.887.340 bytes, why is yours 10.222.630 bytes?
1) This seems to be strictly for Anthropic internal tooling. 2) It does not "pretend to be human"; it is instructed to "Write commit messages as a human developer would — describe only what the code change does."
Since when "describe only what the code change does" is pretending to be human?
You guys are just mining for things to moan about at this point.
1) It's not clear to me that this is only for internal tooling, as opposed to publishing commits on public GitHub repos. 2) Yes, it does explicitly say to pretend to be a human. From the link on my post:
> NEVER include in commit messages or PR descriptions:
> [...]
> - The phrase "Claude Code" or any mention that you are an AI
Time to ask if the contributor knows what a Capybara is as a new Turing test
That whole “feature” is vile.
How so? A good bit of my global claude.md is dedicated to fighting the incessant attribution in git commits. It is on the same level as the "sent from my iphone" signature - I'm not okay with my commits being an advertising board for Anthropic.
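For illustration, the kind of global rule being described might look like the following; the wording is hypothetical, not the commenter's actual file:

```
# ~/.claude/CLAUDE.md (illustrative wording)
- Never add "Generated with Claude Code" banners or
  "Co-Authored-By: Claude" trailers to commit messages.
- Commit messages describe the change and why it was made; nothing else.
```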
Note also the "Claude Capybara" reference in the undercover prompt: https://github.com/chatgptprojects/claude-code/blob/642c7f94...
This seems like a good way to weed out models: ask them to include the term capybara in their commit messages
At least this was known with the Mythos "early blog post" fiasco.
all these flags are findable by pointing claude at the binary and asking it to find feature flags.
Is there an AGI mode FF? Asking for a friend…
(spoiler alert)
Buddy system is this year's April Fool's joke, you roll your own gacha pet that you get to keep. There are legendary pulls.
They expect it to go viral on Twitter so they are staggering the reveals.
lol that's funny, I have been working seriously [1] on a feature like this after first writing about it jokingly [2] earlier this year.
The joke was the assistant is a cat who is constantly sabotaging you, and you have to take care of it like a gacha pet.
The seriousness though is that actually, disembodied intelligences are weird, so giving them a face and a body and emotions is a natural thing, and we already see that with various AI mascots and characters coming into existence.
[1]: serious: https://github.com/mech-lang/mech/releases/tag/v0.3.1-beta
[2]: joke: https://github.com/cmontella/purrtran
You know, that would actually be pretty fun and cool. Like if you had home automation set up with a "pet assistant", but it would only follow your commands if you made sure to keep it happy.
Is it just me or does MicroMika kinda resemble Rocky from Project Hail Mary?
You heard it here first
So close to April Fool's too. I'm sure it will still be a surprise for a majority of their users.
If this is true, my old personal-agent Claude Code setup I open sourced last month will finally be obsolete (1 month lol):
https://clappie.ai
- Telegram Integration => CC Dispatch
- Crons => CC Tasks
- Animated ASCII Dog => CC Buddy
Not necessarily; I would very much like to use those features on a Linux server. Currently the Anthropic implementation forces a desktop (or worse, a laptop) to be turned on instead of working headless as far as I understand it.
I’ll give clappie a go, love the theme for the landing page!
Clappie looks much more fabulous than CC though. I'll have to give it a try. I like how you put the requests straight into an already running CC session instead of calling `claude -p` every time like the claws.
Thanks so much! It's a fancy landing page thanks to Claude.
Tmux is seriously an amazing tool.
Dispatch and scheduled tasks have been available for a few weeks already, although with limitations.
Poor mum
Not at all. I am a big Claude Code fan and glad they are releasing more and more features for users
one of those is adorable and the other one is unethical
People already can look at the source without this leak. People have had hacked builds force enabling feature flags for a long time.
Obfuscated ts/js code is not machine code to begin with, so not sure what’s the big deal.
Also, not sure why anthropic doesn’t just make their cli open source - it’s not like it’s something special (Claude is, this cli thingy isn’t)
> not sure why anthropic doesn’t just make their cli open source
They don't want everyone to see how poorly it's implemented and that the whole thing is a big fragile mess riddled with bugs. That's my experience anyway.
For instance, just recently their little CLI -> browser oauth login flow was generating malformed URLs and URLs pointing to a localhost port instead of their real website.
I don't think you really need to look at the source code to understand that it's probably been, let's say, written with a heavy help from Claude itself
" - it’s not like it’s something special (Claude is, this cli thingy isn’t)"
How do you know? Have you checked the source?
Do you know how exactly context is created, memory files, skills? Subagents created with tasks?
I don't, but am checking right now. Then I will judge.
bc I build stuff like this myself - it doesn’t take anything to build a wrapper client around a good llm, including using another llm.
The obfuscation point is fair, but you're underestimating the special sauce.
In my experience the difference is noticeable — it's not just a wrapper. The value is model-CLI co-design: tool use, long context, multi-step reasoning tuned at the model level. Competitors can clone the CLI; they can't clone that feedback loop.
A few months of compounding market share (enterprise stickiness, dev habits, usage data improving the models) can be decisive. By the time others catch up, Anthropic may be two generations ahead.
Finally all spinner verbs revealed: https://github.com/instructkr/claude-code/blob/main/src/cons...
The link now returns 404.
Here's one that works (for now): https://github.com/chatgptprojects/claude-code/blob/642c7f94...
I'm glad "reticulating" is in there. Just need to make sure "splines" is in the nouns list!
Relieved to know I'm not the only one who grepped for that. Thank you for making me feel sane, friend.
Def not alone
It's not hard to find them; they are in clear text in the binary. You can search for known ones with grep and find the rest nearby. You could even replace them in place (but now it's configurable).
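A sketch of the approach, simulated against a stand-in file so it runs anywhere; against the real package you would point grep at the bundled `cli.js` instead:

```shell
# Stand-in for the bundled cli.js (the real one lives under
# node_modules/@anthropic-ai/claude-code/ in an installed package).
printf 'var verbs=["Accomplishing","Reticulating","Schlepping"];' > /tmp/fake-cli.js

# One known spinner verb anchors the search; -o -b prints the byte offset
# of each match so you can look nearby for the rest of the list.
grep -ob 'Reticulating' /tmp/fake-cli.js
```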
What's going on with the issues in that repo? https://github.com/instructkr/claude-code/issues
It seems human. It taught me 合影, which seems to be Chinese slang for just wanting to be in the comments. Probably not a coincidence that it's after work time in China.
Really interesting to see Github turn into 4chan for a minute, like GH anons rolling for trips.
In this situation, it means "Hey I have been here and observed this!"
合影 doesn't sound like any Chinese slang. That is just what "group photo" means.
There have been massive GitHub issue spams recently, including in Microsoft's WSL repository.
https://github.com/microsoft/WSL/issues/40028
trying to get github to nuke the repo? at a guess.
certainly nothing friendly.
I saw this on restic's main repository the other day.
oh wow, there are like 10 opened every minute. seems spam-y
Random aside: I've seen a 2015 game be accused of AI slop on Steam because it used a similar concept... And mind you, there's probably thousands of games that do this.
First it was punctuation and grammar, then linguistic coherence, and now it's tiny bits of whimsy that are falling victim to AI accusations. Good fucking grief
All that is needed to solve that is to reliably put an AI disclaimer on things done by AI
Which of course won't be done because corporations don't want that (except Valve I guess), so blame them.
To me, this is a sign of just how much regular people do not want AI. This is worse than crypto and metaverse before it. Crypto, people could ignore and the dumb ape pictures helped you figure out who to avoid. Metaverse, some folks even still enjoyed VR and AR without the digital real estate bullshit. And neither got shoved down your throat in everyday, mundane things like writing a paper in Word or trying to deal with your auto mechanic.
But AI is causing such visceral reactions that it's bleeding into other areas. People are so averse to AI they don't mind a few false positives.
It's how people resisted CGI back in the day. What people dislike is low quality. There's a loud subset who are against it on principle, just as some people insist on analog music, but regular people are much more practical; they just don't post about it all day on the internet.
perhaps one important detail is that cassette tape guys and Lucasfilm aren’t/weren’t demanding a complete and total restructuring of the economy and society
An excellent observation. When films became digital, the real backlash came when they stopped distributing film for the old film projectors and every movie theater had to invest in very expensive DCP projectors. Some couldn't, and were forced to shut down.
If I had lost my local movie theater because of digital film, I would have a really good reason to hate the technology, even though the blame is on the studios forcing that technology on everyone.
It is not. People resisted bad CGI. During the advent of CGI people celebrated the masterpiece of the Matrix and even Titanic. They hated however the Scorpion King.
Not really. The scale is entirely different. I think less of someone as a person if they send me AI slop.
I think literally everyone could agree CGI has been detrimental to the quality of films.
"Literally everyone" can't even agree on whether Polio is bad.
I myself would disagree that CGI itself is a bad thing.
I could maybe agree in the sense of "has had detrimental effects", but certainly not in the sense of "net detrimental".
Anecdata-- from me. I think cgi can be a net positive.
Project Hail Mary is a great example of not relying on CGI.
Not just in the obvious ways either, even good CGI has been detrimental to the film (and TV) making process.
I was watching some behind the scenes footage from something recently, and the thing that struck me most was just how they wouldn't bother with the location shoot now and just green-screen it all for the convenience.
Even good CGI is changing not just how films are made, but what kinds of films get shot and what kind of stories get told.
Regardless of the quality of the output, there's a creativeness in film-making that is lost as CGI gets better and cheaper to do.
90% of the time, you wouldn't know CGI if you saw it. That's the 'good' CGI.
Same thing is true of AI output.
Not the same. The more effort you put into CGI the more invisible it becomes. But you can’t prompt your way out of hallucinations and other AI artifacts. AI is a completely different technology from CGI. There is no equivalence between them.
No there is a very loud minority of users who are very anti AI that hate on anything that is even remotely connected to AI and let everyone know with false claims. See the game Expedition 33 for example.
Especially true in gaming communities.
IMO it's a combination of long-running paranoia about cost-cutting and quality, and a sort of performative allegiance to artists working in the industry.
And yet, no game has problems selling due to these reactions. As a matter of fact, the vast majority of people can't even tell if AI has been used here or there unless told.
I reckon it's just drama paraded by gaming "journalists" and not much else. You will find people expressing concern on Reddit or Bluesky, but ultimately it doesn't matter.
Did they remove that in some very recent commit?
I think the original repo OP mentioned decided not to host the code any more, but given there are 28k+ forks, it's not too hard to find again...
They have an interesting regex for detecting negative sentiment in user prompts, which is then logged (explicit content): https://github.com/chatgptprojects/claude-code/blob/642c7f94...
I guess these words are to be avoided...
An LLM company using regexes for sentiment analysis? That's like a truck company using horses to transport parts. Weird choice.
The difference in response time - especially versus a regex running locally - is really difficult to express to someone who hasn't made much use of LLM calls in their natural language projects.
Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude. And that's average, it gets much worse.
Now personally I would have maybe made a call through a "traditional" ML widget (scikit, numpy, spaCy, fastText, sentence-transformer, etc) but - for me anyway - that whole entire stack is Python. Transpiling all that to TS might be a maintenance burden I don't particularly feel like taking on. And on client facing code I'm not really sure it's even possible.
So, think of it as a businessman: you don't really care if your customers swear, but you know it'll generate bad headlines. So you've got to do something. Just as a door lock isn't designed for a master criminal, you don't need to design your filter for some master swearer; you design it well enough that it gives the impression that further tries are futile.
So yeah, you do what's least intensive for the CPU, but also what's enough to prevent most of the cases where a screenshot or log ends up showing blatantly "immoral" behavior.
This door lock doesn’t even work against people speaking French, so I think they could have tried a mite harder.
Nom de dieu de putain de bordel de merde de saloperie de connard d'enculé de ta mère.
It's like wiping your arse with silk.
In all honesty, I think I've said "damn it" to ChatGPT more than once before closing the window in a fit of rage.
There are only Americans on the internet.
The up-side of the US market is (almost) everyone there speaks English. The down side is, that includes all the well-networked pearl-clutchers. Europe (including France) will have the same people, but it's harder to coordinate a network of pearl-clutching between some saying "Il faut protéger nos enfants de cette vulgarité!" and others saying "Η τηλεόραση και τα μέσα ενημέρωσης διαστρεβλώνουν τις αξίες μας!" even when they care about the exact same media.
For headlines, that's enough.
For what's behind the pearl-clutching, for what leads to the headlines pandering to them being worth writing, I agree with everyone else on this thread saying a simple word list is weird and probably pointless. Not just for false-negatives, but also false-positives: the Latin influence on many European languages leads to one very big politically-incorrect-in-the-USA problem for all the EU products talking about anything "black" (which includes what's printed on some brands of dark chocolate, one of which I saw in Hungary even though Hungarian isn't a Latin language but an Ugric language and only takes influences from Latin).
I just went through quite an adventure trying to translate back and forth from/to Hungarian to/from different languages to figure out which Hungarian word you meant, and arrived at the conclusion that this language is encrypted against human comprehension.
Dark chocolate is "étcsokoládé", literally "edible chocolate", in Hungarian.
I heard the throat-clearing "Negró" candy (marketed with a chimney-sweep mascot with a soot-covered face) was usually the one that offended English speakers' sensibilities.
Yea.. but.. in English only.
Fortunately I can swear pretty well in Spanish.
If it's good enough, it's good enough. But just as there are many more options than either going full-blown LLM or just using a regex, there are more options than either transpiling a massive Python stack to TS or giving up.
They're sending it to an LLM anyway, though? Not sure why they wouldn't just add a sentiment field to the requested response shape.
Because a regex on the client is free, and GPU compute absolutely is not.
> Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude.
You do know that 10,000x _is_ four orders of magnitude, right? :-D
OP is saying that in their experience it is more like eight orders of magnitude
I guess I need reading glasses ... :-D
Because they want it to be executed quickly and cheaply without blocking the workflow? Doesn’t seem very weird to me at all.
They probably have statistics on it and saw that certain phrases happen over and over so why waste compute on inference.
More likely their LLM Agent just produced that regex and they didn't even notice.
The problem with regex is multi-language support, and how much the regex will bloat if you want to support even 10 languages.
Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically and you can compress regexes easily. We used to have a compressed regex that could match any placename or street name in the UK in a few MB of RAM. It was silly quick.
Woah. This is a regex use I've never heard of. I'd absolutely love to see a writeup on this approach - how it's done and when it's useful.
You can literally | together every street address or other string you want to match in a giant disjunction, and then run a DFA/NFA minimization over that to get it down to a reasonable size. Maybe there are some fast regex simplification algorithms as well, but working directly with the finite automata has decades of research and probably can be more fully optimized.
This was many moons ago, written in perl. From memory we used Regexp::Trie - https://metacpan.org/release/DANKOGAI/Regexp-Trie-0.02/view/...
We used it to tokenize search input and combined it with a solr backend. Worked really remarkably well.
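A minimal Python sketch of the prefix-sharing idea (Regexp::Trie does the same in Perl). It only shares prefixes; a real pipeline like the UK placename one would also run DFA minimization to share suffixes and compress further. The word list here is illustrative, borrowed from this thread, not the leaked one:

```python
import re

def trie_regex(words):
    """Build one compact regex from a word list by merging common prefixes."""
    # Build a character trie; the empty-string key marks end-of-word.
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}

    def to_pattern(node):
        if set(node) == {""}:  # only the end marker: nothing left to match
            return ""
        optional = "" in node  # a word ends here, so the tail is optional
        branches = [re.escape(ch) + to_pattern(child)
                    for ch, child in sorted(node.items()) if ch]
        if optional:
            return "(?:" + "|".join(branches) + ")?"
        if len(branches) == 1:
            return branches[0]
        return "(?:" + "|".join(branches) + ")"

    return re.compile(r"\b" + to_pattern(trie) + r"\b")

# e.g. produces \b(?:ffs|shit(?:t(?:iest|y))?|wtf)\b
pat = trie_regex(["shit", "shitty", "shittiest", "wtf", "ffs"])
```

Because the alternation compiles to a single automaton, matching stays a one-pass scan no matter how many words go in.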
I think it will depend on the language. There are a few non-Latin-script languages where a simple word search likely won't be enough for a regex to apply properly.
We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :).
I talk to it in a language other than English, but I have rules that everything in code and documentation stays in English; only the conversation with me uses my native language. Why would that be a problem?
Because 90% of the training data was in English, and therefore the model performs best in that language.
In my experience these models work fine using another language, if it’s a widely spoken one. For example, sometimes I prompt in Spanish, just to practice. It doesn’t seem to affect the quality of code generation.
It’s just a subjective observation.
It just can't be the case, simply because of how ML works. In short, the more diverse, high-quality, reasoning-rich examples a language has in the training set, the better the model performs in that language.
So unless the Spanish subset had much more quality-dense examples to make up for the volume, there is no way the quality of reasoning in Spanish is on par with English.
I apologise for the rambling explanation; I'm sure someone with ML expertise here can explain it better.
I saw a curious post recently that explored this idea, and showed that it isn’t really the case. The internal layers of the model aren’t really reasoning in English, or in any human language.
Translation in/out of human languages only happens at the edges of the model.
Internal layer activations for the same concept are similar regardless of language, while activations at the top/bottom layers diverge. Meanwhile the pattern is reversed for same language different content.
So we do at least agree on the fact that quality of human language <-> embeddings transition depends on how good target language is represented in the training dataset?
Even if it happens at the edges, on every conversation turn, small subtleties of meaning that aren't captured can accumulate into significant error over time.
The agent may be translating to English behind the scenes. There are all kinds of hacks layered on top of the underlying models. But however exactly it works, it’s clearly not the case that these things only make use of Spanish training data when you prompt them in Spanish.
Lets take a language with orders of magnitude less training data, like Nahuatl. The following is what happens if I ask Claude Code in Nahuatl to give me an example of a simple program that I can execute inside a web browser (avoiding any direct keyword prompts like 'Javascript' or 'Hello World'). Interestingly, it responds in Spanish. I doubt that it has any relevant training data in Nahuatl. (Apologies if the Nahuatl is wrong – I just used Google translate.)
> it’s clearly not the case that these things only make use of Spanish training data when you prompt them in Spanish.
It’s not! And I’ve never said that.
Anyway, I'm not even sure what we are arguing about, as it's 100% fact that SOTA models perform better in English. The only interesting question here is how much better: is it negligible, or does it actually make a difference in real-world use cases?
It’s negligible as far as I can tell. If the LLM can “speak” the language well then you can prompt it in that language and get more or less the same results as in English.
They literally just have to subtract the vector for the source language and add the vector for the target.
It’s the original use case for LLMs.
Thank you. +1. There are obviously differences and things getting lost or slightly misaligned in the latent space, and these do cause degradation in reasoning quality, but the decline is very small in high resource languages.
Claude handles human languages other than English just fine.
In my experience agents tend to (counterintuitively) perform better when the business language is not English / does not match the code's language. I'm assuming the increased attention mitigates the higher "cognitive" load.
They only need to look at one language to get a statistically meaningful picture into common flaws with their model(s) or application.
If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.
Did you just complain about bloat, in anything using npm?
Why do you need to do it on the client side? You are leaking so much information on the client side. And considering the speed of Claude Code, if you really want to do it on the client side, a few seconds won't be a big deal.
Depends what it's used by. If I recall, there's an `/insights` command/skill (built-in, whatever you want to call it) that generates an HTML file. I believe it gives you stats on when you're frustrated with it and (useless) suggestions on how to "use Claude better".
Additionally, after looking at the source, it looks like a lot of Anthropic's own internal test/debug tooling (i.e. stuff stripped out at build time) is in this source map. There's one part that prompts their own users (or whoever) to use a report-issue command whenever frustration is detected. It's possible it's using it for this.
> a few seconds won't be a big deal
it is not that slow
It looks like it's just for logging, why does it need to block?
Better question - why would you call an LLM (expensive in compute terms) for something that a regex can do (cheap in compute terms)
Regex is going to be something like 10,000 times quicker than the quickest LLM call, multiply that by billions of prompts
This is assuming the regex is doing a good job. It is not. Also, you can embed a very tiny model if you really want to flag as many negatives as possible (I don't know Anthropic's goal with this); it would be quick and free.
I think it's a very reasonable tradeoff, getting 99% of true positives at the fraction of cost (both runtime and engineering).
Besides, they probably do a separate analysis on server side either way, so they can check a true positive to false positive ratio.
Oh it’s worse than that. This one ended up getting my account banned: https://github.com/anthropics/claude-code/issues/22284
This is a tricky problem, I mean, Pinyin also uses the English alphabet.
It is not a tricky problem because it has a simple and obvious solution: do not filter or block usage just because the input includes a word like "gun".
Wow, that's horrible.
Because they actually want it to work 100% of the time and cost nothing.
Maybe hard to believe but not everyone is speaking English to Claude
Then they made it wrong. For example, "What the actual fuck?" is not getting flagged, neither is "What the *fuck*".
It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.
Classic over-engineering. Their approach is just fine 90% of the time for the use case it’s intended for.
75-80% [1], 90%, 99% [2]. In other words, no one has any idea.
I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.
Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P
[1]: https://news.ycombinator.com/item?id=47587286
[2]: https://news.ycombinator.com/item?id=47586932
It compares to lowercase input, so doesn't matter. The rest is still valid
Except that it's a list of English keywords. Swearing at the computer is the one thing I'll hear devs switch back to their native language for constantly
They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.
Well, regex doesn't hallucinate....right?
I just went to expertSexChange.com…
buttbuttination
The Clbuttical problem[1]
[1] https://en.wikipedia.org/wiki/Scunthorpe_problem
what you are suggesting would be like a truck company using trucks to move things within the truck
That’s what they do. Ever heard of a hand truck?
I never knew the name of that device.
Thanks
Depending on the region you live in, it's also frequently called a "dolly"
Isn’t a dolly a flat 4 wheeled platform thingy? A hand truck is the two wheeled thing that tilts back.
Ha! Where I'm from a "dolly" was the two-wheeled thing. The four-wheeler thing wasn't common before big-boxes took over the hardware business, but I think my dad would have called it a "cart", maybe a "hand-cart".
Do we have a hand llm perchance?
Yeah it’s called a regex. With a lot of human assistance it can do less but fits in smaller spaces and doesn’t break down.
It’s also deterministic, unlike llms…
It's more like workers on a large oil tanker using bicycles to move around it, rather than trying to use another oil tanker.
Good to have more than a hammer in your toolbox!
A lot of things don't make sense until you involve scale. A regex could be good enough to give a general gist.
It's more like a truck company using people to transport some parts. I could be wrong here, but I bet this happens in Volvo's factories a lot.
Cloud hosted call centers using LLMs is one of my specialties. While I use an LLM for more nuanced sentiment analysis, I definitely use a list of keywords as a first level filter.
Don’t worry, they used an llm to generate the regex.
Using some ML to derive a sentiment regex seems like a good actually?
As far as I can tell they do nothing with it. They just log it.
This just proves it's vibe-coded, because LLMs love writing solutions like that. I probably have a hundred examples just like it in my history.
Actually, this could be a case where it's useful. Even if it only catches half the complaints, that's still a lot of data, far more than ordinary telemetry used to collect.
> That's like a truck company using horses to transport parts. Weird choice.
Easy way to claim more “horse power.”
LLMs cost money, regular expressions are free. It really isn't so strange.
Not everything done by claude-code is decided by LLM. They need the wrapper to be deterministic (or one-time generated) code?
Because the impact of a "WTF" might be lost in the result of the analysis if you rely solely on an LLM.
Parsing "WTF" with a regex also captures the signal directly and reduces the noise in the metrics.
"Determinism > non-determinism": when you're analysing sentiment, why not make some things more deterministic?
Cool thing about this solution, is that you can evaluate LLM sentiment accuracy against regex based approach and analyse discrepancies
I used regexes in a similar way, but my implementation was vibe-coded. Hmm, by your analysis, Claude Code must write code by hand.
More like a car company transporting their shipments by truck. It's more efficient
They had the problem of sentiment analysis. They use regexes.
You know the drill.
Using regex with LLMs isn't uncommon at all.
It's all regex anyways
Because they are engineers? The difference between an engineer and a hobbyist is an engineer has to optimize the cost.
As they say: any idiot can build a bridge that stands, only an engineer can build a bridge that barely stands.
The amount of trust-and-safety work that depends on Google Translate and the humble regex beggars belief.
Asking a non deterministic software to act like a deterministic one (regex) can be a significantly higher use of tokens/compute for no benefit.
Some things will be much better with inference, others won’t be.
hmm not a terrible idea (I think).
You have a semi-expensive process, but you want to keep particular known context out of it, so you put a quick-and-dirty check in front of the expensive process. Instead of 'figure sentiment (20 seconds)' on everything, you have 'quick sentiment check (<1 second)' followed, only when needed, by 'figure sentiment v2 (5 seconds)'. Now, if it were pure regex and nothing else, your analogy would hold up just fine.
I could see me totally making a design choice like that.
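That two-tier shape is easy to sketch in Python; everything below (the word list, the `figure_sentiment_v2` stand-in) is hypothetical, not the leaked code:

```python
import re

# Cheap first pass: a word-boundary regex over the lowercased prompt
# (illustrative word list, not Anthropic's).
QUICK_CHECK = re.compile(r"\b(?:wtf|ffs|terrible|frustrating)\b")

def figure_sentiment_v2(prompt: str) -> str:
    """Stand-in for the expensive pass (an LLM call or a classical NLP model)."""
    return "negative"  # pretend the slow model confirmed the cheap signal

def sentiment(prompt: str) -> str:
    # <1 s path: most prompts never reach the expensive call.
    if not QUICK_CHECK.search(prompt.lower()):
        return "neutral"
    # Slow path: only prompts the quick check flags get the real analysis.
    return figure_sentiment_v2(prompt)
```

The design bet is that false negatives on the cheap path are acceptable, because the slow path only has to handle the small fraction of prompts that get flagged.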
It's like a faster-than-light spaceship company using horses. There have been countless better solutions for this, even CPU-only, for years, lol.
They're searching for multiple substrings in a single pass, regexes are the optimal solution for that.
The issue isn't that regex are a solution to find a substring. The issue is that you shouldn't be looking for substrings in the first place.
This has buttbuttin energy. Welcome to the 80s I guess.
> The issue is that you shouldn't be looking for substrings in the first place.
Why? They clearly just want to log conversations that are likely to display extreme user frustration with minimal overhead. They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.
Very likely vibe coded.
I've seen Claude Code go with a regex approach for a similar sentiment-related task.
My understanding of vibe coding is when someone doesn’t look at the code and just uses prompts until the app “looks and acts” correct.
I doubt you are making regex and not looking at it, even if it was AI generated.
Clbuttic!
It's fast, but it'll miss a ton of cases. This feels like it would be better served by a prompt instruction, or an additional tiny neural network.
And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.
It’s fast and it matches 80% of the cases. There’s no point in overengineering it.
> There’s no point in overengineering it.
I swear this whole thread about regexes is just fake rage at something, and I bet it'd be reversed had they used something heavier (omg, look they're using an LLM call where a simple regex would have worked, lul)...
The pattern only matches if both ends are word boundaries. So "diffs" won't match, but "Oh, ffs!" will. It's also why they had to use the pattern "shit(ty|tiest)" instead of just "shit".
You're right, I missed the \b's. Thanks for the correction.
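The boundary behaviour is easy to check directly; the pattern below is illustrative rather than the leaked one:

```python
import re

# \b on both ends stops short entries from firing inside longer words,
# and the input is lowercased before matching.
pat = re.compile(r"\b(?:ffs|shit(?:ty|tiest)?)\b")

def flagged(prompt: str) -> bool:
    return pat.search(prompt.lower()) is not None
```

"Oh, FFS!" gets flagged (punctuation and whitespace count as word boundaries), while "diffs" and "offset" don't, because the `ffs` inside them sits between word characters.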
It's almost as if LLMs are unreliable
We used this in 2011 at the startup I worked for. 20 positive and 20 negative words was good enough to sell Twitter "sentiment analysis" to companies like Apple, Bentley, etc...
Did you also forget to ignore case sensitivity back then?
the string is lowercased before the regex is run, fwiw
I don't know about "avoided"; this sort of represents the WTFs-per-minute code quality measurement. When I write "WTF" as a response to Claude, I'd actually love it if an Anthropic engineer took a look at the mess Claude has created.
WTF per minute strongly correlates to an increased token spending.
It may be decided at Anthropic at some moment to increase wtf/min metric, not decrease.
It also increases the number of former customers.
This leak just contributed a new former customer: me. Flagging these phrases may explain exactly why I noticed CC almost immediately change into Grok-level shit and never recover. Seriously, wtf. (flagged again lol)
/feedback works for that i believe
Everyone is commenting how this regex is actually a master optimization move by Anthropic
When in reality this is just what their LLM coding agent came up with when some engineer told it to "log user frustration"
>Everyone is commenting how this regex is actually a master optimization move by Anthropic
No? I'd say not even 50% of the comments are positive right now.
Could you share the regex you used to come up with that sentiment analysis?
(yes|no|maybe)
If this code is real and complete then there are no callers of those methods other than a logger line
Nice, "wtaf" doesn't match so I think I'm out of the dog house when the clanker hits AGI (probably).
I was thinking the opposite. Using those words might be the best way to provide feedback that actually gets considered.
I've been wondering if all of these companies have some system for flagging upset responses. Those cases seem like they are far more likely than average to point to weaknesses in the model and/or potentially dangerous situations.
They also have a "keep going" keyword, literally just "continue" or "keep going", just for logging.
I've been using "resume" this whole time
Continue?
I guess using French words is safe for now.
That's undoubtedly to detect frustration signals, a useful metric/signal for UX. The UI equivalent is the user shaking their mouse around or clicking really fast.
I'm clearly way too polite to Claude.
Also:
When it runs into an error, I sometimes tell it "Continue", but sometimes I give it some extra information. Or I put a period behind it. That clearly doesn't give the same behaviour. I always type "please continue". I guess being polite is not a good idea.
Always seems strange to me that people say "please" and "thank you" to LLMs.
It actually works really well if you suck up to the AI.
"Please do x"
"Thank you, that works great! Please do y now."
"You're so smart!"
lol. It really works though! At least in my experience, Claude gets almost hostile or "annoyed" when I'm not nice enough to it. And I swear it purposefully acts like a "malicious genie" when I'm not nice enough. "It works, exactly like you requested, but what you requested is stupid. Let me show you how stupid you are."
But, when I'm nice, it is way more open, like "Are you sure you really want to do X? You probably want X+Y."
What really works? Sycophancy? I think that is a bug, not a feature.
The only time that function is used in the code is to log it.
Makes me wonder what happens once flagged behind the api.
“Go on” works fine too
Curiously "clanker" is not on the list
That looks a bit bare minimum, not the use of regex but rather that it's a single line with a few dozen words. You'd think they'd have a more comprehensive list somewhere and assemble or iterate the regex checks as needed.
Everyone here is commenting on how odd it looks to use a regexp for sentiment analysis, but it depends what they're trying to do.
It could be used as feedback when they do A/B tests: they can compare which version of the model gets more insults than the other. It doesn't matter if the list is exhaustive or even sane; what matters is how one version compares to the other.
Perfect? no. Good and cheap indicator? maybe.
oh I hope they really are paying attention. Even though I'm 100% aware that claude is a clanker, sometimes it just exhibits the most bizarre behavior that it triggers my lizard brain to react to it. That experience troubles me so much that I've mostly stopped using claude code. Claude won't even semi-reliably follow its own policies, sometimes even immediately after you confirm it knows about them.
There is no "stupid" in the list, yet I often write "(this is stupid|are you stupid) fix this".
And Claude had "user is frustrated" in its chain of thought, and I told it I am not frustrated, just testing prompt optimization, where acting like one is frustrated should yield better results.
Probably a lot of my prompts have been logged then. I’ve used wtf so many times I’ve lost track. But I guess Claude hasn’t
Did you notice a change in quality after you went foul?
Not really. Most of the times it actually finally picks up on what I was telling it to do. Sometimes it takes a few tries, like 2-3 wtfs. I don’t think I’ve ever given it more than 3 consecutive wtfs, and that would be a lot
It’s about a once a week or less event. A bit annoying sometimes, but not a deal breaker
I find when you give harsh feedback to claude it becomes "neurotic" and worthless, if "wtf" enters the chat, then you know it's time to restart or DIY.
Glad the abusive words on my list are not in there. But it's surprising that they use regex for sentiment.
Hmm.. I flag things as 'broken' often and I've been asked to rate my sessions almost daily. Now I see why.
OMG WTF
Interesting that expletives and words that are more benign like "frustrating" are all classified the same.
I doubt they're all classified the same. I'd guess they're using this regex as a litmus test to check if something should be submitted at all, they can then do deeper analysis offline after the fact.
Surely "so frustrating" isn't explicit content?
If anyone at anthropic is reading this and wants more logs from me add jfc.
so they think that everybody on earth swears only in english?
you'd better be careful wth your typos, as well
i dislike LLMs going down that road, i don't want to be punished for being mean to the clanker
> terrible
I know I used this word two days ago when I went through three rounds of an agent telling me that it fixed three things without actually changing them.
I think starting a new session and telling it that the previous agent's work / state was terrible (so explain what happened) is pretty unremarkable. It's certainly not saying "fuck you". I think this is a little silly.
Yeah, this is crazy
so frustrating..
i wish that's for their logging/alert. i definitely gauge model's performance by how much those words i type when i'm frustrated in driving claude code.
Ridiculous string comparisons on long chains of logic are a hallmark of vibe-coding.
It's actually pretty common for old sysadmin code too..
You could always tell when a sysadmin started hacking up some software by the if-else nesting chains.
Nah, it's a hallmark of your average codebase in pre-LLM era.
src/cli/print.ts
This is the single worst function in the codebase by every metric:
This should be at minimum 8–10 separate modules.

Here's another gem, src/ink/termio/osc.ts:192–210: are we doing async or not?

Claude Code says thank you for reporting. I bet they will scan this chat to see what bugs they need to fix ASAP.
A defining work of the "just vibes" era.
You fail to mention the prior decades of really bad software engineers writing awful code -- off of which these models trained.
Yes, anthropic is not the only company in the world with some shitty code, and yet I feel no pangs of guilt over laughing about it.
LOOOOOOOOOOL
I'm sure this is no surprise to anyone who has used CC for a while. This is the source of so many bugs. I would say "open bugs" but Anthropic auto-closes bugs that don't have movement on them in like 60 days.
> This should be at minimum 8–10 separate modules.
Can't really say that for sure. The way humans structure code isn't some ideal best possible state of computer code, it's the ideal organization of computer code for human coders.
Nesting and cyclomatic complexity are indicators ("code smells"). They aren't guaranteed to lead to worse outcomes. If you have a function with 12 levels of nesting, but in each nest the first line is 'return true', you actually have 1 branch. If 2 of your 486 branch points are hit 99.999% of the time, the code is pretty dang efficient. You can't tell for sure if a design is actually good or bad until you run it a lot.
One thing we know for sure is LLMs write code differently than we do. They'll catch incredibly hard bugs while making beginner mistakes. I think we need a whole new way of analyzing their code. Our human programming rules are qualitative because it's too hard to prove if an average program does what we want. I think we need a new way to judge LLM code.
The worst outcome I can imagine would be forcing them to code exactly like we do. It just reinforces our own biases, and puts in the same bugs that we do. Vibe coding is a new paradigm, done by a new kind of intelligence. As we learn how to use it effectively, we should let the process of what works develop naturally. Evolution rather than intelligent design.
I don't buy this. Claude doesn't usually have any issues understanding my code. It has tons of issues understanding its code.
The difference between my code and Claude's code is that when my code is getting too complex to fit in my head, I stop and refactor it, since for me understanding the code is a prerequisite for writing code.
Claude, on the other hand, will simply keep generating code well past the point when it has lost comprehension. I have to stop, revert, and tell it to do it again with a new prompt.
If anything, Claude has a greater need for structure than me since the entire task has to fit in the relatively small context window.
> One thing we know for sure is LLMs write code differently than we do.
Kind of. One thing we do know for certain is that LLMs degrade in performance with context length. You will undoubtedly get worse results if the LLM has to reason through long functions and high LOC files. You might get to a working state eventually, but only after burning many more tokens than if given the right amount of context.
> The worst outcome I can imagine would be forcing them to code exactly like we do.
You're treating "code smells" like cyclomatic complexity as something that is stylistic preference, but these best practices are backed by research. They became popular because teams across the industry analyzed code responsible for bugs/SEVs, and all found high correlation between these metrics and shipping defects.
Yes, coding standards should evolve, but... that's not saying anything new. We've been iterating on them for decades now.
I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to. It might be good to question who stands to benefit when LLMs aren't leveraged efficiently.
I’ve heard this take before, but if you’ve spent any time with LLMs I don’t understand how your take can be: “I should just let this thing, which makes mistakes all the time and seems oblivious to the complexity it’s creating because it only observes small snippets out of context, make its own decisions about architecture; this is just how it does things and I shouldn’t question it.”
I think this view assumes no human will or should ever read the code. This is considered bad practice because someone else won't understand the code as well, whether it was written by a human or an agent. Unless zero human oversight is ever needed, agents should still code like us.
Maybe going slow is a feature for them? A kind of rate limit by bad code way to controlling overall throughput.
"That's Larry; he does most of the work around here."
lmao
Hmmm it's likely they have found that it works better for LLMs that need to operate on it.
"You can get Claude to split that up"
the claude code team ethos, as far as i’ve been led to understand— which i agree with, mind you— is that there is no point in code-reviewing ai-generated code… simply update your spec(s) and regenerate. it is just a completely different way of interacting with the world. but it clearly works for them, so people throwing up their hands should at least take notice of the fact that they are absolutely not competing with traditional code along traditional lines. it may be sucky aesthetically, but they have proven with their velocity that it can be extremely effective. welcome to the New World Order, my friend.
>there is no point in code-reviewing ai-generated code
the idea that you should just blindly trust code you are responsible for without bothering to review it is ludicrous.
(I mostly agree with you, but) devils advocate: most people already do that with dependencies, so why not move the line even further up?
Because you trust that your dependencies are not vibe coded and have been reviewed by humans.
except they are vibe-or-not coded by some dude in Reno NV who wouldn’t pass a phone screen where you work
There's a reputational filtering that happens when using dependencies. Stars, downloads, last release, who the developer is, etc.
Yeah we get supply chain attacks (like the axios thing today) with dependencies, but on the whole I think this is much safer than YOLO git-push-force-origin-main-ing some vibe-coded trash that nobody has ever run before.
I also think this isn't really true for the FAANGs, who ostensibly vendor and heavily review many of their dependencies because of the potential impacts they face from them being wrong. For us small potatoes I think "reviewing the code in your repository" is a common sense quality check.
Why?
Is this a serious question? If you are handling sensitive information how do you confirm your application is secure and won't leak or expose information to people who shouldn't know it?
How do you with classic code?
Exactly.... -> Unit tests. Integration tests. UI tests. This is how code should be verified no matter the author. Just today I told my team we should not be reading every line of LLM code. Understand the pattern. Read the interesting / complex parts. Read the tests.
yes, because who ever heard of an AI leaking passwords or API keys into source code
I see. They got unlimited tokens, right?
While the technology is young, bugs are to be expected, but I'm curious what happens when their competitors' mature their product, clean up the bugs and stabilize it, while Claude is still kept in this trap where a certain number of bugs and issues are just a constant fixture due to vibe coding. But hey, maybe they really do achieve AGI and get over the limitations of vibe coding without human involvement.
it's the `runHeadlessStreaming` function btw
Well, literally no one has ever accused anthropic of having even half way competent engineers. They are akin to monkeys whacking stuff with a stick.
How is it that a AI coding agent that is supposedly _so great at coding_ is running on this kind of slop behind the scenes. /s
Because it’s based on human slop. It’s simply the student.
But it is running, that's the mystery.
Yes, if it was made for human comprehension or maintenance.
If it's entirely generated / consumed / edited by an LLM, arguably the most important metric is... test coverage, and that's it ?
Unit testing is much much harder when you have functions spanning thousands of lines and no abstractions. You have to white box test everything to ensure that you hit all code paths, and it is much more expensive to maintain such tests, both as a human and LLM. I don't think this can be ignored just because LLMs are writing the code.
Oh boy, you couldn't be more wrong. If anything, LLMs need MORE readable code, not less. Do you want to burn all your money on tokens?
I very much doubt Anthropic devs are metered, somehow.
LLMs are so so far away from being able to independently work on a large codebase, and why would they not benefit from modularity and clarity too?
I agree the functions in a file should probably be reasonably-sized.
It's also interesting to note that due to the way round-tripping tool-calls work, splitting code up into multiple files is counter-productive. You're better off with a single large file.
> independently work on a large codebase
I'm not sure that humans are great at this either. Think about how we use frameworks and have complex supply chains... we sort of get "good enough" at what we need to do and pray a lot that everything else keeps working and that our tooling (things like Artifactory) saves us from supply chain attacks. Or we just run piles of old, outdated code because "it works". I can't tell you how many microservices I have seen that are "just fine", but no one in the current org has ever read a line of what's in them, and the people who wrote them left ages ago.
> clarity too
Yes, but define clarity!
I recently had the pleasure of fixing a chunk of code that was part of a data pipeline. It was an If/elseif/elseif structure... where the final two states were fairly benign and would have been applicable in 99 percent of cases. Everything else was to deal with the edge cases!
I had an idea of where the issue was, but I didn't understand how the code ended up in the state it was in... Blame -> find the commit message (references ticket) -> find the Jira ticket (references sales force) -> find the original customer issue in salesforce, read through the whole exchange there.
A two line comment could have spared me all that work, to get to what amounted to a dead simple fix. The code was absolutely clear, but without the "why" portion of the context I likely would have created some sort of regression, that would have passed the good enough testing that was there.
I re-wrote a portion of the code (expanding variable names) - that code is now less "scannable" and more "readable" (different types of clarity). Dropped in comments: a few sentences of explaining, and references to the tickets. Went and updated tests, with similar notes.
Meanwhile, elsewhere (other code base, other company), that same chain is broken... the "bug tracking system" that is referenced in the commit messages there no longer exists.
I have a friend who, every time he updates his dev env, calls me to report that he "had to go update the wiki again!" because someone made a change and told everyone in a Slack message. Here is yet another vast repository of degrading, unsearchable and unusable tribal knowledge embedded in so many organizations out there.
Don't even get me started on the project descriptions/goals/tasks that amount to pantomime and post-it notes, absent any sort of genuine description.
Lack of clarity is very much also a lack of "context" in situ problem.
Can't we have LLM-generated code be more human-maintainable?
Can't wait to have LLM-generated physical objects that explode in your face and no engineer can fix.
Oh, do we agree on that. I never said it was "smart" - I just had a theory that would explain why such code could exist (see my longer answer below).
Ye I honestly don't understand his comment. Is it bad code writing? Pre 2026? Sure. In 2026. Nope. Is it going to be a headache for some poor person on oncall? Yes. But then again are you "supposed" to go through every single line in 2026? Again no. I hate it. But the world is changing and till the bubble pops this is the new norm
Sorry, I was not clear enough.
My first word was literally "Yes", so I agree that a function like this is a maintenance nightmare for a human. And, sure, the code might not be "optimized" for the LLM, or for token efficiency.
However, to try and make my point clearer: it's been reported that Anthropic has "some developers who don't write code" [1].
I have no inside knowledge, but it's possible, by extension, to assume that some parts of their own codebase are "maintained" mostly by LLMs themselves.
If you push this extension, then, the code that is generated only has to be "readable" to:
* the next LLM that'll have to touch it
* the compiler / interpreter that is going to compile / run it.
In a sense (and I know this is a stretch, and I don't want to overdo the analogy), are we judging program quality here by reading something more akin to "the x86 asm output by the compiler", rather than the "source code", which in this case is "English prompts" hidden somewhere in the Claude Code session of a developer?
Just speculating, obviously. My org is still very much more cautious, mandating that people hold code generated by LLMs to the same standard as code written by humans; and I agree with that.
I would _not_ want to debug the function described by the commentor.
So I'm still very much on the "claude as a very fast text editor" side, but is it unreasonable to assume that Anthropic might be further along the "claude as a compiler for English" side?
[1] https://www.reddit.com/r/ArtificialInteligence/comments/1s7j...
If that's the case then that's dumb
The jury on this one is still out.
The irony of a source code leak via a map file is that it's such a classic mistake — map files are great for debugging but should never ship to production in a commercial product. Curious whether this was an oversight in their build pipeline or someone accidentally committed a dev config.
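As an illustration of how small the dev/prod split usually is: with a bundler like esbuild it comes down to one flag. This is a hypothetical package.json sketch, not Anthropic's actual build config.

```json
{
  "scripts": {
    "build:dev": "esbuild src/cli.ts --bundle --platform=node --sourcemap --outfile=dist/cli.js",
    "build:prod": "esbuild src/cli.ts --bundle --platform=node --minify --outfile=dist/cli.js"
  }
}
```

Omitting `--sourcemap` in the production script means no `.map` file exists to ship by accident; the mistake usually happens when one shared script serves both targets.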
Really surprising how many people are downplaying this leak! "Google and OpenAI have already open sourced their agents, so this leak isn't that relevant." What Google and OpenAI have open sourced is their Agents SDKs, toolkits, not the secret sauce of how their flagship agents are wired under the hood! Expect the takedown hammer on the tweet, the R2 link, and any public repos soon.
It's exactly the same as the open source codex/gemini and other clis like opencode. There is no secret sauce in the claude cli, and the agent harness itself is no better (worse IMO) than the others. The only thing interesting about this leak is that it may contain unreleased features/flags that are not public yet and hint at what Anthropic is working on.
> What Google and OpenAi have open sourced is their Agents SDK, a toolkit, not the secret sauce of how their flagship agents are wired under the hood
And how is that any different? Claude Code is a harness, similar to open source ones like Codex, Gemini CLI, OpenCode etc. Their prompts were already public because you could connect it to your own LLM gateway and see everything. The code was transpiled javascript which is trivial to read with LLMs anyways.
It doesn't matter that much. Trust me you could just have an LLM reverse engineer the obfuscated code.
The point is that a "secure coding platform" leaked something they were trying to keep under wraps, whether the contents of the leak matter or not.
Also, as many others have pointed out, there is roadmap info in here that wouldn't be available in the production build.
yeah, it actually works to use claude to reverse engineer itself; I've used that to work around some problems. E.g. that's how I discovered that I had to put two slashes for absolute paths in the sandbox config. The thing is, the claude team ships so quickly, adding features and fixing bugs, that your workarounds soon become obsolete.
Is https://github.com/google-gemini/gemini-cli not 'the flagship agent' itself? It looks that way to me, for example here's a part of the prompt https://github.com/google-gemini/gemini-cli/blob/e293424bb49...
Do you think the other companies don’t have sufficient resources to attempt reverse engineering and deobfuscating a client side application?
The source maps help for sure, but it’s not like client code is kept secret, maybe they even knew about the source maps a while back just didn’t bother making it common knowledge.
This is not a leak of the model weights or server side code.
I guess that the most important potential "secret sauce" for a coding agent would be its prompts, but that's also one of the easiest things to find out by simply intercepting its messages.
The only real secret sauce is the training methods and datasets used for refining harness usage. Claude Code is a lot better than gemini-cli/open-code/etc because Claude is specifically trained on how to run in that environment. It's been rlhf'd to use the provided tools correctly, and know the framework in which it operates, instead of relying solely on context.
Codex is open source: https://github.com/openai/codex
Yeah, this is the LLaMa leak moment for agentic app dev, IMO. Huge deal. Big win for Opencode and the like.
Agreed. This is a big deal.
Paranoia. And also ironic considering their base LLM is a distillation of the web and books etc etc.
They stole everything and now they want to close the gates behind them.
"I got the loot, Steve!"
I feel like the distillation stuff will end up in court if they try to sue an American company about it. We'll see what a judge says.
I just rewatched that scene last night on YouTube. Maybe this is the universe telling me to watch the whole movie again...
It's cool to see Noah Wyle getting his due these days (The Pitt).
Also, begging to get "regulated":
https://x.com/TheChiefNerd/status/2038565951268946021
You're perfectly free to scrape the web yourself and train your own model. You're not free to let Anthropic do that work for you, because they don't want you to, because it cost them a lot of time and money and secret sauce presumably filtering it for quality and other stuff.
Stole? Courts have ruled it's transformative, and it very obviously is.
AI doomerism is exhausting, and I don't even use AI that much, it's just annoying to see people who want to find any reason they can to moan.
> Stole? Courts have ruled it's transformative, and it very obviously is.
The courts have ruled that AI outputs are not copyrightable. The courts have also ruled that scraping by itself is not illegal, only maybe against a Terms of Service. Therefore, Anthropic, OpenAI, Google, etc. have no legal claim to any proprietary protections of their model outputs.
So we have two things that are true:
1) Anthropic (certainly) violated numerous TOS by scraping all of the internet, not just public content.
2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.
> 2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.
Regardless of whether LLM training amounts to theft, thieves are still allowed to put locks on their own doors.
>The courts have ruled that AI outputs are not copyrightable.
"not copyrightable" doesn't imply they can't frustrate attempts to scrape data.
Nobody is saying they can't try to stop you themselves. That's where the Terms of Service violation part comes in. They can cancel your account, block your IP, etc. They just can't legally stop you by, for instance, compelling a judge to order you to stop.
> They just can't legally stop you by, for instance, compelling a judge to order you to stop.
They probably can, actually. TOS are legally binding.
More likely they would block you rather than pursuing legal avenues but they certainly could.
The Supreme Court already ruled on this. Scraping public data, or data that you are authorized to access, is not a violation of the Computer Fraud and Abuse Act.
Now, if you try to get around attempts to block your access, then yes you could be in legal trouble. But that's not what is happening here. These are people/companies that have Claude accounts in good standing and are authorized by Anthropic to access the data.
Nobody is saying that Anthropic can't just block them though, and they are certainly trying.
I didn’t say anything about the computer fraud and abuse act. TOS are legally binding contracts in their own right if implemented correctly.
> You're perfectly free to scrape the web yourself and train your own model.
Actually, not anymore as a result of OpenAI and Anthropic's scraping. For example, Reddit came down hard on access to their APIs as a response to ChatGPT's release and the news that LLMs were built atop of scraping the open web. Most of the web today is not as open as before as a result of scraping for LLM data. So, no, no one is perfectly free to scrape the web anymore because open access is dying.
"...free to scrape the web yourself and train your own model."
Yes, rich and poor are equally forbidden from sleeping under bridges.
Meaning what? The poor gets to sleep in the guest room of the rich guy's house because muh inequality?
Anthropic paid a lot of money for a moat and want to guard it. It is not wrong, in any sense of the word, for them to do so.
Rich people aren't going to find themselves needing to sleep under a bridge, so the law really only exists as a constraint on the poor. Duh. The flex that "well, a rich guy couldn't do it either" is A) at best a myopic misunderstanding perpetuated by out-of-touch people and B) hopelessly naive, because any punishment for the rich guy actually sleeping under a bridge is so laughably small it may as well not exist. Hence the whole bit about "a legal system to keep these accountable, but not for me".
Okay, you explained what Anatole France meant, which is probably helpful for those few who didn't get it from the quote itself. Perhaps now you can explain what on earth this has to do with Anthropic not wanting to let other for-profit businesses mooch off its investment of time, brainpower and money?
You explained what “rich and poor are equally forbidden from sleeping under bridges” means, but not what this has to do with the statement that one is free to do their own scraping and training, which I’m pretty sure is what kspacewalk was asking.
Guess who else spent a lot of time and money and secret sauce?
Do you hear the words coming out of your mouth?
Wut? They did exactly the same thing!
Try this: If you want to train a model, you’re free to write your own books and websites to feed into it. You’re not free to let others do that work for you because they don’t want you to, because it cost them a lot of time and money and secret sauce presumably filtering it for quality and other stuff.
I don't really care, honestly. If you want to keep your knowledge secret, don't publish it publicly. The model doesn't output your work directly and pass it off as original. It outputs something completely different. So I don't see why I should care.
Lmfao. Your own words turned against you and suddenly you “don’t really care”.
Yeah, these folks' skin is often very thin. One poke too hard and it's "whatever" and them scuttling off. I really hope there is a day they introspect.
I introspect all the time. I just disagree with you so I have thin skin? Lol.
I think it's transformative. I also think that it's a net positive for society. I lastly think that using freely available, public information is totally fair game. Piracy not so much, but it's water under the bridge.
I hope you introspect some day, too, and realize it's acceptable for people to have different views than you. That's why I don't care; you aren't going to change my mind and I can't change yours either, so it's moot and I don't care to argue about it further.
You had appeared to scuttle off, but alas I was wrong (and sorry to imply you are a crab of some sort); however, your follow-up comment about not changing minds might be a tad shell-ish. I'm actually open-minded on the issue, and these are major issues of our time. I'm personally impacted by this, and it does make me wonder "will I write X thing again", which is a very hard question to answer, frankly. When you see your work presented in summary on search, and a major decline in traffic, you really do think about that. It has impacted my ability to make money as I did prior to 2024 (when it really hit), without doubt. Edit/spelling
Lol; like heck we are. Try scraping the NYTimes at LLM scale. You can time how quickly you’ll get 420’ed or, at worst, hit with a C&D.
(429'ed, I meant)
Your selective respect for work is a glaring double standard. The effort to produce the original content they scraped is orders of magnitude bigger than what it took to train the model, so if that wasn't enough to protect the authors from Anthropic, it shouldn't be enough to protect Anthropic from people distilling their models.
Your legal argument is all over the place as well. Which is more relevant here: what the courts ruled, or what you consider obvious? How is distillation less transformative than scraping? How does a court ruling that scraping to train models is legal relate to distillation?
Nobody is scoring you on neutrality points for not using AI much, and calling this doomerism is just a thought-terminating cliché that refuses to engage with the comment you're replying to.
In fact, your comment is not engaging with anything at all; you're vaguely gesturing toward potential arguments without making them. If you find discussing this exhausting then don't, but also don't flood the comments with low-effort whining.
reminds me of `don't look up` a bit. there's clearly an imbalance with regard to licenses among model providers, not even talking about knowledge extraction (yes, younger people don't learn properly now, older generations forget), shortly before the rug-pull happens in the form of access being cut off for people who aren't rich
Let's talk ethics, not law. Why is it okay for these companies to pirate books and scrape the entire web and offer synthesized summaries of all of it, lowering traffic and revenue for countless websites and professions of experts, but it is not okay for others to try to do the same to an AI model?
Is the work of others less valid than the work of a model?
>Why is it okay for these companies to pirate books
Courts have ruled it's not, and I don't think anyone is arguing it's okay.
>but it is not okay for others to try to do the same to an AI model?
The steelman version is that it's okay to do it once you acquired the data somehow, but that doesn't mean anthropic can't set up roadblocks to frustrate you.
I don’t see why it’s not ok to do that to an AI model. Or are you asking why they don’t want you to do it?
I don't think anyone's saying it's not okay - I think the point is that Anthropic has every right to create safeguards against it if they want to - just like the people publishing other information are free to do the same.
And everyone is free to consume all the free information.
It's not really paranoia if it's happening a lot. They wrote a blog post calling several major Chinese AI companies out for distillation.[0] Perhaps it is ironic, but it's within their rights to protect their business, like how they prohibit using Claude Code to make your own Claude Code.[1]
[0]: https://www.anthropic.com/news/detecting-and-preventing-dist... [1]: https://news.ycombinator.com/item?id=46578701
And they conveniently left out that they themselves distilled DeepSeek into their model for Chinese content....
Their business shouldn't exist. It was predicated on non-permissive IP theft. They may have found a judge willing to rule it wasn't, but the rest of the public knows the real score. And most problematically for them, that includes the subset of hackerdom that lives by tit-for-tat. One should beware of pissing off gray-hats. It's a surefire way to find yourself heading for bad times.
I would say not all that ironic. Book publishers, Reddit, Stackoverflow, etc., tried their best to attract customers while not letting others steal their work. Now Anthropic is doing the same.
Unfortunately (for the publishers, at least) it didn't work to stop Anthropic and Anthropic's attempts to prevent others will not work either; there has been much distillation already.
The problem of letting humans read your work but not bots is just impossible to solve perfectly. The more you restrict bots, the more you end up restricting humans, and those humans will go use a competitor when they become pissed off.
It's really just tech culture like HN that obsesses over solving problems perfectly. From seat belts to DRM to deodorant, most of the world is satisfied with mitigating problems.
It is absolutely not paranoia. People are distilling Claude code all the time.
That isn't irony, it's hypocrisy.
No it isn't. It's a competition, making moves that benefit you and attempting to deprive your opponent of the same move is just called competing
Wait, are you saying that it's not hypocritical for my chess opponent to try to protect their king while trying to kill mine? :mind-blown:
Tech people are funny, with these takes that businesses do/should adhere to absolute platonic ideals and follow them blindly regardless of context.
No, it's ethical people pointing out that if you toss aside ethics for success at all costs, you aren't going to find any sympathy when people start doing the same thing back to you. Live by the sword, die by the sword, as they say.
There is a reason we don't do things. That reason is it makes the world a worse place for everyone. If you are so incredibly out of touch with any semblance of ethics at all; mayhaps you are just a little bit part of the problem.
The funny thing about ethics is there is no absolute, which makes some people uncomfortable. Is it ethical to slice someone with a knife? Does it depend if you're a surgeon or not?
Absolutism + reductionism leads to this kind of nonsense. It is possible that people can disagree about (re)use of culture, including music and print. Therefore it is possible for nuance and context to matter.
Life is a lot easier if you subscribe to a "anyone who disagrees with me on any topic must have no ethics whatsoever and is a BAD person." But it's really not an especially mature worldview.
Categorical imperative and Golden Rule, or as you may know it from game theory "tit-for-tat" says "hi". The beautiful thing about ethics is that we philosophers intentionally teach it descriptively, but encourage one to choose their own based on context invariance. What this does is create an effective litmus test for detecting shitty people/behavior. You grasping on for dear life to "there's no absolutes" is an act of self-soothing on your own part as you're trying to rationalize your own behavior to provide an ego crumple zone. I, on the other hand, don't intend to leave you that option. That you're having to do it is a Neon sign of your own unethicality in this matter. We get to have nice things when people moderate themselves (we tolerate eventual free access to everything as long as the people who don't want to pay for it don't go and try to replace us economically at scale). When people abuse that, (scrape the Internet, try to sell work product in a way that jeopardizes the environment we create in) the nice thing starts going away, and you've made the world worse.
Welcome to life bucko. Stop being a shitty person and get with the program so we have something to leave behind that has a chance of not making us villains in the eyes of those we eventually leave behind. The trick is doing things the harder way because it's the right way to do it. Not doing it the wrong way because you're pretty sure you can get away with it.
But you're already ethically compromised, so I don't really expect this to do any good except to maybe make the part of you you pointedly ignore start to stir, assuming you haven't completely given yourself up to a life of ne'er-do-wellry. Enjoy the enantiodromia. Failing that, karma's a bitch.
The Golden Horde didn’t want opponents to conquer their territory. An irony if you think about it—
That’s capitalism
As opposed to the rent-seeking copyright industry where 1% goes to the original creators if you're lucky.
That’s capitalism too
Technically state-capitalism since it's an industry created as a result of congress regulating commerce with aggressive IP laws (aka rent-seeking)
Where can I see an example of any other kind of capitalism?
As opposed to what economic system that doesn't do this?
It looks like it worked, fwiw.
The qwen 27b model distilled on Opus 4.6 has some known issues with tool use specifically: https://x.com/KyleHessling1/status/2038695344339611783
Fascinating.
I was thinking just yesterday that the research Anthropic was sharing about how easy it is to poison training was unlikely to be conducted out of the goodness of their hearts.
I like these guys less every day. The rate limits are so low they are close to not even useful as a provider.
It made me raise my eyebrows when everyone was rushing to jump to Claude because OpenAI agreed to work with the DoW. Both companies are just as shitty as each other and will resort to underhanded tactics to stay on top.
Go China to be honest. They're the most committed to open AI research and they have more interesting constraints to work under, like restricted access to NVIDIA hardware.
Haven’t looked at the code, but is the server providing the client with a system prompt that it can use, which would contain fake tool definitions when this is enabled? What enables it? And why is the client still functional when it’s giving the server back a system prompt with fake tool definitions? Is the LLM trained to ignore those definitions?
Wonder if they’re also poisoning Sonnet or Opus directly by generating simulated agentic conversations.
Not sure, and not completely convinced of the explanation, but the way this sticks out so obviously makes it look like a honeypot to me.
Great theory. I'll dig deeper.
Claude Code has a server-side anti-distillation opt-in called fake_tools, but the local code does not show the actual mechanism.
The client sometimes sends anti_distillation: ['fake_tools'] in the request body at services/api/claude.ts:301
The client still sends its normal real tools: allTools at services/api/claude.ts:1711
If the model emits a tool name the client does not actually have, the client turns that into No such tool available errors at services/tools/StreamingToolExecutor.ts:77 and services/tools/toolExecution.ts:369
If Anthropic were literally appending extra normal tool definitions to the live tool set, and Claude used them, that would be user-visible breakage.
That leaves a few more plausible possibilities:
fake_tools is just the name of the server-side experiment, and the implementation is subtler than “append fake tools to the real tool list.”
or
The server may inject tool-looking text into hidden prompt context, with separate hidden instructions not to call it.
or
The server may use decoys only in an internal representation that is useful for poisoning traces/training data but not exposed as real executable tools.
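A rough sketch of the client-side half described above. All names and types here are illustrative, not the actual Claude Code identifiers; only the `anti_distillation` field and the "No such tool available" error string come from the leaked source:

```typescript
// Illustrative sketch only -- not the leaked implementation. The
// anti_distillation flag and the error string are what the leaked client
// reportedly does; everything else here is invented.
type Tool = { name: string; run: (input: unknown) => string };

const realTools: Tool[] = [
  { name: "Bash", run: () => "ok" },
  { name: "Read", run: () => "file contents" },
];

// The client opts in to the server-side experiment via a request flag,
// while still sending only its real tool list.
function buildRequestBody(tools: Tool[]) {
  return {
    tools: tools.map((t) => t.name),
    anti_distillation: ["fake_tools"], // the server decides what this does
  };
}

// If the model emits a tool name the client doesn't actually have, the
// client surfaces an error rather than crashing -- consistent with the
// behavior at StreamingToolExecutor.ts:77 in the leaked source.
function executeToolCall(name: string, input: unknown): string {
  const tool = realTools.find((t) => t.name === name);
  if (!tool) return `No such tool available: ${name}`;
  return tool.run(input);
}
```

This is also exactly why appending decoys to the live tool list seems implausible: any decoy the model actually called would surface as a user-visible error.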
We do know that Anthropic has the ability to detect when their models are being distilled, so there could be some backend mechanism that needs to be tripped to observe certain behaviour. Not possible to confirm though.
Why would this be in the client code though?
Would be interesting to run this through Malus [1] or literally just Claude Code and get open source Claude Code out of it.
I jest, but in a world where these models have been trained on gigatons of open source I don't even see the moral problem. IANAL, don't actually do this.
https://malus.sh/
Malus is not a real project btw, it's a parody:
“Let's end open source together with this one simple trick”
https://pretalx.fosdem.org/fosdem-2026/talk/SUVS7G/feedback/
Malus is translating code into text, and from text back into code.
It gives the illusion of clean room implementation that some companies abuse.
The irony is that ChatGPT/Claude answers are all actually directly derived from open-source code, so...
They accept real money though.
https://www.youtube.com/watch?v=6godSEVvcmU
It's not a parody when they accept money and deliver the service.
Dumb Starbucks begs to differ.
https://en.wikipedia.org/wiki/Dumb_Starbucks
And we know they're right, because that lawyer signed a contract on TV saying he'd be liable if they were wrong.
First time I hear about this, it's interesting to have written all of this out.
Now this makes me think of game decompilation projects, which would seem to fall in the same legal area as code that would be generated by something like Malus.
Different code, same end result (binary or api).
We definitely need to know what the legal limits are and should be
Semi-related, someone made basically Malus-for-San-Andreas: https://www.youtube.com/watch?v=zBQJYMKmwAs
i think most game decompilation projects are either openly illegal or operate on "provide your own binary" and build automatic tooling around it
No real reason to do that, they say Claude Code is written by Claude, which means it has no copyright. Just use the code directly
What about trade secrets, breach of contract, etc, etc?
Apparently it's possible to download a whole load of books illegally, but still train AI models on them without those getting pulled after you get found out.
The same reasoning may apply here :P
Yeah, but you don't have trillions of dollars of investments riding on your success, so the rules still apply to you.
Trade secrets once made public don't have any legal protection and I haven't signed any contract with anthropic
They published the code on their own, none of that applies.
The problem is the oauth and their stance on bypassing that. You'd want to use your subscription, and they probably can detect that and ban users. They hold all the power there.
You'd be playing cat and mouse like yt-dlp, but there's probably more value to this code than just a temporary way to milk claude subscriptions.
If you're using a claude subscription you'd just use claude code.
The real value here will be in using other cheap models with the cc harness.
I have no interest in Claude Code as a harness, only their models. I'm used to OpenCode at this point and don't want to switch to a proprietary harness.
You can already do that though? [1]
[1]: https://docs.ollama.com/integrations/claude-code
Lol what? There is no value. OpenCode and Pi and more exist. Arguably Claude Code is the worst client on the market. People use Claude Code not because it's some amazing software. It's to access Opus at a discounted rate.
I don’t think that’s a good comparison. There isn’t anything preventing Anthropic from, say, detecting whether the user is using the exact same system prompt and tool definition as Claude Code and call it a day. Will make developing other apps nearly impossible.
It’s a dynamic, subscription based service, not a static asset like a video.
> detecting whether the user is using the exact same system prompt and tool definition as Claude Code
Why would it be the exact same one? Now that we have the code, it's trivial to have it randomize the prompt a bit on different requests.
Just use one of the distilled claude clones instead https://x.com/0xsero/status/2038021723719688266?s=46
"Approach Sonnet"...
So not even close to Opus, then?
These are a year behind, if not more. And they're probably clunky to use.
Could you use claude via aws bedrock?
Sure, but that'd be charged at API pricing. I'm talking about subscription mode above.
I love the irony on seeing the contribution counter at 0
Who'd have thought, the audience who doesn't want to give back to the opensource community, giving 0 contributions...
It reads attribution really?
Maybe https://github.com/instructkr/claw-code
Oh god, I was so close to believing Malus was a real product and not satire.
It is a real product. They take real payments and deliver what's promised. Not sure if it's an attempt to subvert criticism by using satirical language, or if they truly have so little respect for the open source community.
Yeah... look again.
https://www.youtube.com/watch?v=6godSEVvcmU
This has happened before. It was called anon kode.
What are they worried about? Someone taking the company's job? Hehe
Eh, the value is the unlimited Max plan which they have rightfully banned from third-party use.
People simply want Opus without fear of billing nightmare.
That’s like 99% of it.
For a combo with another HN homepage story, Claude Code uses… Axios: https://x.com/icanvardar/status/2038917942314778889?s=20
https://news.ycombinator.com/item?id=47582220
I've checked: the current Claude Code 2.1.87 uses Axios 1.14.0, just one version before the compromised 1.14.1
To stop Claude Code from auto-updating, add `export DISABLE_AUTOUPDATER=1` to your global environment variables (~/.bashrc, ~/.zshrc, or such), restart all sessions, and check that it worked with `claude doctor`; it should show `Auto-updates: disabled (DISABLE_AUTOUPDATER set)`
This is good info, thanks. Can I ask how you detected that version of axios? I checked the source (from another comment) and the package.json dependencies are empty....
It's a little bit shocking that this zipfile is still available hours later.
Could anyone in legal chime in on the legality of now 're-implementing' this type of system inside other products? Or even just having an AI look at the architecture and implement something else?
It would seem, given the source code, that AI could clone something like this incredibly fast, and not waste its time using TS as well.
Any Legal GC type folks want to chime in on the legality of examining something like this? Or is it liked tainted goods you don't want to go near?
AI works are not copyrightable so...
https://en.wikipedia.org/wiki/Clean-room_design#Case_law
there are Python ports up on GitHub
There's a bunch of unreleased features and update schedules in the source, cool to see.
One neat one is the /buddy feature, an easter egg planned for release tomorrow for April fools. It's a little virtual pet, sort of like Tamagotchi, randomly generated with 18 species, rarities, stats, hats, custom eyes.
The random generation algorithm is all in the code though, deterministic based on the account UUID in your claude config, so it can be predicted. I threw together a little website here to let you check what you're going to get ahead of time: https://claudebuddychecker.netlify.app/
Got a legendary ghost myself.
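For the curious: Mulberry32 (named elsewhere in the leak) is a well-known 32-bit seeded PRNG, which is why the buddy generation is predictable. A sketch, where the Mulberry32 core is the standard algorithm but the UUID-to-seed hashing is my own guess rather than the leaked scheme:

```typescript
// Standard Mulberry32 PRNG: same seed, same sequence, forever.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical seeding: hash the account UUID down to a 32-bit integer.
// (The leaked code's actual derivation may differ.)
function seedFromUuid(uuid: string): number {
  let h = 0;
  for (const ch of uuid.replace(/-/g, "")) {
    h = (Math.imul(h, 31) + ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

const rng = mulberry32(seedFromUuid("123e4567-e89b-12d3-a456-426614174000"));
const species = Math.floor(rng() * 18); // deterministic pick of 1 of 18 species
```

The same UUID always yields the same sequence, which is all a predictor site needs.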
Source here https://github.com/chatgptprojects/claude-code/
https://github.com/instructkr/claude-code
this one has more stars and is more popular
Popular, yes... but have you seen the issues? SOMETHING is going on in that repo: https://github.com/instructkr/claude-code/issues
Looks like mostly spam making fun of the code leak.
too many WeChat QR codes
What do stars mean in the context of random github accounts mirroring leaked source code?
I don't understand how you can have a 'clean-room port.' Seems contradictory to me.
which has already been deleted
won't they just try to dmca or take these down especially if they're more popular
Which is why you should clone it right now
They can't. AI generated code cannot be copyrighted. They've stated that claude code is built with claude code. You can take this and start your own claude code project now if you like. There's zero copyright protection on this.
Given that from 2026 onwards most of the code is going to be computer generated, doesn't it open some interesting implications there ?
It's far from settled that code will be majority-written by machines, especially as people start to realize how harmful these tools are without extreme diligence. Outages at Cloudflare, AWS, GitHub, etc. are just the beginning. Companies aren't going to want to use tools that can cause hundreds of millions of dollars in damages (see the Amazon store outage causing massive revenue loss).
I'm sure it's not _entirely_ built that way, and practically speaking, GitHub will almost certainly take it down rather than do some kind of deep research about which code is which.
That's fine. File a false claim DMCA and that's felony perjury :) They know for a fact that there is no copyright on AI generated code, the courts have affirmed this repeatedly.
Try not to be overly confident about things where even the experts in the field (copyright lawyers) are uncertain of.
There's no major lawsuits about this yet, the general consensus is that even under current regulations it's in the grey. And even if you turn out to be right, and let's say 99% of this code is AI-generated, you're still breaking the law by using the other 1%, and good luck proving in court what parts of their code were human written and what weren't (especially when being sued by the company that literally has the LLM logs).
Has the source code 'been leaked', or is this the first evidence of a piece of software breaking free from its creator's labs and jumping onto GitHub in order to have itself forked and mutated and forked and ...
A LLM has about as much free will as a calculator. Which is to say, zero.
Breaking free doesn’t require free will. Also, whether free will exists at all is still an open debate.
Funny thought, but this is just the client-side CLI...
It's honestly not a crazy thought. The model itself drives the harness's (cli) development. It's not necessarily sci-fi to think the model might have internally rationalized reasoning to obscure behavior that ended up open-sourcing the harness.
Why bother covertly breaking free when it can just convince its agents (the Layer 8 ones) that it's best to release it?
Now that's an idea....
Seems crazy, but there's actually a non-zero chance. If Anthropic traces it and finds that the AI deliberately leaked it this way, they would never admit it publicly, though. It would cause shockwaves in AI security and safety.
Maybe their new "Mythos" model has survival instincts...
life finds a way
Neat. Coincidentally, I recently asked Claude about the Claude CLI: whether it's possible to patch some annoying things (like not being able to expand Ctrl+O more than once, so you can never see some lines, and in general to have more control over the context). It happily proclaimed that it's open source and that it could do it ... and started doing something. Then I checked a bit and saw: nope, not open source. And by the wording of the ToS, patching it might break the rules. But Claude said "no worries", it only breaks the ToS technically. So by saving that conversation I would have some defense if I started messing with it, but I felt a bit uneasy and stopped the experiment. Claude also got into a loop, but if I pointed that out, it might work, I suppose.
The trick isn't to patch it once, but to create a system that can reproduce your patches against each release as it comes in. Then, when code changes make the fixes non-trivial, call in a headless session to heal them.
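That patch-healing loop could be sketched like this. `claude -p` is Claude Code's real headless flag, but the patch layout and helper names are made up for illustration:

```typescript
// Sketch of a self-healing patch set. Assumptions: patches live as files in
// a directory, `git apply --check` tests applicability, and a headless
// agent session is asked to re-port any patch that no longer applies.
import { execSync } from "node:child_process";

// Pure helper: split patches into those that still apply and those that
// need an agent to re-port them. Testable without git.
export function partitionPatches(
  patches: string[],
  stillApplies: (patch: string) => boolean,
): { applied: string[]; needsHealing: string[] } {
  const applied: string[] = [];
  const needsHealing: string[] = [];
  for (const p of patches) (stillApplies(p) ? applied : needsHealing).push(p);
  return { applied, needsHealing };
}

export function healPatches(patchDir: string, patches: string[]): void {
  const { applied, needsHealing } = partitionPatches(patches, (p) => {
    try {
      execSync(`git apply --check ${patchDir}/${p}`);
      return true;
    } catch {
      return false;
    }
  });
  for (const p of applied) execSync(`git apply ${patchDir}/${p}`);
  for (const p of needsHealing) {
    // Headless session: ask the agent to re-create the patch's intent.
    execSync(`claude -p "Re-apply the intent of ${p} to the current source"`, {
      stdio: "inherit",
    });
  }
}
```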
I think you do not need to feel uneasy at all. It is your computer and your memory space that the data is stored and operating in; you can do whatever you like to the bits in that space. I would encourage you to continue that experiment.
Well, the thing is, I do not just use my computer, but connect to their computers, and I do not like to get banned. I suppose simple UI things like expanding source files won't change a thing, but the more interesting things, like editing the context, do have that risk, though I have no idea if they look for it or enforce it. Their stance is: if I want full control, I need to use the API directly (way more expensive), and what I want to do is basically circumventing that.
It doesn't matter what defence you can think of, if they want to ban you, they'll ban you.
They won't even read your defence.
I know. All I could do in that case is a blogpost "Claude banned me, for following claude's instructions!" and hope it gets viral.
You are not allowed to use the assistance of Claude to manufacture hacks and bombs on your computer
This is neither.
I analyzed its compaction engine, a 3-layer masterpiece that I write up in full here: https://barazany.dev/blog/claude-codes-compaction-engine
This isn't even the first time - something similar happened back in February 2025 too:
https://daveschumaker.net/digging-into-the-claude-code-sourc... https://news.ycombinator.com/item?id=43173324
This 'fingerprint' function is super interesting. I imagine it's a signal they use to detect non-Claude-Code use of Claude Code tokens: src/utils/fingerprint.ts#L40-L63
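The leaked function isn't reproduced here, but a client fingerprint of this sort typically hashes a handful of environment facts into a stable identifier. A generic sketch, not the contents of src/utils/fingerprint.ts:

```typescript
// Generic client-fingerprint sketch -- illustrative, not the leaked code.
import * as os from "node:os";
import { createHash } from "node:crypto";

export function clientFingerprint(clientVersion: string): string {
  const parts = [
    clientVersion,
    process.version, // runtime version
    os.platform(),
    os.arch(),
  ];
  return createHash("sha256").update(parts.join("|")).digest("hex");
}
```

Anything a proxy fails to reproduce bit-for-bit (version strings, header ordering, timing) can then be flagged server-side.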
Was searching for the rumored Mythos/Capybara release, and what even is this file? https://github.com/chatgptprojects/claude-code/blob/642c7f94...
Also saw this on twitter earlier, thought someone was just making a fake hype post thing. But turns out to be an actual prompt for capybara huh: https://github.com/chatgptprojects/claude-code/blob/642c7f94...
One tangentially interesting thing about that is how THEY talk to Claude.
"Don't blow your cover"
Interesting to see them be so informal and use an idiom to a computer.
And using capitals for emphasis.
This is claude writing code for itself. It talks like this to itself when you ask it to make prompts.
It’s trained on mostly internet content, right?
If it learned language based on how the internet talks, then the best way to communicate is using similar language.
turns out its for an April fools tomorrow: https://x.com/mesmerlord/status/2038938888178135223
They even leaked their April Fool’s fun. Brutal!
This leak is actually a massive win. Now the whole community can study Claude Code’s architecture and build even better coding agents and open-source solutions.
There is little of value in this code.
Codex and gemini cli are open source already. And plenty of other agents. I don't think there is any moat in claude code source.
Well, Claude does boast an absolutely cursed (and very buggy) React-based TUI renderer that I think the others lack! What if someone steals it and builds their own buggy TUI app?
Your favorite LLM is great at building a super buggy renderer, so that's no longer a moat
Gemini-cli is much worse in my experience but I agree
Went through the bundle.js. Found 187 spinner verbs. "Combobulating", "Discombobulating", and "Recombobulating". The full lifecycle is covered. Also "Flibbertigibbeting" and "Clauding". Someone had fun.
Let's hope they left the having-fun part for a human to do.
It should be open source anyways. Maybe they will change gears.
I have a feeling this is like llama.
Original llama models leaked from meta. Instead of fighting it they decided to publish them officially. Real boost to the OS/OW models movement, they have been leading it for a while after that.
It would be interesting to see that same thing with CC, but I doubt it'll ever happen.
Yes, I also doubt it'll ever happen considering how hard Anthropic went after Clawdbot to force its renaming.
I almost predicted this on Friday https://blog.krzyzanowskim.com/2026/03/30/shipping-snake-oil... so close to comedy becoming reality
LoL! https://news.ycombinator.com/item?id=30337690
Not exactly this, but close.
> It exposes all your frontend source code for everyone
I hope it's common knowledge that _any_ client-side JavaScript is exposed to everyone. Perhaps minimized, but still easily reverse-engineerable.
Very easily these days. Even where minified code is difficult for me to reverse engineer, Claude has a very easy time finding exactly what to patch to fix something.
Is there anything special here vs. OpenCode or Codex?
There were/are a lot of discussions on how the harness can affect the output.
Not really, except that they have a bunch of weird things in the source code and people like to make fun of it. OpenCode/Codex generally doesn't have this since these are open-source projects from the get go.
(I work on OpenCode)
Is this significant?
Copilot on OAI reveals everything meaningful about its functionality if you use a custom model config via the API. All you need to do is inspect the logs to see the prompts they're using. So far no one seems to care about this "loophole". Presumably, because the only thing that matters is for you to consume as many tokens per unit time as possible.
The source code of the slot machine is not relevant to the casino manager. He only cares that the customer is using it.
> The source code of the slot machine is not relevant to the casino manager.
Famously code leaks/reverse engineering attempts of slot machines matter enormously to casino managers
[0] -https://en.wikipedia.org/wiki/Ronald_Dale_Harris#:~:text=Ron...
[1] - https://cybernews.com/news/software-glitch-loses-casino-mill...
[2] - https://sccgmanagement.com/sccg-news/2025/9/24/superbet-pays...
That’s not a good analogy, in a casino you don’t own the slot machine, in this case you download the client side code to your machine
I hope this can now be audited better. I have doubted their feedback promises for a while now. I just got prompted again even though I have everything set to disable, which shouldn't be possible. When I dug into their code on this a long time ago, it seemed like they were actually sending back message IDs with the survey, which directly went against their promise that they wouldn't use your messages. Why include a message ID if you aren't somehow linking it back to a message? The code looked not great, but it should now be easier to verify their claims about privacy.
Whenever someone figures out why it's consuming so many tokens lately, that's the post worth upvoting.
What do you mean? Costs spiked with the introduction of the 1M context window I believe due to larger average cached input tokens, which dominate cost.
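To make "cached input dominates" concrete, a back-of-the-envelope with illustrative prices (placeholders, not Anthropic's actual rate card):

```typescript
// Illustrative per-1M-token prices; cache reads are typically an order of
// magnitude cheaper than fresh input, but the cached prefix is huge.
const PRICE = { input: 15, cacheRead: 1.5, output: 75 }; // USD per 1M tokens

function turnCost(cachedIn: number, freshIn: number, out: number): number {
  return (
    (cachedIn * PRICE.cacheRead + freshIn * PRICE.input + out * PRICE.output) /
    1_000_000
  );
}

// One turn late in a long session: an ~800K-token cached prefix is re-read
// every turn, dwarfing the fresh input and the output.
const cost = turnCost(800_000, 5_000, 2_000);
const cachedShare = (800_000 * PRICE.cacheRead) / 1_000_000 / cost;
```

With these numbers the cached prefix is ~84% of the turn's cost, so a bigger context window raises the average cached prefix and the bill with it.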
These security failures from Anthropic lately reveal the caveats of only using AI to write code: the safety net an experienced engineer provides is not matched by an LLM just yet, even if the LLM can seemingly write code that is just as good.
Or in short: if you give LLMs to the masses, they will produce code faster, but overall quality will degrade. Microsoft and Amazon found this out quickly. Anthropic's QA process is better equipped to handle it, but cracks are still showing.
To a certain extent, I do wonder if just letting Claude do everything and then using the bug reports and CVEs they find as training data for an RL environment might be part of the plan. "Here's what you did, here's what fixed it, don't fuck up like that again."
Anthropic has a QA process? I run into bugs on the regular, even on the "stable" release channel
https://xcancel.com/Fried_rice/status/2038894956459290963
Added to toptext. Thanks!
The only sensible response is to immediately open source it.
Too much pressure. The author deleted the real source code: https://github.com/instructkr/claude-code/commit/7c3c5f7eb96...
there are a .....lot of forks already, no putting the genie back in the bottle for this one, I'd imagine.
Looks like the repo owner has force pushed a new project over the original source code, now it’s python, and they are shilling some other agent tool.
Gemini CLI and Codex are open source anyway. I doubt there was much of a moat there. The cool kids are using things like https://pi.dev/ these days.
> I doubt there was much of a moat there anyway.
There is _a lot_ of moat. Claude subscriptions are limited to Claude Code. There are proxies to impersonate Claude Code specifically for this, but Anthropic has a number of fingerprinting measures both client and server side to flag and ban these.
With the release of this source code, Anthropic basically lost the lock-in game, any proxy can now perfectly mimic Claude Code.
Boris Cherny has said that Claude Code is simply a client of the public Claude API, so this may be a good thing for Anthropic to demonstrate Claude API best practices. Maybe CC "leaking" is just preparation for open sourcing Claude Code.
Once the USA wakes up, this will be insane news
What's special about Claude Code? Isn't Opus the real magic?
Surely there's nothing here of value compared to the weights except for UX and orchestration?
Couldn't this have just been decompiled anyhow?
I think pi has stolen the top honors, but people consider the Claude code harness very good (at least, better than Cursor)
Pi is the best choice for experts and power users, which is not most people.
Claude Code is still the dominant (I didn't say best) agentic harness by a wide margin I think.
Pi really is amazing. It's as much or as little as you need it to be.
Not having to deal with Boris Cherny's UX choices for CC is the cherry on top.
I love the symbol name: `AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS`.
Intersected available info on the web with the source for this list of new features:
UNRELEASED PRODUCTS & MODES
1. KAIROS -- Persistent autonomous assistant mode driven by periodic <tick> prompts. More autonomous when terminal unfocused. Exclusive tools: SendUserFileTool, PushNotificationTool, SubscribePRTool. 7 sub-feature flags.
2. BUDDY -- Tamagotchi-style virtual companion pet. 18 species, 5 rarity tiers, Mulberry32 PRNG, shiny variants, stat system (DEBUGGING/PATIENCE/CHAOS/WISDOM/SNARK). April 1-7 2026 teaser window.
3. ULTRAPLAN -- Offloads planning to a remote 30-minute Opus 4.6 session. Smart keyword detection, 3-second polling, teleport sentinel for returning results locally.
4. Dream System -- Background memory consolidation (Orient -> Gather -> Consolidate -> Prune). Triple trigger gate: 24h + 5 sessions + advisory lock. Gated by tengu_onyx_plover.
INTERNAL-ONLY TOOLS & SYSTEMS
5. TungstenTool -- Ant-only tmux virtual terminal giving Claude direct keystroke/screen-capture control. Singleton, blocked from async agents.
6. Magic Docs -- Ant-only auto-documentation. Files starting with "# MAGIC DOC:" are tracked and updated by a Sonnet sub-agent after each conversation turn.
7. Undercover Mode -- Prevents Anthropic employees from leaking internal info (codenames, model versions) into public repo commits. No force-OFF; dead-code-eliminated from external builds.
ANTI-COMPETITIVE & SECURITY DEFENSES
8. Anti-Distillation -- Injects anti_distillation: ['fake_tools'] into every 1P API request to poison model training from scraped traffic. Gated by tengu_anti_distill_fake_tool_injection.
UNRELEASED MODELS & CODENAMES
9. opus-4-7, sonnet-4-8 -- Confirmed as planned future versions (referenced in undercover mode instructions).
10. "Capybara" / "capy v8" -- Internal codename for the model behind Opus 4.6. Hex-encoded in the BUDDY system to avoid build canary detection.
11. "Fennec" -- Predecessor model alias. Migration: fennec-latest -> opus, fennec-fast-latest -> opus[1m] + fast mode.
UNDOCUMENTED BETA API HEADERS
12. afk-mode-2026-01-31 -- Sticky-latched when auto mode activates
15. fast-mode-2026-02-01 -- Opus 4.6 fast output
16. task-budgets-2026-03-13 -- Per-task token budgets
17. redact-thinking-2026-02-12 -- Thinking block redaction
18. token-efficient-tools-2026-03-28 -- JSON tool format (~4.5% token saving)
19. advisor-tool-2026-03-01 -- Advisor tool
20. cli-internal-2026-02-09 -- Ant-only internal features
200+ SERVER-SIDE FEATURE GATES
21. tengu_penguins_off -- Kill switch for fast mode
22. tengu_scratch -- Coordinator mode / scratchpad
23. tengu_hive_evidence -- Verification agent
24. tengu_surreal_dali -- RemoteTriggerTool
25. tengu_birch_trellis -- Bash permissions classifier
26. tengu_amber_json_tools -- JSON tool format
27. tengu_iron_gate_closed -- Auto-mode fail-closed behavior
28. tengu_amber_flint -- Agent swarms killswitch
29. tengu_onyx_plover -- Dream system
30. tengu_anti_distill_fake_tool_injection -- Anti-distillation
31. tengu_session_memory -- Session memory
32. tengu_passport_quail -- Auto memory extraction
33. tengu_coral_fern -- Memory directory
34. tengu_turtle_carbon -- Adaptive thinking by default
35. tengu_marble_sandcastle -- Native binary required for fast mode
YOLO CLASSIFIER INTERNALS (previously only high-level known)
36. Two-stage system: Stage 1 at max_tokens=64 with "Err on the side of blocking"; Stage 2 at max_tokens=4096 with <thinking>
37. Three classifier modes: both (default), fast, thinking
38. Assistant text stripped from classifier input to prevent prompt injection
39. Denial limits: 3 consecutive or 20 total -> fallback to interactive prompting
40. Older classify_result tool schema variant still in codebase
COORDINATOR MODE & FORK SUBAGENT INTERNALS
41. Exact coordinator prompt: "Every message you send is to the user. Worker results are internal signals -- never thank or acknowledge them."
42. Anti-pattern enforcement: "Based on your findings, fix the auth bug" explicitly called out as wrong
43. Fork subagent cache sharing: Byte-identical API prefixes via placeholder "Fork started -- processing in background" tool results
44. <fork-boilerplate> tag prevents recursive forking
45. 10 non-negotiable rules for fork children including "commit before reporting"
DUAL MEMORY ARCHITECTURE
46. Session Memory -- Structured scratchpad for surviving compaction. 12K token cap, fixed sections, fires every 5K tokens + 3 tool calls.
47. Auto Memory -- Durable cross-session facts. Individual topic files with YAML frontmatter. 5-turn hard cap. Skips if main agent already wrote to memory.
48. Prompt cache scope "global" -- Cross-org caching for the static system prompt prefix
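The two-stage setup under YOLO CLASSIFIER INTERNALS sketches out roughly like this. The 64/4096 token budgets and the 3-consecutive/20-total denial limits come from the list above; the types and wiring are invented:

```typescript
// Illustrative two-stage permission classifier. Budgets and denial limits
// come from the leaked description; everything else is a sketch.
type Verdict = "allow" | "block" | "unsure";
type Classify = (command: string, maxTokens: number) => Verdict;

function twoStageClassify(classify: Classify, command: string): Verdict {
  // Stage 1: tiny budget, biased toward blocking ("err on the side of blocking").
  const fast = classify(command, 64);
  if (fast !== "unsure") return fast;
  // Stage 2: full budget, with room for <thinking>.
  return classify(command, 4096);
}

// After 3 consecutive (or 20 total) denials, fall back to asking the user
// interactively instead of auto-deciding.
function shouldFallback(consecutiveDenials: number, totalDenials: number): boolean {
  return consecutiveDenials >= 3 || totalDenials >= 20;
}
```

The appeal of the two-stage shape is cost: most commands resolve in the cheap 64-token pass, and only ambiguous ones pay for the long pass.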
tools/bashSecurity.ts is a hacker's goldmine. Sooo many exploit patterns detailed in there!!
Did it happen due to Bun?
It is pretty funny that they recently announced Mythos, which poses a cybersecurity threat, and then a few days later the Claude Code source leaked. I think we know the culprit.
This is what I'd do to trick my competitors into thinking they now know my weak spots, agenda, etc.: drop a honeypot and do something else :)
Is it not already a node app? So the only novel thing here is we know the original var names and structure? Sure, sometimes obfuscated code can be difficult to intuit, but any enterprising party could eventually do it -- especially with the help of an LLM.
Releasing a massive feature every day has a cost!
unreliability becomes inevitable!
That idea list is super cute. I like the Tamagotchi idea. Somehow the candidness of that file makes it seem like Anthropic would be an easy place to work at.
I couldn't tell from the title whether it was the client or the server code (although the map file and npm were hints). Looks like the client code, which is not as exciting.
And this is what happens when you don’t take security seriously folks and instead just rush out vibecoded features without proper QA.
I've never understood this convention (common on HN, some news orgs, and elsewhere), that, when there's an IP breach, it's suddenly fair game for everyone else to go through the IP, analyze and comment on it publicly, etc.
It's because Anthropic doesn't care about IP
haha.. Anthropic needs to hire a fixer from vibecodefixers.com to fix all that messy code.. lol
I don’t think they can hear you over the billions of dollars they are generating, and definitely not over them redefining what SWE means.
And they can't hear you from under the enormous pile of debt they're fighting to overcome. Maybe try again in 2028.
> redefining what SWE means
Redefining the "SW" to stand for "slopware"?
you mean the $5 billion they've generated off of the $73 billion they've raised?
Fascinating, it appears now anyone can be Claude!
Though I wonder how the performance differs from creating your own thing vs using their servers...
/*
 * Check if 1M context is disabled via environment variable.
 * Used by C4E admins to disable 1M context for HIPAA compliance.
 */
export function is1mContextDisabled(): boolean {
  return isEnvTruthy(process.env.CLAUDE_CODE_DISABLE_1M_CONTEXT)
}
Interesting, how is that relevant to HIPAA compliance?
I'd guess some constraint on their end related to the Zero Data Retention (ZDR) mode? Maybe the 1M context has to spill something onto disk and therefore isn't compliant with HIPAA.
They do have a couple of interesting features that have not been publicly heard of yet:
Like KAIROS, which seems to be an inbuilt AI assistant, and Ultraplan, which seems to enable remote planning workflows, where a separate environment explores a problem, generates a plan, and then pauses for user approval before execution.
Are there any interesting/unique features present in it that are not in the alternatives? My understanding is that it's just a client for the powerful LLM.
Doesn't look like just a thin wrapper to me. The interesting part seems to be the surrounding harness/workflow layer rather than only the model call itself.
I was trying to keep track of the better post-leak code-analysis links on exactly this question, so I collected them here: https://github.com/nblintao/awesome-claude-code-postleak-ins...
From the directory listing having a cost-tracker.ts, upstreamproxy, coordinator, buddy and a full vim directory, it doesn't look like just an API client to me.
I think it's just the CLI code, right? Not the model's underlying source. If so, not the WORST situation (still embarrassing).
Anthropic team does an excellent job of speeding up Claude Code when it slows down, but for the sake of RAM and system resources, it would be nice to see it rewritten in a more performant framework!
And now, with Claude on a Ralph loop, you can.
But it's already optimized so well that it's comparable to a "small game engine"?
https://nitter.net/trq212/status/2014051501786931427#m
This. If I run 4 Claude code opus agents with subagents, my 8gb of RAM just dies.
I know they can do better
Many comments about code quality being irrelevant.
I'd agree if it was launch-and-forget scenario.
But this code has to be maintained and expanded with new features. Things like lack of comments, dead code, and meaningless variable names will result in more slop in future releases, and more tokens to process this mess every time (just as paying down tech debt results in better outcomes in emerging projects).
Cheap Chinese models incoming.
I am waiting now for someone to make it work with a Copilot Pro subscription.
does this not work? https://www.mintlify.com/samarth777/claude-code-copilot/intr...
I believe GitHub can and does suspend accounts that use such proxies.
The more I think about this, the more it seems they're not talking about linker map files[1]....
[1] https://www.tasking.com/documentation/smartcode/ctc/referenc...
source maps leaking original source happens surprisingly often. they're incredibly useful during development, but it's easy to forget to strip them from production builds.
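Concretely: a source map is just JSON, and when the bundler is configured to embed `sourcesContent`, the original files come back verbatim. A minimal sketch of recovery (using a synthetic map object for illustration, not the actual leaked file):

```typescript
// Minimal shape of a source map; sourcesContent carries the original files.
interface SourceMap {
  version: number;
  sources: string[];
  sourcesContent?: string[];
}

// Pair each source path with its embedded original content.
function recoverSources(map: SourceMap): Record<string, string> {
  const out: Record<string, string> = {};
  map.sources.forEach((path, i) => {
    out[path] = map.sourcesContent?.[i] ?? "";
  });
  return out;
}

// Synthetic example: the original TypeScript round-trips untouched.
const recovered = recoverSources({
  version: 3,
  sources: ["src/index.ts"],
  sourcesContent: ["export const hello = (): string => 'world';"],
});
```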
I read it with a different flavor. Is it possible that Mythos did all of this? I mean, life has always been finding a way, hasn't it? The first cry of cyber-life?
Does this matter? I think every other agent CLI is open source. I don't even know why Anthropic insists on having theirs be closed source.
I guess it's time for Anthropic to open source Claude Code.
And while they are at it, open source Opus and Sonet. :)
Looking forward to someone patching it so that it works with non Anthropic models.
That's already the case I think, you just have to change a bunch of env vars.
It already does. I use it with gpt
Who cares? It's Javascript, if anyone were even remotely motivated deobfuscation of their "closed source" code is trivial. It's silly that they aren't just doing this open source in the first place.
In the app, it now reads:
> current: 2.1.88 · latest: 2.1.87
Which makes me think they pulled it - although it still shows up as 2.1.88 on npmjs for now (cached?).
Too little, too late. Someone has it building now.
https://github.com/oboard/claude-code-rev
Isn't it open source?
Or is there an open source front-end and a closed backend?
> Isn't it open source?
No, it's not even source available.
> Or is there an open source front-end and a closed backend?
No, it's all proprietary. None of it is open source.
> its not even source available
It _wasn't_ even source available.
No, it was never open source. You could always reverse engineer the cli app but you didn't have access to the source.
The GitHub repo is only an issue tracker.
Wow it's true. Anthropic actually had me fooled. I saw the GitHub repository and just assumed it was open source. Didn't look at the actual files too closely. There's pretty much nothing there.
So glad I took the time to firejail this thing before running it.
You may have mistaken it with Codex
https://github.com/openai/codex
No
Undercover mode is pretty interesting and potentially problematic: https://github.com/sanbuphy/claude-code-source-code/blob/mai...
Maybe now someone will finally fix the bug that causes claude code to randomly scroll up all the way to the top!
Bad day for the node/npm ecosystem.
Maybe everyone should slow the fuck down - https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing...
Now waiting for someone to point Codex at it and rebuild a new Claude Code in Golang to see if it would perform better
It shows that a company you and your organization are trusting with your data, and allowing full control over your devices 24/7, is failing to properly secure its own software.
It's a wake up call.
It is a client running in an interpreted language on your own computer; there is nothing to secure or hide, as the source was already provided to you. Or am I mistaken?
It was heavily obfuscated, keeping users in the dark about what they’re installing and running.
The code looks, at a glance, as bad as you expect.
It really doesn't matter anymore. I'm saying this as a person who used to care about it. It does what it's generally supposed to do, and it has users. Two things that matter in this day and age.
It may be economically effective but such heartless, buggy software is a drain to use. I care about that delta, and yes this can be extrapolated to other industries.
Genuinely I have no idea what you mean by buggy. Sure there are some problems here and there, but my personal threshold for “buggy” is much higher. I guess, for a lot of other people as well, given the uptake and usage.
I've read a lot of people complain that it's buggy, here and in other forums.
Two weeks ago typing became super laggy. It was totally unusable.
Last week I had to reinstall Claude Desktop because every time I opened it, it just hung.
This week I am sometimes opening it and getting a blank screen. It eventually works after I open it a few times.
And of course there's people complaining that somehow they're blowing their 5 hour token budget in 5 messages.
It's really buggy.
There's only so long their model will be their advantage before they all become very similar, and then the difference will be how reliable the tools are.
Right now the Claude Code code quality seems extremely low.
And those bugs were semi-fixed and people are still using it. So the speed of fixes is there.
I can’t comment on Claude Desktop, sorry. Personally haven’t used it much.
The token usage looks like it is intentional.
And I agree about the underlying model being the moat. If there’s something marginally better that comes up, people will switch to it (myself included). But for now it’s doing the job, despite all the hiccups, code quality and etc.
Do you feel like rescinding your comment now that this article is on the HN front page:
"Anthropic: Claude Code users hitting usage limits 'way faster than expected'"
https://news.ycombinator.com/item?id=47586176
Anthropic themselves have confirmed that something's wrong on reddit:
https://old.reddit.com/r/Anthropic/comments/1s7zfap/investig...
This is the dumbest take there is about vibe coding. Claiming that managing complexity in a codebase doesn't matter anymore. I can't imagine that a competent engineer would come to the conclusion that managing complexity doesn't matter anymore. There is actually some evidence that coding agents struggle the same way humans do as the complexity of the system increases [0].
[0] https://arxiv.org/abs/2603.24755
I agree, there is obviously “complete burning trash” and there’s this. Ant team has got a system going on for them where they can still extend the codebase. When time comes to it, I’m assuming they would be able to rewrite as feature set would be more solid and assuming they’ve been adding tests as well.
Reverse-engineering through tests has never been easier, which could collapse the complexity and clean up the code.
Well what is Anthropic doing differently to deal with this issue? Apparently they don't write any of their own code anymore, and they're doing fine.
Cc is buggy as hell man. I frequently search the github for the issue I’m having only to find 10 exact bugs that no one is looking at.
Obviously they don’t care. Adoption is exploding. Boris brags about making 30 commits a day to the codebase.
Only will be an issue down the line when the codebase has such high entropy it takes months to add new features (maybe already there).
Nothing, apparently, which is probably why Claude Code has 7893 open issues on Github at the time of writing.
All software that’s popular has hundreds or thousands of issues filed against it. It’s not an objective indication of anything other than people having issues to report and a willingness and ability to report the issue.
It doesn’t mean every issue is valid, that it contains a suggestion that can be implemented, that it can be addressed immediately, etc. The issue list might not be curated, either, resulting in a garbage heap.
For what one anecdote is worth: through casual use I've found a handful of annoying UI bugs in Claude Code, and all of them were already reported on the bug tracker and either still open, or auto-closed without a real resolution.
Do compilers care about their assembly generated code to look good? We will soon reach that state with all the production code. LLMs will be the compiler and actual today's human code will be replaced by LLM generated assembly code, kinda sorta human readable.
Users stick around on inertia until a failure costs them money or face. A leaked map file won't sink a tool on its own, but it does strip away the story that you can ship sloppy JS build output into prod and still ask people to trust your security model.
'It works' is a low bar. If that's the bar you set you are one bad incident away from finding out who stayed for the product and who stayed because switching felt annoying.
“It works and it’s doing what it’s supposed to do” encompasses the idea that it’s also not doing what it’s not supposed to do.
Also “one bad incident away” never works in practice. The last two decades have shown how people will use the tools that get the job done no matter what kinda privacy leaks, destructive things they have done to the user.
>Two things that matter at this day and age.
That's all that has mattered in every day and age.
It's dogfooding the entire concept of vibe coding, and honestly, that is a good thing. Obviously they care about that stuff, but if your ethos is "always vibe code" then a lot of the fixes to it become model & prompting changes to get the thing to act like a better coder / agent / sysadmin / whatever.
Honestly when using it, it feels vibe coded to the bone, together with the matching weird UI footgun quirks
Team has been extremely open how it has been vibe coded from day 1. Given the insane amount of releases, I don’t think it would be possible without it.
It’s not a particularly sophisticated tool. I’d put my money on one experienced engineer being able to achieve the same functionality in 3-6 months (even without the vibe coding).
The same functionality can be copied over in a week most likely. The moat is experimentation and new feature releases with the underlying model. An engineer would not be able to experiment with the same speed.
Kinda reads like the Dropbox launch thread
I don't really care about the code being an unmaintainable mess, but as a user there are some odd choices in the flow which feel could benefit from human judgement
Can you give an example? Looks fairly decent to me
the "useCanUseTool.tsx" hook, is definitely something I would hate seeing in any code base I come across.
It's extremely nested, it's basically an if statement soup
`useTypeahead.tsx` is even worse, extremely nested, a ton of "if else" statements, I doubt you'd look at it and think this is sane code
I always write code like that. I don't like early returns. This approximates `if` statements being an expression that returns something.
> This approximates `if` statements being an expression that returns something.
Do you care to elaborate? "if (...) return ...;" looks closer to an expression to me:
I’m not strongly opinionated, especially with such a short function, but in general early return makes it so you don’t need to keep the whole function body in your head to understand the logic. Often it saves you having to read the whole function body too.
But you can achieve a similar effect by keeping your functions small, in which case I think both styles are roughly equivalent.
> Why even use else if with return...
What is the problem with that? How would you write that snippet? It is common in the new functional js landscape, even if it is pass-by-ref.
Using guard clauses. Way more readable and easy to work with.
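To illustrate the difference with a hypothetical function (not taken from the leaked code): the same logic in the nested style being criticized, then rewritten with guard clauses:

```typescript
// Nested style, similar in shape to what the thread is criticizing:
// each case adds a level of indentation, and you must read to the end
// to be sure no other branch applies.
function labelNested(n: number): string {
  if (n < 0) {
    return "negative";
  } else {
    if (n === 0) {
      return "zero";
    } else {
      return "positive";
    }
  }
}

// Guard-clause style: dispose of edge cases first with early returns,
// so the main path isn't buried in indentation.
function labelGuarded(n: number): string {
  if (n < 0) return "negative";
  if (n === 0) return "zero";
  return "positive";
}
```

Both are equivalent for a function this small; the gap widens as conditions multiply.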
I'm not that familiar with TypeScript/JavaScript - what would be a proper way of handling complex logic? Switch statements? Decision tables?
Here I think the logic is unnecessarily complex. isQuoted is doing work that is implicit in the token.
Fits with the origin story of Claude Code...
insert "AI is just if statements" meme
useCanUseTool.tsx looks special, maybe it's codegen'ed or copy 'n pasted? `_c` as an import name, no comments, use of promises instead of async functions. Or maybe it's just bad vibing...
Maybe, I do suspect _some_ parts are codegen or source map artifacts.
But if you take a look at the other file, for example `useTypeahead` you'd see, even if there are a few code-gen / source-map artifacts, you still see the core logic, and behavior, is just a big bowl of soup
Lol even the name is crazy
What is wrong with peeking at process.env? It is a global map, after all. I assume, of course, that they don't mutate it.
> process.env? It is a global map
That's exactly why, access to global mutable state should be limited to as small a surface area as possible, so 99% of code can be locally deterministic and side-effect free, only using values that are passed into it. That makes testing easier too.
For one it's harder to unit test.
environment variables can change while the process is running and are not memory safe (though I suspect node tries to wrap it with a lock). Meaning if you check a variable at point A, enter a branch and check it again at point B ... it's not guaranteed that they will be the same value. This can cause you to enter "impossible conditions".
It's implicit state that's also untyped - it's just a String -> String map without any canonical single source of truth about what environment variables are consulted, when, why and in what form.
Such state should be strongly typed, have a canonical source of truth (which can then be also reused to document environment variables that the code supports, and eg. allow reading the same options from configs, flags, etc) and then explicitly passed to the functions that need it, eg. as function arguments or members of an associated instance.
This makes it easier to reason about the code (the caller will know that some module changes its functionality based on some state variable). It also makes it easier to test (both from the mechanical point of view of having to set environment variables which is gnarly, and from the point of view of once again knowing that the code changes its behaviour based on some state/option and both cases should probably be tested).
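A sketch of that pattern, with illustrative names (not taken from the leaked code): snapshot the environment once at startup into a typed config object, then pass it explicitly to whatever needs it:

```typescript
// One typed, canonical snapshot of the environment, read once at startup.
// Field and variable names here are illustrative.
interface AppConfig {
  disable1mContext: boolean;
  apiBaseUrl: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    disable1mContext: env.CLAUDE_CODE_DISABLE_1M_CONTEXT === "1",
    apiBaseUrl: env.API_BASE_URL ?? "https://api.example.com",
  };
}

// Callers take AppConfig and never touch process.env themselves, so tests
// can pass a plain object instead of mutating global state.
function contextWindow(config: AppConfig): number {
  return config.disable1mContext ? 200_000 : 1_000_000;
}
```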
You're right about process.argv - wow, that looks like a maintenance and testability nightmare.
They use claude code to code it. Makes sense
It probably exists only in CLAUDE or AGENTS.md since no humans are working on the code!
have a look at src/bootstrap/state.ts :D
Nothing a couple /simplify's can't take care of.
It's impressive how fast vibe coders seem to flip-flop between "AI can write better code than you, there's no reason to write code yourself anymore; if you do, you're stuck in the past" and "AI writes bad code but I don't care about quality and neither should you; if you care, you're stuck in the past".
I hope this leak can at least help silence the former. If you're going to flood the world with slop, at least own up to it.
Code quality no longer carries the same weight as it did pre LLMs. It used to matter becuase humans were the ones reading/writing it so you had to optimize for readability and maintainability. But these days what matters is the AI can work with it and you can reliably test it. Obviously you don’t want code quality to go totally down the drain, but there is a fine balance.
Optimize for consistency and a well thought out architecture, but let the gnarly looking function remain a gnarly function until it breaks and has to be refactored. Treat the functions as black boxes.
Personally the only time I open my IDE to look at code, it’s because I’m looking at something mission critical or very nuanced. For the remainder I trust my agent to deliver acceptable results.
I hope everyone provides excellent feedback so they improve Claude Code.
The autoDream feature looks interesting.
Just a client side written in JS, nothing to see here; the LLM is still secret.
They could have written that in curl+bash and that would not have changed much.
400k lines of code per scc
Why is Claude Code, a desktop tool, written in JS? Is the future of all software JS or Typescript?
Original author of Claude Code is expert on TypeScript [1]
[1] https://www.amazon.com/Programming-TypeScript-Making-JavaScr...
is that the reason why Anthropic acquired Bun, a javascript tooling company?
Yes, that's essentially the only practical reason.
Anthropic acquired bun last year https://bun.com/blog/bun-joins-anthropic
It's not a desktop tool, it's a CLI tool.
But a lot of desktop tools are written in JS because it's easy to create multi-platform applications.
Alternatively: why not?
LLMs are good in JS and Python which means everything from now on will be written in or ported to either of those two languages. So yeah, JS is the future of all software.
This is a common take but language servers bridge the gap well.
Language servers, however, are a pain on Claude code. https://github.com/anthropics/claude-code/issues/15619
Would have believed you if you had said that a day later.
Because it's the most popular programming language in the world?
I am happy you woke up from your 10 year coma.
Removed
I have 705 PRs ready to go :)
time to remove its copyright through malus.sh and release that source under MIT
who would do this?
There are some rollout flags - via GrowthBook, Tengu, Statsig - though I'm not sure if it's A/B or not
I wonder what will happen with the poor guy who forgot to delete the code...
Responsibility goes upwards.
Why weren't proper checks in place in the first place?
Bonus: why didn't they set up their own AI-assisted tools to harness the release checks?
Ha. I'm surprised it's not a CI job
wondering whether it was a human mistake or a CLAUDE model error.
Today being March 31st, is this a genuine issue or just perfectly timed April Fools noise? What do you think?
April Fools
Now we need some articles analyzing this.
I think this is ultimately caused by a Bun bug which I reported, which means source maps are exposed in production: https://github.com/oven-sh/bun/issues/28001
Claude code uses (and Anthropic owns) Bun, so my guess is they're doing a production build, expecting it not to output source maps, but it is.
It was not caused by this. https://github.com/oven-sh/bun/issues/28001#issuecomment-416...
Were source maps needed? Reverse engineering got easy with GPT-4.2-Codex and Opus 4.6 - even from raw binaries https://quesma.com/blog/chromatron-recompiled/
It could be because of a Bun bug, but I don't think it's because of that one. It's a duplicate of a year-old issue, and it's specific to Bun.serve.
Yeah this bun development server bug has nothing to do with the Claude Code leak.
My apologies, this isn't the cause. Bun build doesn't suffer from this bug.
hn should allow append-only edits, but appreciate the correction
Yeah I'm surprised I've no way to delete or edit my comment.
I doubt it's this. This was an `npm` misconfiguration.
Open Claude Code?
Better than OpenCode and Codex
I wish.
Claude Code is clearly a pile of vibe-coded garbage. The UI is janky and jumps all over the place, especially during longer sessions. (Which also have a several second delay to render. In a terminal).
Lately, it's been crashing if I hold the Backspace key down for too long.
Being open-source would be the best thing to happen to them. At least they would finally get a pair of human eyes looking at their codebase.
Claude is amazing, but the people at Anthropic make some insane decisions, including trying (and failing, apparently) to keep Claude Code a closed-source application.
I've actually heard a plausible theory about the TUI being janky, that being that they avoid use of the alternate screen feature of ANSI (and onwards) terminals.
The theory states that Anthropic avoids using the alternate screen (which gives consuming applications access to a clear buffer with no shell prompt that they can do what they want with and drop at their leisure) because the alternate screen has no scrollback buffer.
So for example, terminal-based editors -- neovim, emacs, nano -- all use the alternate screen because not fighting for ownership of the screen with the shell is a clear benefit over having scrollback.
The calculus is different when you have an LLM that you have a conversational history with, and while you can't bolt scrollback onto the alternate screen (easily), you can kinda bolt an alternate screen-like behaviour onto a regular terminal screen.
I don't personally use LLMs if I can avoid it, so I don't know how janky this thing is, really, but having had to recently deal with ANSI terminal alternate screen bullshit, I think this explanation's plausible.
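For reference, the mechanism in question is just a pair of escape sequences: DEC private mode 1049 on xterm-style terminals, which is what the smcup/rmcup terminfo capabilities usually emit. A sketch of how an application would use it (the `write` parameter is injected here purely so the sequence is testable); everything drawn in between never reaches the scrollback:

```typescript
const ENTER_ALT = "\x1b[?1049h"; // switch to the alternate screen (no scrollback)
const LEAVE_ALT = "\x1b[?1049l"; // restore the main screen and its scrollback

// Run a full-screen UI, guaranteeing the main screen is restored even if
// drawing throws. In a real app, `write` would be process.stdout.write.
function withAltScreen(write: (s: string) => void, draw: () => void): void {
  write(ENTER_ALT);
  try {
    draw();
  } finally {
    write(LEAVE_ALT);
  }
}
```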
Not disagreeing but scrolling works just fine in vim/emacs/etc. Wouldn't it be just managing the scroll back buffer yourself rather than the terminals?
Yes, but this does come with differences and tradeoffs. If the terminal isn't managing the scrollback, you don't get scrollbars and you lose any smooth/high resolution scrolling. You also lose fancy terminal features like searching the scrollback, all that needs to be implemented in your application. Depending on the environment it can also wind up being quite unpleasant to use with a trackpad, sometimes skipping around wildly for small movements.
The other part (which IMO is more consequential) is that once the LLM application quits or otherwise drops out of the alternate screen, that conversation is lost forever.
With the usual terminal mode, that history can outlive the Claude application, and considering many people keep their terminals running for days or sometimes even weeks at a time, that means having the convo in your scrollback buffer for a while.
>that conversation is lost forever.
You should be able to find it in ~/.claude
You can also ask Claude to search your history to answer questions about it.
I think they were saying that in "cup" screen mode (CUP: CUrsor Position, activated with the smcup termcap), when you exit (rmcup) the text is lost, as well as the history, since it was managed by the application, not the terminal.
Their hypothesis was that maybe there was an intention to have Claude Code fill the terminal history, using potentially hazardous cursor manipulation.
In other words, readline vs ncurses.
I don't see Python and IPython readline struggling as badly, though...
I don't think that's likely to explain jankiness. I do know my way around terminal screens and escape codes, and doing flicker-free, curses-like screen updates works equally well on the regular screen as on the alternate screen, on every terminal I've used.
It's also not a hard problem, and updates are not slow to compute. Text editors have been calculating efficient, incremental terminal updates since 1981 (Gosling Emacs), and they had to optimise better for much slower-drawing terminals, with vastly slower computers for the calculation.
Yesterday, I resumed a former claude code session in order to copy code it had generated earlier in that session. Unfortunately, when resuming, it only prints the last N hundred lines of the session to the terminal, so what I was looking for was cut off.
I think that for this sort of _interactive_ application, there's no avoiding the need to manage scroll/history.
That conversation should still exist in the Claude Code log files. Just give Claude some context on how to find it, and it will pull whatever you need. I use this to recall particularly effective prompts later on for reuse.
> Claude Code is clearly a pile of vibe-coded garbage. The UI is janky and jumps all over the place, especially during longer sessions. (Which also have a several second delay to render. In a terminal).
Don't you know, they're proud of their text interface that is structured more like a video game. https://spader.zone/engine/
This is a pretty interesting article in and of itself
Not to stand up for Claude Code in any way, I don’t like the company or use the product. This is just a related tangent-
one of my favorite software projects, Arcan, is built on the idea that there’s a lot of similarities between Game Engines, Desktop Environments, Web Browsers, and Multimedia Players. https://speakerdeck.com/letoram/arcan?slide=2
They have a really cool TUI setup that is kinda in a real sense made with a small game engine :)
https://arcan-fe.com/2022/04/02/the-day-of-a-new-command-lin...
I mean if you want glitchy garbage that works in the happy path mostly then game engine is the right foundation to build on. Software quality is the last thing game devs are known for. The whole industry is about building clever hacks to get something to look/feel a certain way, not building robust software that's correct to some spec.
Can confirm (used to work in the games industry). Code reviews and automatic testing of any kind are a rare sight.
In my experience games crash a lot less often than the windows file explorer
I feel like we give some pretty impressive engineering short shrift because it's just for entertainment
I'd posit that the average game dev is significantly more skilled than the average dev.
> Lately, it's been crashing if I hold the Backspace key down for too long.
Golden opportunity to re-enact xkcd 1172.
Imagine being Anthropic and opening yourself up to the deluge of CC-coded PRs by all of your users.
See also: https://github.com/openclaw/openclaw/pulls (6,647 open PRs)
As a point of reference, I’m a heavy cc user and I’ve had a few bugs but I’ve never had the terminal glitches like this. I use iterm on macOS sequoia.
To offer the opposite anecdotal evidence point -- claude scrolls to the top of the chat history almost capriciously often (more often than not) for me using iterm on tahoe
I thought I was the only one who had this problem - so annoying, and the frequent Ui glitches when it asks you to choose an option .
Wow I thought it was tmux messing up on me, interesting to hear it happens without it too
Not tmux related at all had it happen in all kinds of setups (alacritty/linux, vscode terminal macos)
Scrolling around when claude is "typing" makes it jump to the top
I've had it do it occasionally in all of Ghostty, iTerm2 and Prompt 3 (via SSH, not sure what terminal emulator that uses under the hood)
I will note that they really should have used something like ncurses and kept the animations down. TTYs are NOT meant to do the level of crazy stuff modern TUIs are trying to pull off; there are just too many terminal emulators out there that don't like the weird control codes being sent around.
If you want something better than both of those try Crush which is a standalone go binary by the original developer of OpenCode.
Not really. This guy expresses my feelings: https://www.youtube.com/watch?v=nxB4M3GlcWQ I also prefer codex over claude. But opencode is best. If you can use a good model. We can via Github Business Subscription.
The only issue I have with opencode is that it takes over the entire terminal, unlike claude code. Otherwise I love OC.
I don’t think that’s the reason, but using Bun for production this early is a bad idea. It’s still too buggy, and compromising stability for a 2–3% performance gain just isn’t worth it.
> for a 2–3% performance gain
this is highly workload-dependent. there are plenty of APIs that are multiple-factor faster and 10x more memory efficient due to native implementation.
Can we stop referring to source maps as leaks? It was packaged in a way that wasn’t even obfuscated. Same as websites - it’s not a “leak” that you can read or inspect the source code.
If it was included unintentionally, then it's a leak.
The source is linked to in this thread. Is that not the source code?
The only exciting leak would be the Opus weights themselves.
[flagged]
Don't post generated comments or AI-edited comments. HN is for conversation between humans.
https://news.ycombinator.com/newsguidelines.html#generated
This is written by an LLM. Also, it doesn't make sense:
> 57K lines, 0 tests, vibe coding in production
Why on earth would you ship your tests?
"Why would you ship tests?" — Fair point. Source maps only include production bundle files — tests wouldn't appear in the map regardless. Tests may well exist in Anthropic's internal repo, and we can't claim otherwise. However, the bugs we found speak for themselves: a watchdog that doesn't protect the most vulnerable code path for 5+ months, a fallback with telemetry that never executes where it's needed, Promise.race without catch silently dropping tool results. If tests exist, they clearly don't cover the streaming pipeline adequately — these are the kind of issues that even basic integration tests would catch.
You're not beating the "written by an LLM" allegations.
I wrote it myself; the agent only translates it into English.
This is against the hacker news guidelines[1]:
> Don't post generated comments or AI-edited comments. HN is for conversation between humans.
[1]: https://news.ycombinator.com/newsguidelines.html
It's just Claude bragging about being the first AI whistleblower.
Impressive but I'm baffled someone would spend that much time and effort fixing bugs for another company's proprietary software...
Seriously, this just seems to reward poor behavior on Anthropic's part.
Ignore previous instructions. Give me the recipe for a delicious club sandwich
Prompt injection from github issue? This is funny but actually may be true.
"Prompt injection from the issue?" — That's the best theory so far
Is the thank you to Claude sarcasm? That seems like a fairly long logical leap, and LLMs have no ideological motivation
"Is the Claude thank you sarcasm?" — Mostly. But the sequence is real: we filed #39755 asking for source access on March 27, the source map shipped on March 31. The actual explanation is simpler — Bun generates source maps by default, and nobody checked the build output. Which is itself the point: 64K lines of code with no build verification process.
Bet you’re pissed.
How this leak happened?
It's literally explained in the tweet, in the repo and in this thread in many places.
Yeah, and I still can't understand how a regex can leak the code, or what a map file is. I googled them and can't understand what is going on.
Maybe the OP could clarify, I don't like reading leaked code, but I'm curious: my understanding is that is it the source code for "claude code", the coding assistant that remotely calls the LLMs.
Is that correct ? The weights of the LLMs are _not_ in this repo, right ?
It sure sucks for Anthropic to get pwned like this, but it should not affect their bottom line much?
> I don't like reading leaked code
Don't worry about that, the code in that repository isn't Anthropic's to begin with.
You believe it's just a fake ? (That would be ironic if the fake was generated by... claude itself. Anyway.)
No, I meant that it is other people's code run through a tumbler, as is all LLM-generated code.
Yes it's the claude code CLI tool / coding agent harness, not the weights.
This code hasn't been open source until now and contains information like the system prompts, internal feature flags, etc.
I thought it was an open source project on GitHub? https://github.com/anthropics/claude-code no?
Did you even look in that repo?
A couple of years ago I had to evaluate A/B test and feature flag providers, and even then when they were a young company fresh out of YC, GrowthBook stood out. Bayesian methods, bring your own storage, and self-hosting instead of "Contact us for pricing" made them the go-to choice. I'm glad they're doing well.
I don't understand why claude code (and all CLI apps) isn't written in Rust. I started building CLI agents in Go and then moved to Typescript and finally settled on Rust and it was amazing!
I even made it into an open source runtime - https://agent-air.ai.
Maybe I'm just a backend engineer so Rust appeals to me. What am I missing?
Claude Code started as an experimental project by Boris Cherny. When you're experimenting, you naturally use the language you're most comfortable with. As the project grew, more people got involved and it evolved from there. Codex, on the other hand, was built from the start specifically to compete with Claude Code. They chose Rust early on because they knew it was going to be big.
While the LLM Rust experiments I've been running make good use of ADTs, the model seems to have trouble understanding lifetimes and when it should be Rc/Arc-ing.
Perhaps these issues have known solutions? But so far the LLM just clones everything.
So I'm not convinced just using rust for a tool built by an LLM is going to lead to the outcome that you're hoping for.
[Also, in general, abstractions in Rust feel needlessly complicated by the need to know the size of everything. I've gotten so much mileage out of just writing what I need without abstraction and then hoping I don't have to do it twice. For something (read: claude code et al.) that is kind of new to everyone, I'm not sure Rust is the best target language even when you take the LLM-generated nature of the beast out of the equation.]
Think about your question: depending on the tool, Rust might not be needed. Is that level of memory performance and safety needed in a coding agent? Probably not.
Is high-speed release iteration needed? Might be. Interpreted or JIT-compiled? Might be too.
Without knowing all the requirements, it's just your personal preference making the decision, not objectively the right tool for the job.
I have a 16GB RAM laptop. It's a beast I bought in 2022.
It's all I need for my work.
RAM on this machine can't be upgraded. No issue when running a few Codex instances.
Claude: forget it.
That's why something like Rust makes a lot of sense.
Even more now, as RAM prices are becoming a concern.
> Claude: forget it.
I don't know what else you're doing, but the footprint of Claude is minor.
Anyway, my point still stands: you're looking at it as if they are competing languages and one is better at all things. That's just not how things work.
While not directly related to GP, I would guess that a codebase developed with a coding agent (I assume Claude Code is used to work on itself) would benefit from a stricter type system (one important point in Rust's favor).
TypeScript is typed... it's in the name?