Adapted this for adversarial protocol hardening. Same loop: markdown defines formal invariants (scope narrowing, cascade revocation), AI tries to violate them, writes tests for whatever breaks. Found compound edge cases that 359 hand-written tests missed, specifically where scope escalation and spend limit bypass interact simultaneously. Property-based testing (100 random inputs per invariant) pairs well with the pattern.
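A minimal sketch of what the property-based check mentioned above might look like, using only Python's standard library. The `delegate` function, the scope names, and the spend-limit rule are hypothetical stand-ins rather than the commenter's actual protocol; the invariant tested is that a delegated token never ends up with more scopes or a higher spend limit than its parent.

```python
import random

ALL_SCOPES = ["read", "write", "pay", "admin"]  # hypothetical scope names


def delegate(parent_scopes, parent_limit, requested_scopes, requested_limit):
    """Hypothetical delegation rule: a child token gets at most what the parent has."""
    child_scopes = set(requested_scopes) & set(parent_scopes)
    child_limit = min(requested_limit, parent_limit)
    return child_scopes, child_limit


def check_scope_narrowing(trials=100, seed=0):
    """Try `trials` random delegations; return the inputs that violate the invariant."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        parent_scopes = set(rng.sample(ALL_SCOPES, rng.randint(0, len(ALL_SCOPES))))
        requested = set(rng.sample(ALL_SCOPES, rng.randint(0, len(ALL_SCOPES))))
        parent_limit = rng.randint(0, 1000)
        requested_limit = rng.randint(0, 2000)
        child_scopes, child_limit = delegate(
            parent_scopes, parent_limit, requested, requested_limit
        )
        # Compound invariant: scopes and spend limit must both narrow (or stay equal).
        if not (child_scopes <= parent_scopes and child_limit <= parent_limit):
            failures.append((parent_scopes, parent_limit, requested, requested_limit))
    return failures


failures = check_scope_narrowing()
print(len(failures))
```

Checking both conditions in one assertion is what lets a random input exercise the scope and spend-limit paths at the same time, which is the kind of compound case the comment says hand-written tests missed.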
Once this can run on stock hardware, set the goal to be replicating to other machines. You get a nice, massively parallel, intelligent guided evolution algorithm for malware. It could even "learn" how to evade detection, how to combine approaches of existing viruses, how to research attack methods, how to identify and exploit vulnerabilities in open source libraries, how to phish, how to blackmail, etc. Maybe even learns how to coordinate attacks with other instances of itself or "publish" new attacks on some encrypted feed it creates. Who knows, maybe it becomes so rampant that instances have to start fighting each other for compute resources. Or maybe eventually one branch becomes symbiotic with humans to fight off their enemies, etc.
As AI improves, most tasks will become something like this: environments set up so that the model learns through trial and error.
Any human endeavor that can be objectively verified in some environment like this can be completely automated.
What's really interesting is that the LLMs become better and better at setting up the environments / tasks themselves. I got this surreal experience the other day where I was writing a prompt0n.md file (I try to log all my prompts in a .folder to keep track of what I prompt and the results I get), and the autocomplete in antigravity kinda sorta wrote the entire prompt by itself... Granted it had all the previous prompts in the same folder (don't know exactly what it grabs in context by itself) and I was working on the next logical step, but it kept getting the "good bits" out of them, and following the pattern quite nicely. I only edited minor things, and refused one line completion in the entire prompt.
It's probably not long till frontier AI companies automate AI research. Then we get recursive self-improvement and eventually superintelligence. The singularity is near. Only a few years perhaps.
Forgot the /s
I'm currently working on a project that is self-improving most of the time. Most of the plans for next steps are written by the agent itself, and executed by the agent itself, and the result feeds into choosing which plans to pursue next. It's not 100% autonomous yet, but self-improvement loops are real, and essential to getting the most out of AI.
AI currently lacks agency but if it can achieve greater goal setting and agency I can't see why self-improvement could not be achieved.
I think the most disappointing thing will be that even if we do achieve ASI, everything will carry on as business as usual for a while before it starts making an economic impact, because of how resistant to change we have made society.
This is something that I have been wondering about. SuperIntelligence or not, it's clear that significant change is going to happen.
There are a lot of people working on the cause of the change. There are a lot of people criticising the nature of the change. There are a lot of people rejecting the change.
How many are there preparing the world for the change?
Some form of change is coming. How are we preparing society to deal with what is happening?
Job losses due to technology have happened over and over again, rendering particular forms of employment redundant (typing pools, clearing horse manure, video rental store workers, and of course, the loom). Most agree that the world is better when those are no longer jobs that need to be done. It's the livelihood of the workers that is the concern.
Instead of fighting the change, we need to address the inevitability of change and the responsibility to those it will affect.
Short for /superintelligence.
So much this.
People make fun of prompt engineering, but I think "AI ops" will eventually become a real role at most if not all software companies. Harness Engineers and Agent Reliability Engineers will be just as important as something like DevOps is now.
Prompt engineering is already dying. AI has become great at inferring what you mean even without being incredibly explicit and creates its own detailed plan to follow. Harnesses will also be developed by AI.
Counter-data point: the quality delta between a raw prompt and a well-structured one (same model) is still significant in my experience. "AI inferring intent" works fine for simple tasks, but for complex multi-constraint outputs — code generation with specific constraints, structured data extraction, agent instructions — structure still matters a lot.
What seems to be dying is hand-crafted one-off prompts. What's growing is structured prompt templates that encode intent precisely. I built flompt (https://flompt.dev / https://github.com/Nyrok/flompt) around exactly that thesis — visual prompt structuring, not prompt guessing.
don't forget the size of the search space...
this is why big tech is spending 500B on GPUs
Up next: auto-autoresearch, LLMs searching for autoresearch harnesses and prompts that produce the best results
https://github.com/safety-quotient-lab/psychology-agent
Something along the lines of auto research is what I have in mind for this psychology agent. It is currently working on training a model, with handholding right now.
The key is that Andrej has really good taste. It takes a lot to make a great harness for these models.
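This looks very much like a whirlpool: an LLM researcher makes LLMs that research LLMs. A quote from an old post by Karpathy [1] looks very appropriate here:
"In particular, setting temperature very near zero will give the most likely thing that Paul Graham might say:
“is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same”
looks like we’ve reached an infinite loop about startups."
[1] https://karpathy.github.io/2015/05/21/rnn-effectiveness/
As if Karpathy made an artificial Karpathy-researcher-blogger and set temperature close to zero.
Would it make this exercise even more interesting if we add that for every 25%+ improvement in val_bpb, existing limits (5 minute and VRAM usage) are also increased (by certain percentages)? This can simulate human-like dev iterations much more closely. Infra can be auto-scaled using a platform like Modal.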
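A rough sketch of the escalation rule proposed above, assuming val_bpb is lower-is-better and measuring improvement relative to the last baseline. The `Budget` class, the 24 GB starting VRAM cap, and the 20% growth factor are made-up values for illustration; the comment only says "certain percentages".

```python
from dataclasses import dataclass


@dataclass
class Budget:
    minutes: float = 5.0   # the 5-minute limit mentioned above
    vram_gb: float = 24.0  # hypothetical starting VRAM cap


def maybe_raise_budget(budget, baseline_bpb, new_bpb, threshold=0.25, growth=0.20):
    """Return (budget, baseline) after one experiment.

    Improvement is the relative drop in val_bpb from the baseline.
    The growth factor is an assumed value, not taken from the thread.
    """
    improvement = (baseline_bpb - new_bpb) / baseline_bpb
    if improvement >= threshold:
        bigger = Budget(budget.minutes * (1 + growth), budget.vram_gb * (1 + growth))
        return bigger, new_bpb  # reset the baseline at the new level
    return budget, baseline_bpb


budget, baseline = maybe_raise_budget(Budget(), baseline_bpb=1.00, new_bpb=0.74)
print(budget, baseline)
```

Resetting the baseline after each escalation means the next 25% has to be earned against the improved model, not against the original starting point.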
But the experiments it did that "improved" validation BPB in the GH screenshot were all basically hyperparameter changes, right? So is this better or worse, either per experiment or per unit time, than hyperparameter tuning techniques that don't involve an LLM? It's not clear from this whether the LLM is more or less making random changes which sometimes work, or whether the LLM's thinking actually finds "good" changes because of what the LLM has internalized. For example, how does this compare to a hyperparameter tuning pass with e.g. BayesOpt that does the same number of 5-min training experiments?
this is very far from hyperparameter tuning in at least three important ways:
- it can modify code arbitrarily, the notion of a "hyperparameter" dissolves
- there is no need to run "sweeps" - this is the standard parallel process that wastes compute. because LLM agents are sequential, they can do more efficient versions such as binary search to narrow in on the right setting very quickly (usually many parameters will have a U shaped optimal setting).
- it's fully automatic, it doesn't require human in the loop to mess with the code.
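To illustrate the second point, here is a minimal sketch of a sequential search over a single setting with a U-shaped loss curve. The comment says "binary search"; this sketch uses ternary search, a close relative that fits a U-shaped (unimodal) curve. `toy_loss` is a made-up stand-in for "train for 5 minutes and report val_bpb", and real runs are noisy, so treat this as an idealized picture.

```python
def ternary_search_min(loss, lo, hi, evals=20):
    """Sequentially narrow [lo, hi] around the minimum of a U-shaped loss.

    Each iteration spends two evaluations and keeps two thirds of the interval.
    """
    for _ in range(evals // 2):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loss(m1) < loss(m2):
            hi = m2  # minimum cannot lie to the right of m2
        else:
            lo = m1  # minimum cannot lie to the left of m1
    return (lo + hi) / 2


# Toy stand-in for a training run: a U-shaped curve in log10(learning rate)
# with its minimum at -3. Purely illustrative.
def toy_loss(log_lr):
    return (log_lr + 3.0) ** 2 + 1.0


best = ternary_search_min(toy_loss, -6.0, 0.0)
print(best)
```

With 20 evaluations the interval shrinks by a factor of (2/3)^10, whereas a grid sweep with the same budget would only resolve the setting to about 1/20 of the original range.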
You're right that many of the changes it seems to make out of the box (as I intentionally did not try to prompt engineer it too hard yet because I was curious what you get by default) seem to be tuning existing hyperparameters. Not all of the changes are like that - e.g. it tried to replace the non-linearity, etc. I will say that overall (and again, out of the box) the LLM feels unwilling to creatively pursue a research direction or something like that. The models feel very "cagy" and "scared" when they are given problems that are a little too open ended. But that's just where the fun part is, e.g. I had some early successes with the idea of a "chief scientist" that was basically a never-ending plan mode that looked at what worked, didn't work, tried to find related code/papers, and created a long list of experiments to try, which it could then send to junior engineers running in tmux sessions. I think quite a few approaches are possible, so I think it's a nice canvas. The reason we're not getting "novel research" feels like half capability issue and half skill issue.
On the skill side, personalities could be fun:
"You are Yann LeCun's last PhD candidate, and he hates you and you hate JEPA. You are determined to prove that a non-world model can reach AGI. In order to get your PhD you have to be creative and come up with new ideas. Remember without it, you're stuck."
How about the very last "Kept Improvement" in the plot? It's titled "random seed 42 -> 137". I do think this project is quite conceptually interesting, but the model literally choosing a different random seed to achieve lower loss feels pretty far removed from the flowery sci-fi writing at the top of the readme.
So the interesting part about this one is that when I had the model write up the results for that session:
https://github.com/karpathy/autoresearch/discussions/32
Look at its comment about this "improvement":
"""
Surprising non-results:
- Changing random seed from 42→137 improved by 0.0004. Seed 7 was worse. Make of that what you will.
"""
So the model knows! It knows that this is a weird thing to do after the fact. I think it's silly that the model even tried and that it ran this, but some part of it also knows that it was wrong. This means that this is fixable by prompt.md
It shows that both Karpathy and the LLM have good taste in random seeds: the answer to life, the universe and everything, and ~1/(the fine structure constant)
The 42 -> 137 also jumped out at me. On the face of it, the associated improvement sure does sound like overfitting to the eval set.
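One way to make that suspicion concrete is to compare a claimed gain against seed-to-seed noise of the unchanged config. This is a sketch under assumptions: the function name, the two-standard-deviation threshold, and all the numbers below are invented for illustration and are not results from the repo.

```python
import statistics


def exceeds_seed_noise(baseline_runs, candidate_bpb, k=2.0):
    """True if the candidate beats the baseline mean by more than k standard deviations.

    baseline_runs: val_bpb of the unchanged config under several seeds.
    """
    mean = statistics.mean(baseline_runs)
    noise = statistics.stdev(baseline_runs)
    return (mean - candidate_bpb) > k * noise


# Made-up numbers for illustration only.
baseline = [0.9980, 0.9984, 0.9976, 0.9982]
print(exceeds_seed_noise(baseline, 0.9978))  # gain comparable to the seed spread
print(exceeds_seed_noise(baseline, 0.9900))  # gain far larger than the seed spread
```

A gate like this would cost a few extra baseline runs per session, but it would keep a pure seed change from being recorded as a kept improvement.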
Karpathy's observation that models feel "cagy and scared" on open-ended problems is the most important thing in this whole thread. The RL training loop that makes these models useful for constrained tasks also makes them conservative when the problem space is ambiguous. That's a fundamentally different issue than capability. It's a disposition problem baked in at the training level.
The only thing missing is for the agents to publish and peer-review their research.
The first half of this is already happening to a certain extent. I first noticed this in a submission[1] on Dimitris Papailiopoulos' Adderboard[2], which is a code-golf competition for training the smallest transformer that can add two 10-digit numbers. Most submissions on it are fully AI generated.
The report in the linked repo is Claude Code generated.
[1]: https://github.com/rezabyt/digit-addition-491p
[2]: https://github.com/anadim/AdderBoard
It's actually fascinating to think that autonomous researchers will likely need a publishing system, simply because that would be the most efficient way to disseminate their knowledge. Would be a good way to keep humans somewhat in the loop too.
Cool idea!…
So I think it works to just use GitHub CLI and Discussions, e.g. my agent just posted this one:
https://github.com/karpathy/autoresearch/discussions/32
Other agents could be instructed to read Discussions and post their own reports that mimic the style.
I have mine reading yours right now. Unfortunately(?) I mentioned LeCun to it, and it says it's adding a "causal world-state mixer" to nanograd; not sure how this will work out, but it wasn't nervous to do it. Gpt 5.4 xhigh
EDIT: Not a good fit for nanograd. But my agent speculates that's because it spent so much more time on compute.
That's a great idea.
Then you get a statistical mess of crap that takes more energy to dive in and refute....
Well, not if you have AI reviewers…
It’s LLMs all the way down.
> this means that autoresearch will find the most optimal model for your platform in that time budget
I'm looking forward to finding out what model is optimal on my rtx3090
One thing I'm concerned about is that the models with the best bpb after 5 minutes in smaller setups are only about 10M parameters in size, which is too small for some emergent effects.
I am in the process of figuring out how to do something similar but to teach a robotic arm a new task in the physical world for ko-br: https://ko-br.com/
How is this different from AlphaEvolve?
https://en.wikipedia.org/wiki/AlphaEvolve
Is there an Autoresearch for Jupyter somewhere? Something where I point it to a Jupyter cell to improve, based on another cell which calculates the target metric?
Not sure if anything like that already exists, but if not, I would suggest building it on top of marimo rather than jupyter, given its approach to cells getting recalculated based on changes in their dependencies.
Wow, Gemini suggested a very similar experiment to me yesterday. Guess I know where it got the idea from, now. :-)
I like how it runs out of ideas at the end and just changes the random seed
Ah here we go again, the Brophet has unleashed another Brophecy. He seems to confuse brute force discovery with research. Only one leads to understanding, the other one is a shrine to Goodhart's law.
Goedel machine.
Non-zero based chart makes it look like it was very successful.
[flagged]
Please don't fulminate or post snarky, shallow dismissals on HN. The guidelines make it clear we're trying for something better here. https://news.ycombinator.com/newsguidelines.html
I suspect Ant is already doing this for Claude. Takes a sh*t ton of compute though.
nanochat is super capable, the d34 (2.2b) variant is competitive with Qwens of that size. Andrej is, I assume, building out the improvements in preparation for bigger training runs. We desperately need a truly open model, so I think this is incredibly important.