Readers should make sure to contextualize this. We're talking about people researching AGI. Current LLMs are amazing and will have business and societal impact. Previous ML models also had business and societal impact. None of that is contested here. The question is: what path leads to AGI, and do LLMs scale to AGI? That is the question being asked here, and some researchers think they won't; they will scale superbly to many things, but something else might be needed for full AGI.
The relevant question is whether humans + LLMs are much more likely to get to AGI than humans without LLMs. And the answer is pretty obviously yes. I don't think anyone was arguing that we would get to AGI by just training on more data with exactly the same models. Practically every advance in the last few years has been building additional functionality on top of LLMs, not just scaling up the same architecture to more data.
But zooming out, LLMs are universal approximators, so it's trivially true that they can approximate any function that describes AGI. It's also true that logic (from logos or "word") is about reasoning constrained by language and conversations. So an LLM is the right sort of device you'd expect to achieve general intelligence.
There are arguably non-linguistic forms of intelligence, such as visual intelligence. But those can also operate on written symbols (e.g. the stream of bits from an image file).
The other relevant question is why Gary Marcus always seems so angry. Reading one of his posts is draining.
The concept of mostly static weights holding the bulk of base intuition/knowledge (foundation, if you will ;)) seems like a good bet, since it's how the mammalian brain works (with updates to those long-term weights mostly happening while you sleep [1]).
I very naively assume the "easy" path will be similar: a very different system that's bolted onto / references the foundation models, to enable the realtime/novel-reasoning bit (outside the fixed latent space) that isn't possible now. A rough sketch of what I mean follows the footnote.
[1] https://animalcare.umich.edu/our-impact/our-impact-monitorin...
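A very rough, hypothetical sketch of the split I'm imagining: a frozen "foundation" that never changes at runtime, plus a bolted-on memory that is updated in real time. Everything here (the hash-based embedding stand-in, the EpisodicMemory class) is made up for illustration, not how any real system works.

    import numpy as np

    def foundation_embed(text: str) -> np.ndarray:
        # Stand-in for a frozen foundation model: its "weights" never change at runtime.
        # (A real system would use the model's actual embeddings; these are random placeholders.)
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(64)

    class EpisodicMemory:
        # The bolted-on part: updated on the fly, without touching the base "weights".
        def __init__(self):
            self.keys, self.notes = [], []

        def write(self, situation: str, note: str):
            self.keys.append(foundation_embed(situation))
            self.notes.append(note)

        def recall(self, situation: str, k: int = 2):
            if not self.keys:
                return []
            q = foundation_embed(situation)
            sims = [float(q @ key) / (np.linalg.norm(q) * np.linalg.norm(key) + 1e-9)
                    for key in self.keys]
            best = np.argsort(sims)[::-1][:k]
            return [self.notes[i] for i in best]

    # The frozen model supplies "intuition"; the memory supplies what happened today.
    mem = EpisodicMemory()
    mem.write("CI build failed with out-of-memory error", "retry on the larger runner")
    mem.write("user asked about refunds", "point them at the 30-day policy page")
    print(mem.recall("continuous integration job was killed, OOM"))

The point is only the division of labor: nothing above retrains the foundation; all the novelty lands in a small, cheap, online store.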
I think it's pretty rare for someone to use a pure LLM today, or even a year ago. Function calls, MCP, tricks with thinking models, etc., all make these systems "impure", and also much more capable.
I don't think the debate will end until hard AIs are common - and even then, philosophers will keep going.
Although it may be true that LLMs will not achieve AGI in the purest sense, they have at least forced us to move a lot of goalposts. I don't know what Gary Marcus was saying a few years ago, but I think many people would have said that, e.g., achieving a gold medal at the Mathematics Olympiads would require AGI, not just LLMs.
Gary Marcus has been taking victory laps on this since mid-2023; nothing to see here. It was patently obvious to all that there would be additional innovations on top of LLMs, such as test-time compute, which are nonetheless structured around LLMs and complementary to them.
“On top of LLMs” is exactly not “pure LLMs”, though, and it’s also not clear if TTC will end up supporting the bitter lesson.
Looking at the quoted tweet, it is immediately obvious that these people have no clue about the current state of research. Yes, they might have had some more or less relevant contributions to classical ML, but AI has taken off without (or rather despite) them, and if the history of AI has shown anything, it's that people like these are not the ones who will pave the way forward. In a field like this, there's no use listening to people who still cling to their old ideas just because the current ideas don't seem "elegant" or "right" in their minds. The only thing you can trust is data, and it shows we haven't peaked yet when it comes to LLMs.
Anyone have links for these:
> Yann LeCun was first, fully coming around to his own, very similar critique of LLMs by end of 2022.
> The Nobel Laureate and Google DeepMind CEO Sir Demis Hassabis sees it now, too.
For Yann LeCun, he says it here: https://www.weforum.org/meetings/world-economic-forum-annual...
He's personally moved on from LLMs and is exploring new architectures built more around world models.
Which he describes here: https://x.com/ylecun/status/1759933365241921817
Also, I think the 2022 quote refers to this paper by Yann: https://openreview.net/pdf?id=BZ5a1r-kVsf
This is nonsense. LeCun is working on LLMs (and all the rest of it): https://arxiv.org/abs/2509.14252
His work isn't all that different from what many other people in the space are doing. He just presents himself as far more iconoclastic and "out there" than he actually is.
I'm just paraphrasing what he says in the interview I linked.
Is there a definition in the manual of mental disorders (DSM-5 in the US) for an addiction to feeling smug? If so we should call it Marcus Disorder.
Someone who seems "addicted to feeling smug" is likely seeking constant validation for a grandiose sense of self-importance. The smugness is the emotional payoff. The fix. It temporarily props up their fragile self-esteem.
This pattern of behavior is most closely associated with Narcissistic Personality Disorder in the DSM-5.
Referenced Article in this post -> http://www.incompleteideas.net/IncIdeas/BitterLesson.html
"We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done." - I am curious if people would read this as an advocacy or criticism of LLMs?
Discovery comes from search for both humans and AI agents. There is no magic in the brain or LLM except learning along the way and persistence. The search space itself is externalized.
So the AI agents are "good enough", but environment access is insufficient for collecting the required experience; this is the current bottleneck.
For example, even a simple model like AlphaZero (just a CNN) was good enough to beat the best humans and rediscover gameplay from scratch, but it had extensive access to the environment.
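To make the environment-access point concrete, here is a minimal toy sketch of the kind of self-play loop that such access enables. It is not AlphaZero (no network, no tree search; just a random policy on tic-tac-toe); the only point is that labeled experience falls out of cheap interaction with a simulator.

    import random

    class TicTacToe:
        # Toy stand-in for a cheap, perfect simulator (the "environment access").
        def __init__(self):
            self.board = [0] * 9   # 0 = empty, +1 / -1 = the two players
            self.player = 1

        def legal_moves(self):
            return [i for i, v in enumerate(self.board) if v == 0]

        def step(self, move):
            self.board[move] = self.player
            self.player = -self.player

        def winner(self):
            lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
            for a, b, c in lines:
                s = self.board[a] + self.board[b] + self.board[c]
                if abs(s) == 3:
                    return s // 3
            return 0  # unfinished or draw

    def self_play_episode(policy):
        # Play one game against itself; return (state, move, final outcome) examples.
        env, history = TicTacToe(), []
        while env.legal_moves() and env.winner() == 0:
            move = policy(env)
            history.append((list(env.board), move))
            env.step(move)
        z = env.winner()
        return [(state, move, z) for state, move in history]

    # Placeholder policy; AlphaZero would use a network-guided tree search here.
    random_policy = lambda env: random.choice(env.legal_moves())

    # Experience is limited only by how fast the environment can be simulated.
    dataset = [ex for _ in range(1000) for ex in self_play_episode(random_policy)]
    print(len(dataset), "training examples from pure environment interaction")

Swap the toy simulator for a real-world task and that unlimited, self-generated supervision disappears, which is exactly the bottleneck being described.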
I don't think general intelligence is technically unachievable with ML, but I think we're still orders of magnitude away from the amount of compute needed to reach it, and everyone is in a honeymoon period because of how useful text prediction, in its current state, has proven for our day-to-day jobs.
What are the odds they will just be stumbling around for another few decades before the next big discontinuous jump in effectiveness is uncovered? The AI Gods always had big ideas and opinions, but the discovery of LLMs seems to have been pure serendipity.
I've often thought that if you want to represent a probabilistic world model, with nodes that represent physical objects in space-time (and planned-future space-time) and our level of certainty about their relationships to one another... you'd do that outside an LLM's token stream.
You could, in theory, represent that model as a linear stream of tokens, and provide it as context to an LLM directly. It would be an absurdly wasteful number of tokens, at minimum, and the attention-esque algorithm for how someone might "skim" that model given a structured query would be very different from how we skim over text, or image patches, or other things we represent in the token stream of typical multi-modal LLMs.
But could it instead be something that we provide as a tool to LLMs, and use an LLM as the reasoning system to generate structured commands that interact with it? I would wager that anyone who's read a book, drawn a map of the fantasy world within, and argued about that map's validity on the internet, would consider this a viable path.
At the end of the day, I think that the notion of a "pure LLM" is somewhat pedantic, because the very term LLM encapsulates our capability of "gluing" unstructured text to other arbitrary tools and models. Did we ever expect to tie our hands behind our back and make it so those arbitrary tools and models aren't allowed to maintain state? And if they can maintain state, then they can maintain the world model, and let the LLM apply the "bitter lesson" that compute always wins, on how to best interact with and update that state.
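To sketch the "world model as a tool" idea (every name here - WorldModel, assert_relation, and so on - is hypothetical, just one way the interface could look): the state lives outside the context window, the LLM emits structured calls to update it, and queries return only the handful of facts relevant to the question instead of the whole model re-serialized as tokens.

    from dataclasses import dataclass, field

    @dataclass
    class Entity:
        name: str
        position: tuple | None = None          # e.g. (x, y) in some shared map frame
        attributes: dict = field(default_factory=dict)

    @dataclass
    class Relation:
        subject: str
        predicate: str
        obj: str
        confidence: float                       # the model's stated certainty

    class WorldModel:
        # The state lives here, outside the LLM's token stream.
        def __init__(self):
            self.entities: dict[str, Entity] = {}
            self.relations: list[Relation] = []

        # The methods below are what an LLM would invoke as structured tool calls.
        def assert_entity(self, name, position=None, **attrs):
            self.entities[name] = Entity(name, position, attrs)

        def assert_relation(self, subject, predicate, obj, confidence=1.0):
            self.relations.append(Relation(subject, predicate, obj, confidence))

        def query(self, predicate=None, subject=None, min_confidence=0.0):
            # Return only the facts relevant to the question, not the whole model.
            return [r for r in self.relations
                    if (predicate is None or r.predicate == predicate)
                    and (subject is None or r.subject == subject)
                    and r.confidence >= min_confidence]

    # Usage: the LLM reads a chapter, then emits calls like these instead of prose.
    wm = WorldModel()
    wm.assert_entity("Rivertown", position=(12, 40))
    wm.assert_entity("Iron Keep", position=(15, 38))
    wm.assert_relation("Iron Keep", "north_of", "Rivertown", confidence=0.7)
    print(wm.query(predicate="north_of", min_confidence=0.5))

The design choice is the one argued for above: the LLM supplies the reasoning and the command stream, while an ordinary stateful program holds the map.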
> ... major thinker ...
Thesis: language isn't a great representation, basically.
I really should apply myself. Maybe I wouldn't work so hard, just shuck nonsense/pontificate.
The difference between Gary Marcus and you is the capacity to tell right from wrong.
He has no problems pimping his credentials and shitting on other people's work and lying through his teeth to enrich himself. He's obviously intelligent enough to know better, but he's a singularly intellectually dishonest figure.
He's a one-man version of The Enquirer or Zergnet for AI, and thrives entirely on dishonest takes and divisive commentary, subsisting on pure clickbait. There is absolutely no reason to regard anything he says with any level of seriousness or credulity; he's an unprincipled jackass cashing out unearned regard by grifting and shilling, loudly.
If you must, here's an archived link, don't reward him with clicks.
https://archive.is/L09dP
He really shouldn't end up on the top ten of HN, let alone the front page. It's like an SEO hack boosting some guy proudly documenting pictures of his bowel movements.
IMHO, a debate article's worthiness for HN should be judged (or dismissed) on its own merits, not by attacking the author.
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Demonstrating that Rich Sutton was never really on the 'LLM bus' in the first place. Note the remarkable absence of language models and large language models from that essay, despite BERT, GPT-2, 'the unreasonable effectiveness of data', etc. He only briefly mentions speech recognition. (Note also Sutton's general absence from LLM research, the Alberta Plan, his switch from DeepMind to Keen Technologies as DeepMind was forced into LLM-centric research, and his published research since 2019 emphasizing small models and trying to fix their pathologies, like catastrophic forgetting.)
> The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
You could easily see most LLM work as a dead end because it is about 'building knowledge into your agents' (e.g. by paying data labelers billions of dollars in total to supplement your scrapes), and not about 'search' (still a major open problem for LLMs - o1-style serial reasoning traces are obviously inadequate) or 'learning' (LLMs depend so heavily on the knowledge already encoded in all that data).
I don't think Richard Sutton was ever on that bus. Am I wrong?
I just checked - he's right. Anthropic won't write code anymore. ChatGPT is just jumbled, dyslexic letters and nonsense. I generated a Midjourney image 10 times, each one was just TV static.
It's... it's over. The west has fallen.
Writing code does not require general intelligence.
Why do people even listen to Gary Marcus?
His stance on LLMs can be modeled by a simple finite state machine:
State 1) LLM performance stalls for a couple of months:
- "See, I told you, LLMs are a dead end and won't work!"
State 2) A new LLM release makes rapid and impressive improvements:
- "AI is moving too fast! This is dangerous and we need the government to limit the labs to slow them down!"
Repeat
Gary Marcus talks about AI in a dogmatic way.
Sutton... the patron saint of scaling...
Listen to people for their ideas, not their label.
Regardless, Marcus is a bit late to comment on the bitter lesson. That is so 6 months ago lol
Gary Marcus saying the same things Gary Marcus has always said.
It doesn’t matter what incredible things neural networks do, in his mind they’re always a dead end that will collapse any day now.
There’s quite a gulf of difference between saying something is a dead end to full on general artificial intelligence, and saying it’s all dead and will collapse.
I have no idea if LLMs will be general AI, but they defo aren’t going anywhere
I think the question we've failed to properly ask is "are all humans general intelligences?"
I frequently encounter people who appear less refined in reasoning and communication than an LLM. Granted, being an awkward communicator is excusable, but interrogation of these people's belief systems seems to reveal a word model more than a world model.
There should be a [Gary Marcus] tag on the link; it's all one needs to know. As soon as I saw that, I closed the page.
(I notice in retrospect that it does show it's a link to his Substack, so I guess that's sufficient warning; I just didn't see it.)
I tend to get more benefit from looking at why someone thinks what they think than from what they think.
There was once a blog (maybe it still exists, idk) called Bitfinexed, which researched fraud perpetrated by the Bitfinex/Tether creators. For multiple years he forecast, every month, an imminent Tether crash, based on multiple data points and logical conclusions. His prediction was wrong, since the Tether org is still alive and out of jail. But this doesn't mean that his chain of arguments and logic was wrong. It was simply a case where the fraud and the stakes were so big that, through corruption and some asset infusions, the whole scheme was saved.
Just because something is a case of "old man yelling at clouds" doesn't mean the underlying logic is always wrong. Sometimes markets can be irrational longer than we expect.
"It is difficult to get a man to understand something, when his salary depends on his not understanding it" or something like that. There's quite an appetite on the internet for ai derision articles.
True, but the novelty of the post here is that Sutton now agrees with him.
OK,
Marcus claims to have reread The Bitter Lesson. And I should say, I too have reread the text, and I don't think Marcus is getting the actual original argument here. All it says is that general-purpose algorithms that scale will outperform special-purpose algorithms that use information about the problem and don't scale. That's all. Everyone claiming more is hallucinating things into this basic point. Notably, general-purpose algorithms aren't necessarily neural nets, and "X works better than Y" doesn't imply X is the best thing ever.
So there's no contradiction between The Bitter Lesson and claims that LLMs have big holes and/or won't scale up to AGI.
Gary Marcus has never built anything, has never contributed meaningfully to any research that actually produces value, nor has he been right about any of his criticisms.
What he has done is continually move the goalposts to stay somewhat relevant in the blogosphere and, presumably, the academic world.
And I never took biology past sophomore year, and yet the first time I listened to Aubrey de Grey I knew he was wrong to propose that millennians (people who will live to be 1,000+) had already been born (as of 2005).
[flagged]
The room is not doing anything of the sort.
Does this mean you think the collective datacenter mass of ChatGPT or something more emergent is AGI?
People can laugh at Gary Marcus all they want, but there’s one aspect that people don’t understand.
If you have a high conviction belief that is countervailing to the mainstream, you suffer a great deal. Even the most average conversation with a “mainstream believer” can turn into a judgment fest. Sometimes people stop talking to you mid-conversation. Investors quietly remove you from their lead lists. Candidates watch your talks and go dark on you. People with no technical expertise lecture at you.
Yet, inevitably, a fraction of such people carry forward. They don’t shut up. And they are the spoon that stirs the pot of science.
It's totally normal and inevitable that, in such situations, people start to take victory laps at even the smallest indication that they were right. It doesn't mean they're right. It is just not something worth criticizing.
No. There is a difference between being a contrarian who is making a bet and acting on their different viewpoint, and someone who is simply a Debbie Downer and wants to say something negative all the time with no skin in the game. Marcus is the latter.
It's easy to be a faux contrarian who just always says we're in a bubble or X is overhyped. Everyone knows that; it's the nature of markets, not an insight. The only value is in having some actual insight into where and how things stop, and where there is some upside. Otherwise you're just a jealous loser.