Probably worth noting that the new-ish Mozilla CEO, Anthony Enzor-DeMeo, is clearly an AI booster, having talked about wanting to make Firefox into a “modern AI browser”. So I don’t doubt that Anthropic and Mozilla saw an opportunity to make a good bit of copy.
I think this has been pushed too hard. Between that and the general exhaustion at people insisting that AI is eating everything and the moon, these claims are getting kind of farcical.
Are LLMs useful for finding bugs? Maybe. Reading the system card, I guess if you run the source code through the model 10,000 times, some useful stuff falls out. Is this worth it? I have no idea anymore.
> I guess if you run the source code through the model a 10,000 times, some useful stuff falls out.
But you might also get a lot of non-useful stuff that you'll need to sort through.
> new-ish Mozilla CEO, Anthony Enzor-DeMeo, is clearly an AI booster having talked about wanting to make Firefox into a “modern AI browser”
Ah... that's why they put a free VPN into v150: more human behavior for training :))
Hackernews has also been completely co-opted by boosters.
So much so that I don't really visit anymore after 15 years of use.
It's a bizarre situation: billions in marketing and PR, astroturfing, and torrents of fake news with streams of comments beneath them showing zero skepticism and an almost horrifying worship of these billion-dollar companies.
Something completely flipped here at some point. I don't know if it's because YC is also heavily pro these companies, and embedded with them, requiring YC applicants to slop-code their way in, then cheering about it.
Either way it's incredibly sad and reminds me of the worst casino economy, NFTs, crypto, web3, while there's actually an interesting core (regex on steroids with planning aspects) that's constantly oversold.
I say that as a daily user of Claude Max for over a year.
I haven't been able to find any communities with as high a signal-to-noise ratio and breadth of experiences as HN, especially not public ones that you can stumble your way into without knowing a guy / joining a clique.
Heh that’s a very low bar though
> Heh that’s a very low bar though
This is a low bar?
> communities with as high of a signal-to-noise ratio and breadth of experiences as HN, especially not public ones that one can stumble their way into without knowing a guy / joining a clique
If this is such a low bar, then how come there's only HN? Can you name another? 10? 100? Because I can't.
HN is a tech incubator's news blog. OpenAI literally got its start as a YCombinator project as part of YC Research.
Sam Altman of OpenAI was the president of YC for years.
Like, what did you expect, my friend?
I think now that AI is finally at a point where it seems more useful than annoying, it's easy to be overly optimistic. I've only been using Claude for a few months (I did try 20x, but fell back to 5x), and it's genuinely been a productivity multiplier. That said, the way I work with it is very different from me coding on my own... I spend way more time planning, there's a lot more documentation and testing as part of the output, and even then I still find a lot of issues.
I'm also mournful for those just starting out, who may lean so much on these tools that they never develop the true proficiency to spot issues with fitness and quality. I see people running half a dozen or more agents and know there is no way they're doing any kind of meaningful QA/QC on that output.
I've noticed a lot of astroturfing lately. It really bothers me, because it was kind of my last bastion of sanity for online tech discourse. Every forum I've used is now full of marketing and dishonesty by bots, paid shills, and bad actors.
Funny because that is exactly what Edge calls itself...
There was a double fronted marketing push by both organizations. That much is true and this makes me more skeptical of the message and how exactly it was framed.
If we just stick with C/C++ systems, pretty much every big enough project has a backlog of thousands of these things: either simple ones like compiler warnings for uninitialized values, or fancier tool-verified off-by-one write errors that aren’t exploitable in practice. There are many real bad things in there, but they’re hidden in the backlog waiting for someone to triage them all.
Most orgs just look at that backlog and just accept it. It takes a pretty big $$$ investment to solve.
I would like to see someone do a big deep dive in the coming weeks.
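The two backlog bug classes described above can be sketched in a few lines. This is a hypothetical function (`sum_first` is not from any real codebase); the buggy variants are shown only in comments, and the code itself is the corrected form.

```cpp
#include <cassert>
#include <cstddef>

int sum_first(const int* data, std::size_t n) {
    // Bug class 1: writing `int sum;` with no initializer here would
    // trigger a -Wuninitialized / -Wmaybe-uninitialized compiler warning.
    int sum = 0;
    // Bug class 2: the classic off-by-one, `i <= n`, would read one
    // element past the end of the buffer, which tools like
    // AddressSanitizer can verify mechanically.
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Bugs like these are trivial to fix in isolation; the cost is in triaging thousands of them.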
Agreed overall, except for the "harmless" bit. Hackers are good these days, and these apparently innocuous bugs can be exploited in creative ways.
I think one thing we'll see is that "sophisticated" multi-step exploit chains will become the domain of script kiddies. They often already were: malware vendors often pre-packaged software that exploited several vulnerabilities in a row. But I expect that LLMs will make the "Attack Complexity" metric in CVSS even more useless than it already is.
Feels like LLMs' main use in these situations would be working through these essentially nothing-burger issues? If they're just time-consuming to solve, rather than problematic, they should hopefully be fairly trivial for an LLM to solve reliably enough, right? I'm often doubtful about AI for actual issues; in my experience it rarely finds bigger issues from scratch without a lot of extra context, such as hints about what and where the issue is, plus essentially full context explaining any relevant parts. However, I do find that it often finds minor issues when the context is small and contained, or, as mentioned, when it knows what the issue is and the solution is simple.
I'm sure there's already plenty of work toward these things, but do bigger code bases completely shut out AI right now, due to the extreme number of unsolicited PRs they get from AIs? I'd imagine that if these efforts were coordinated and structured properly, they'd be more likely to be seen as acceptable. I'm just spitballing; I've never worked on any real open source project, especially one with thousands if not millions of users and several issues every day, so my view of AI usage here mostly comes from instances where projects ban all AI PRs and the like because they are often really bad.
Why do people publish AI-written articles? If I wanted to read AI output I could just prompt it myself, and when I read something on someone's blog I expect to read the thoughts of that particular human being...
While the text seems to be at least AI-assisted, I think the research is still interesting. Whether it was done mostly by the author or by an AI doesn't change much, to me at least.
I'd appreciate some sort of disclaimer at the start of each article stating whether it's AI-written/assisted or not. But I guess authors understand that it will diminish the perceived value of their work.
I agree. Even if it is a little painful to read, it's still information worth knowing and an actual human's opinion (at least I hope). There's no reason to be skeptical if it isn't a famous news site or something.
I have a friend who literally takes topics he wants/needs to know more about... has AI generate a discussion-format conversation as a deep-dive explanation covering multiple points of view, then outputs it through TTS tooling so he can listen to it while traveling.
It's kind of crazy... but I can kind of see the appeal of getting just what you are looking for on something.
That said, actually churning that out for other people to consume feels very wrong... I absolutely hate the slop-generated content on YouTube. I watch a lot of historical content, so I don't always mind it when it's genuine/factual... but when there are obvious content errors, it becomes more annoying than anything.
You have no idea of the territory, or how shallow the LLMs are going.
That's the thing: you didn't prompt it. Someone is saying you should read this. Otherwise you never would have prompted it.
This article felt really informative at first, but at some point it was like reading an LLM getting stuck in a circle.
It certainly promised more in the beginning than it actually delivered.
It all starts to feel like letting those "before the YouTube video" ads run too long without skipping.
A hook you know people would like to know the answer to… followed by utter horseshit.
The author seems to believe that dereferencing a null pointer is safe. DoS attacks aside, dereferencing a null pointer in C++ is undefined behavior, so you never know what could happen. It could easily result in bypassing seemingly unrelated security checks or any other behaviour. To know it wasn't exploitable you would need to check the compiled output of every compiler and set of flags used to compile Firefox.
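A minimal sketch of the point above (hypothetical function, not from Firefox): because `*p` is undefined behavior when `p` is null, the optimizer is allowed to assume `p != nullptr` after the dereference and delete the later check, silently removing what looks like a security check.

```cpp
#include <cassert>

int read_checked(const int* p) {
    int v = *p;            // UB if p == nullptr
    if (p == nullptr) {    // a compiler may elide this branch entirely,
        return -1;         // because the dereference above "proves"
    }                      // p is non-null on any defined execution
    return v;
}
```

On the defined path this behaves as expected; on the null path there is no guarantee of a crash, a `-1`, or anything else.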
It’s just marketing. Remember when OpenAI said GPT-2 was too dangerous to release?
IIRC Mozilla usually categorizes internally-found bugs into a few large CVE IDs, grouped by severity, with around ten or so bugs in each. Every advisory gets several CVEs of this kind, for example, <https://www.mozilla.org/en-US/security/advisories/mfsa2026-2...>, <https://www.mozilla.org/en-US/security/advisories/mfsa2026-1...>, <https://www.mozilla.org/en-US/security/advisories/mfsa2026-0...>, etc.
One thing to keep in mind is that Firefox is probably a pretty hard target. Everyone wants to try to hack a web browser. One assumes the low-hanging fruit is mostly gone.
I think the fact this is even a conversation is pretty impressive.
Probably you're right, but given the browser usage distribution, I reckon most hackers wouldn't care about Firefox at this point and would concentrate solely on Chrome. I reckon Firefox users are, on average, more tech-savvy and, in the event of a hack, would be able to help themselves / find out about the hack quicker than the average Chrome user.
Whatever the capabilities, there’s always a little hype, or at least the risk won’t be as great as thought:
> Due to our concerns about malicious applications of the technology, we are not releasing the trained model.
That was for GPT-2 https://openai.com/index/better-language-models/
I think a certain level of hype is warranted for a model that can autonomously discover complex 27-year-old 0-days in OpenBSD for $20K[0]. We don't yet know what this does to the balance of attack/defense in OSS security, and we cannot know until the capability is widespread. My most hopeful guess is that it looks heavily in favor of attackers in the first 6-12 months while the oldest 0-days are still waiting to be discovered, before tipping in favor of defenders as the price goes down for Mythos-level models and the practice of using them for vulnerability review becomes widespread.
The absolute best case is that we end up with a situation similar to modern cryptography, which is clearly in favor of defenders. One can imagine a world where a defender can run a codebase review for $X of compute and patch all the low-hanging fruit, to the point where anything that remains would cost an attacker $X*100000 (or some other large multiplier) to discover.
[0] https://red.anthropic.com/2026/mythos-preview/
In the same article you linked:
> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT‑2 along with sampling code.
7 years later, these concerns seem pretty legit.
Can IDEs be configured so that they won't allow saving file changes if they contain the usual suspects: buffer overflows and whatnot? An LLM would scan the file and deny the write operation.
Like the Black formatter for Python code in VSCode, which runs on hitting CTRL+S.
You can't just use a linter to fix buffer overflows, or people would have done it already.
There's a pretty big problem here, which is that all of the serious security bugs are embargoed. So going off of public info is not really useful.
> Conclusion
"The Firefox 150 data suggests a tool that is genuinely useful for defensive security work, especially at scale, but the public record does not justify the strongest claims people want to make from it. The headline number is impressive, yet it bundles together bugs of very different significance and does not publicly resolve into a clean accounting."
I mean: obviously. It does not matter how good or bad a product is; the current meta is to over-hype it in order to achieve maximum "news penetration". Anthropic seems to have something "real". However, since there is no way for outsiders to calculate real metrics like false-positive rate or cost (tokens, dev hours for setup and review, ...) per issue found, there is no real way to put any scale on the hype graph.
For crying out loud, why are we discussing and paying attention to articles and claims about a product that doesn't even exist yet?!
If this isn't a sign of a bubble, where marketing is more important than the actual product, I don't know what is. This industry has completely lost the plot.