I've been wondering about this failed Apple Intelligence project, but the more I think of it, Apple can afford to sit and wait. In 5 years we're going to have Opus 4.6-level performance on-device, and Apple is the only company that stands to benefit from it. Nobody wants to be sending EVERY request to someone else's cloud server.
I think there are a lot of false assumptions in that assertion:
- that a bunch of users won't jump ship if Apple stagnates for 5 years
- that a product based on a model with Q1 2026 SoTA performance would be competitive with products using 2031's models.
- that just having access to good (by 2025/2026 standards) models is the big thing that Apple needs in order for Apple Intelligence to finally be useful.
On that last point, I think the OS/app-level features are almost more important than the model itself. If the model can't _do_ anything, it doesn't really matter how intelligent it is. If Apple rests on its laurels for 5 years, will its OS, built-in apps, and 3rd-party apps have all the hooks needed for a useful AI product?
> - that a bunch of users won't jump ship if Apple stagnates for 5 years
This is the most unlikely proposition, given how Apple has managed to be a decade behind and still very profitable.
Besides that, having a Claw-like AI with full access to your phone is surely a recipe for disaster. IMO Apple is being justifiably cautious in staying a spectator, looking busy, and waiting to make a deal with the winner of the chatbot wars.
Assuming the rate of progress on AI stays the same:
1/ No, you don't get Opus 4.6 level on devices with 12GB of RAM; 7B quantised models just don't get that good. Still quite good, mind you, and I believe the biggest advance to come from mobile AI will be apps providing tools and the device providing a discovery service (see Android's AppFunctions, if it was ever documented well): output quality doesn't matter much on device, but really efficient and good tool calling is a game changer.
2/ Opus 4.6 is now Opus 4.6+5years and has new capabilities that make people want to keep sending everything to someone else's cloud server instead of burning their battery life.
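To make point 1/ concrete, the tools-plus-discovery idea can be sketched in a few lines. Everything below is invented for illustration; it's not a real Android or Apple API:

```python
# Toy tool registry + dispatch loop; all names invented for illustration.
from typing import Callable

TOOL_REGISTRY: dict[str, Callable[..., str]] = {}

def register_tool(name: str):
    """Apps register capabilities; the OS would expose them for discovery."""
    def wrap(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register_tool("lights.set")
def set_lights(zone: str, on: bool) -> str:
    return f"lights in '{zone}' turned {'on' if on else 'off'}"

@register_tool("music.play")
def play_music(query: str) -> str:
    return f"playing first match for '{query}'"

def dispatch(tool_call: dict) -> str:
    # The only thing a small on-device model must get right:
    # pick a registered tool and fill in its arguments.
    fn = TOOL_REGISTRY[tool_call["tool"]]
    return fn(**tool_call["args"])

print(dispatch({"tool": "lights.set", "args": {"zone": "porch", "on": False}}))
```

The model never needs the world knowledge that makes frontier models huge; it only needs to map an utterance to a tool name plus arguments.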
I'll eat my left nut if Tim "I'd rather die than give good amounts of RAM" Cook bumps the top-end iPhone's RAM any higher than 16GB by 2035, especially with the current shortages. They already use relatively cheap LPDDR5X-9600 RAM in there, and are slowly being bumped off order lists at high-end fabs to make room for AI hardware. Not to mention that there's no hardware improvement coming in the next few years that makes RAM either ultra-fast or much higher-capacity.
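For a rough sense of why the RAM matters: decode throughput on a dense model is bounded by memory bandwidth divided by the bytes read per token (roughly the whole model). The bus width and model size below are assumptions for illustration, not confirmed iPhone specs:

```python
# Back-of-envelope: bandwidth-bound decode speed for a dense on-device model.
transfer_rate = 9600e6        # LPDDR5X-9600: 9.6 GT/s per pin
bus_width_bits = 64           # assumed phone-class bus width (not a confirmed spec)
bandwidth_gbps = transfer_rate * (bus_width_bits / 8) / 1e9   # ~76.8 GB/s

model_params = 8e9            # a hypothetical 8B model
bytes_per_param = 0.5         # ~4-bit quantization
model_gb = model_params * bytes_per_param / 1e9               # ~4 GB of weights

# Upper bound: every decoded token reads the whole model once.
tokens_per_sec = bandwidth_gbps / model_gb
print(f"{bandwidth_gbps:.1f} GB/s -> ~{tokens_per_sec:.0f} tok/s ceiling")
```

Under these assumptions you get a ceiling around 19 tok/s for an 8B model, which is why bandwidth, not just capacity, caps what's runnable on a phone.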
A claim like that is at best naive, and in any case ridiculous.
Have you tried running a reasonably sized model locally? You need a minimum of 24GB of VRAM to load up a model, 32GB to be safe, and that isn't even frontier, just the bare minimum.
A good analogy would be streaming. To get good quality, sure, you can store the video file, but it is going to take up space. For videos, these are 2-4GB (let's say), and streaming will always be easier and better.
For models, we're looking at 100s of GB worth of model params. There's no way we can make it into, say, 1GB without loss in quality.
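The arithmetic behind that claim, assuming nothing fancier than uniform quantization:

```python
# Memory footprint of model weights at a given precision.
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """GB of weights for a model with params_billion parameters."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# A frontier-scale dense model doesn't fit in phone RAM even aggressively quantized:
print(weights_gb(400, 4))   # 200.0 GB at 4-bit
# A small model does:
print(weights_gb(7, 4))     # 3.5 GB at 4-bit
```

So quantization buys you roughly 4-8x over fp16, not the 100x-plus you'd need to put a frontier-scale model in 1GB.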
So nope, beyond minimal classification and such, on-device isn't happening.
--
EDIT:
> Nobody wants to be sending EVERY request to someone else's cloud server.
We do this already with streaming. You watch YouTube, which hosts videos in the "cloud". For the latest MKBHD video, I don't care about having it locally (for the most part). I just wanna watch the video and be done with it.
Same with LLMs. If LLMs are here to stay, most people would wanna use the latest / greatest models.
---
EDIT-EDIT:
If your response is "Apple will figure it out somehow": nope. Apple is sitting out the AI race. So it has no technology. It has nothing. It has access to whatever open source is available, or something it can license from the rest. So nope, Apple isn't pushing the limits. They are watching the world move beyond them.
I think this is very pessimistic. Yes, big models are "smarter" and have more inherent knowledge, but I'd bet you a coffee that what 99% of people want to do with Siri isn't "Write me an essay on the history of textiles" or "Vibe code me a SPA"; rather it's "Send Mom the pictures I took of the kids yesterday" and "Hey, play that Deadmau5 album that came out a couple years back", which is more about tool calls than having Wikipedia-level knowledge built into the model.
> Hey, play that Deadmau5 album that came out a couple years back
It could work for Deadmau5 because it is probably popular enough to be part of the model. How about "Hey, play that $regional_artist's cover of Deadmau5" and the model needs to know about "regional_artist", the concept of "cover", where those remixes might be (youtube? soundcloud? some other place).
All of a sudden, it all breaks down. So it'll work for "turn off porch lights", but not for "turn off the lights that's in the front of the house"
As long as it can run tool calls it won't "break down." Not sure why you think the LLM would be searching within its own training data rather than calling the Spotify API or an MCP server to look up that specific artist and find the song ID of the cover.
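Roughly what that path looks like, with a faked in-memory catalog standing in for something like Spotify's search endpoint (none of the names below are the real API):

```python
# Sketch: resolving "play that $regional_artist cover" via a search tool.
# The model only emits a query; knowledge of the artist lives in the catalog,
# not in the model weights. Catalog and tool are invented for illustration.

CATALOG = [
    {"id": "t1", "artist": "deadmau5", "title": "Strobe"},
    {"id": "t2", "artist": "Some Regional Artist", "title": "Strobe (cover)"},
]

def search_tracks(query: str) -> list[dict]:
    """Return tracks whose artist+title contain every word of the query."""
    q = query.lower()
    return [t for t in CATALOG
            if all(word in (t["artist"] + " " + t["title"]).lower()
                   for word in q.split())]

# The LLM's tool call: {"tool": "search_tracks", "args": {"query": "regional cover"}}
hits = search_tracks("regional cover")
print(hits[0]["id"])
```

The obscure-artist case works exactly because the lookup happens in the catalog at request time, not in the training data.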
Have you run models locally, especially on a phone? I have, and there are even apps like Google AI Edge Gallery that run Gemma for you. It works perfectly fine for use cases like summarizing emails, and you don't really need the latest and greatest (i.e. biggest) models for tasks like these, in much the same way most people do not need the latest and greatest phone or laptop for their use cases.
And anyway, you already see models like Qwen 3.5 9B and 4B beating 30B and 80B-parameter models, and the small ones can already run on phones today, especially with quantization.
I'm going by the features Apple showed in the iPhone 16 ad. Take the phone out, point it at a restaurant, and ask it to a) analyze the video/image and b) understand what's going on.
Or pull out the phone and ask "Who's the person I met on X day ..".
>> So nope, beyond minimal classification and such, on-device isnt happening.
This is a paradox, right? Handset makers want less handset storage so they can get users to buy more of their proprietary cloud storage, while at the same time wanting them to use AI more frequently on their handsets.
It will be interesting to see which direction they decide to go. Finding a phone in the last few years with more than 256GB of storage is not only expensive AF, it's become more of a rarity than commonplace. Backtracking on this model simply to get AI models on board would be a huge paradigm shift.
> You need minimum 24GB VRAM to load up a model. 32GB to be safe, and this isnt even frontier, but bare minimum.
Indeed.
But they said 5 years. That's certainly plausible for high-end mobile devices in Jan 2031.
I have high uncertainty about whether distillation will get Opus 4.6-level performance into that RAM envelope, but something interesting on-device, even if not that specifically, is certainly within the realm of plausibility.
Not convinced Apple gets any bonus points in this scenario, though.
I think there are also laws of physics given the current architecture. It's like looking at a 10GB video file and saying: it has to compress to 500MB, right? I mean, it has to, right?
Unless we invent a completely NEW way of doing video, there's no way you get that kind of efficiency. If tomorrow we're using quantum pixels (or something), sure, 500MB is good enough, but not with existing techniques.
In other words, you cannot compress a 100GB gguf file into, say, 5GB.
There surely are limits, but I don't think we have a good idea of what those are, and there's nothing to indicate we're anywhere close to them. In terms of raw facts, you can look at the information content and know that you need at least that many bits to represent that knowledge in the model. Intelligence/reasoning is a lot less clear.
100GB to 5GB would be 20x. Video has seen an improvement of that magnitude since the days of MPEG-1.
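Back-of-envelope on that magnitude, using illustrative bitrates rather than exact figures:

```python
# Compare bits spent per pixel by an early codec vs a modern one.
# Bitrates and resolutions below are rough, illustrative choices.
def bits_per_pixel(bitrate_bps: float, w: int, h: int, fps: int) -> float:
    return bitrate_bps / (w * h * fps)

mpeg1 = bits_per_pixel(1.5e6, 352, 240, 30)   # early-90s VHS-quality target
modern = bits_per_pixel(3.0e6, 1920, 1080, 30)  # modern codec at comparable subjective quality
print(f"roughly {mpeg1 / modern:.0f}x fewer bits per pixel")
```

With these rough numbers the per-pixel efficiency gain comes out around an order of magnitude, which is at least in the neighborhood of the 20x being discussed.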
It's interesting to consider that improvements in video codecs have come from both research and massively increased computing power, basically trading space for computation. LLMs are mostly constrained by memory bandwidth, so if there was some equivalent technique to trade space for computation in LLM inference, that would be a nice win.
If you have good performance storage, you don't need to keep all your params in VRAM. The big datacenter-scale providers do it for peak performance/throughput, but locally you're better off (at least for the largest models) letting them sit on storage and accessing them on demand.
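A back-of-envelope for the streamed-weights idea, with made-up but plausible numbers; for a sparse MoE, only the active parameters need to be read per token:

```python
# Weights-on-SSD inference, roughly. All numbers are illustrative assumptions.
ssd_gbps = 7.0                # fast NVMe sequential read, GB/s
active_params = 12e9          # active params per token in a hypothetical MoE
bytes_per_param = 0.5         # 4-bit quantization
gb_per_token = active_params * bytes_per_param / 1e9   # GB touched per token

# Worst case: every expert is streamed cold from storage for every token.
tokens_per_sec = ssd_gbps / gb_per_token
print(f"~{tokens_per_sec:.1f} tok/s if weights are fully streamed")
```

So fully cold streaming is on the order of 1 tok/s under these assumptions; keeping hot experts cached in RAM is what makes the approach usable in practice.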
For Private Cloud Compute specifically, the system is described as underpowered and perhaps more trouble than it’s worth. Updating the software is apparently trickier and takes time, and more fundamentally the chips (currently believed to comprise modified M2 Ultra processors) are not powerful enough to run the latest frontier models like Gemini, which the new Siri will be based on.
> M2 Ultra processors ... are not powerful enough to run the latest frontier models
The local AI community would strongly disagree with that assessment. They may not be able to run them with low latency for interactive use, and that is most likely the real blocker for Apple, but they do have strong compute per watt compared to Nvidia GPUs.
You cropped the part of the quote that is relevant:
> like Gemini, which the new Siri will be based on.
The local AI community isn't evaluating the internal Gemini models. Apple's Private Compute hardware is specifically competing against Google's TPU hardware, which is a foregone conclusion if you've seen the inference economics. The money and electricity wasted on Mac inference at that scale isn't even attractive to Apple.
The insistence/assumption that LLMs will consistently get better, smaller, and cheaper is so annoying. These things fundamentally require lots of data and lots of processing power. Moore's Law is dead; devices aren't getting exponentially faster anymore. RAM and SSDs are getting more expensive (thanks to this insane bubble).
> RAM and SSDs are getting more expensive (thanks to this insane bubble).
That's not a matter of Moore's Law failing, but short-term capacity constraints being hit. It's actually what you want if Moore's Law is to keep going. It's a blessing in disguise for the industry as a whole.
Computing power has still practically flatlined. Memory density improvements are decelerating. My point still stands despite the temporary pricing situation.
Single-threaded compute, maybe - but that's increasingly a niche. Highly parallel workloads are still going strong in the latest device nodes, and power use for any given workload is decreasing significantly.
> Nope, Apple is sitting out the AI race.
That's why I use an iPhone. I don't need and I don't want any "AI" in my phone. The claims that people want it come from CEOs, marketers, and influencers of GenAI companies, not from users.
Streaming video is almost exclusively pull. The only data you're sending up to the server is what you're watching, when you seek, pause, etc.
Useful LLM usage involves pushing a lot of private data into them. There's a pretty big difference sending up some metadata about your viewing of an MKBHD video, and asking an LLM to read a text message talking about your STD test results to decide whether it merits a priority notification. A lot of people will not be comfortable with sending the latter off to The Cloud.
It's not limited to just the mobile device. You could have a MacBook/mini/studio that is part of your local "cluster" and the inference runs across all of them and optimized based on power source.
The rest of the FAANG has invested very heavily in cloud while Apple seems to be a laggard. GCP, AWS and Azure are all publicly available products, and cloud at Netflix, Meta seems very mature for a private offering.
This is not a huge disadvantage in my opinion. Let the rest of big tech fight each other to death over cloud, while controlling a very profitable differentiated offering (devices+services). Apple keeps the M series HW out of data centers, even though it presents some very attractive performance/w and per-core numbers.
I think you're correct on it not being a disadvantage. Apple's competitors are the Android OEMs, Microsoft, and Dell. Apple Intelligence is a failure only in the sense that we hold Apple to a higher standard. Can anyone argue that Apple's AI implementation is more flawed than Microsoft's? I don't think so.
Being able to search photos with queries like "show me photos of me and teeray" is pretty useful.
What I really want is my phone to transcribe all of my phone calls to a Notes document. Since it isn't recording an audio conversation, I don't think the consent laws come into play.
A decade ago this didn't require LLMs and cutting edge hardware and a trillion dollars of GPUs. This was a Facebook feature in like 2012.
>What I really want is my phone to transcribe all of my phone calls to a Notes document
This has been doable for decades. Why haven't you done it? My Pixel phones did this with voicemail before LLMs.
Windows Vista shipped with full featured dictation functionality, and it works better than you would expect, all local, all using classical algorithms, all evaluated cheaply. If it wasn't accurate enough, Dragon speech to text tools were gold standard for most of modern computing history, and greatly surpassed the accuracy of that built in system.
BTW, you can, on any Windows machine right now, access that built-in voice recognition, and with a "constrained vocabulary" (say, if you only want a few specific voice commands) it gets near-perfect accuracy consistently. You have to search for old documentation now, because Microsoft wants to hide that you don't need an internet connection or an Azure account and monthly bill to ship accurate voice recognition with your app. It's trivial to use from both C++ and C#, and from anything else that can invoke native code, and the workflow is easy enough to understand. I built an app on it rather than buying one of those $10 "voice control your game" apps to add voice control to ARMA, and implementing the voice recognition was easier than copying and pasting native-code invocations of the Win32 API to inject keystrokes. I don't even write C# in general.
There's tons of documentation about "Grammar" and configuration, but the default configuration IIRC just turns speech input into text, and does so with at least 85% accuracy, even without the user training the recognizer on their voice. If you build context-specific grammars, or a hierarchical grammar to support a real UX that isn't just hoping some code knows how to interpret raw speech, you get dramatically better recognition performance.
This is IMO a frequent pattern. Time and time again the people who keep saying "I want LLMs to do X" don't seem to be aware that "X" was a robust and mature area of research decades ago! They don't seem to be aware that you could already do X and even buy ready to go software for that purpose! Often enough the LLM version is an outright regression in functionality, as things that were doable with a single microchip in 1960 now require an internet connection.
>Since it isn't recording an audio conversation,
So to be clear, you want this functionality explicitly to bypass law? Federally and in 39ish states, you only need your own consent anyway.
I do love that feature. As a parent I'm part of multiple group chats for different things, and it's nice to have a single summary instead of reading 50+ unread messages.
I find there's more times that it summarizes a single message to something more convoluted than times it catches me up.
It would at least be nice if they could do some basic attempts to detect scam texts, because I frequently see AI message summaries about supposed $2000 purchases made on my Amazon account, usually worded more eloquently than the original message.
I often use Apple Intelligence to proofread emails before I send them. It's nice that it runs on device. I don't think I ever had a use case where it would have to use their Private Cloud though.
I'm a complete Apple ecosystem user-- I have a Mac, an iPhone, an Apple Watch, Apple earbuds, and an Apple TV, and I also pay reasonably close attention to their announcements and developments-- and I couldn't tell you a single Apple Intelligence feature. Nor do I ever use Siri except for setting kitchen timers.
What do people even expect from these intelligence services? Apple is always said to have failed, yet I've seen nothing in Windows that I'd actually want to use with respect to intelligence services.
Siri being better at free form requests for actions and doing internet/knowledge searches is about all I can think of. But also, I use Kagi for that, and unless Siri has a pluggable backend for search I'm not sure being forced to use only Apple's search, if it ever exists, is a great design.
I was wondering the same thing. I turned notification summaries off as they were less than useful, and I don't think I've stumbled across any other Apple Intelligence features apart from the laughable Image Playground or whatever it's called.
I cringe whenever I see the Image Playground icon on my MacBook.
It somehow looks worse than most scammy image generation apps you see on half-page search ads on the App Store. I have no idea how Apple willingly released it like that.
It was updated on my iPhone to a bland, forgettable abstract icon that’s still fairly mediocre but no longer an ongoing embarrassment for their corporate brand standards.
I mean, I think a lot of it is that they're not _really_ forcing it on people. I think I've declined it maybe twice over the last two years. Meanwhile, Google is trying to crowbar bloody Gemini in _everywhere_, and I gather Microsoft is doing the same.
Yeah but this is how Apple has always done infrastructure/services. Their internal software teams are a mess. They constantly reinvent the wheel poorly, and then they charge a premium for exclusive access. Is anyone surprised by this?
What's insane is that the market and users don't care; they're making more money than ever... It's quite sad to see that Vision Pro, Apple Intelligence, and Liquid Glass were all failures and no one cared... I hope Android makes a comeback against Apple in the US so they're forced to innovate.
I don’t see Android making big inroads until there’s more of a presence from Android manufacturers that fill Apple’s niche in smartphones and tablets.
Samsung desperately wants to be this but misses the part where iPhones don’t come with third party junkware even if they’re entry level models and don’t allow carrier junkware either. Google could be it but they’re too married to midrange hardware and underwhelming physical designs.
All it would take is for a manufacturer to commit to their whole lineup being built with reasonably capable hardware (no ancient or weak SoCs as seen in budget Android devices), to completely jettison third party junkware, and have top end flagships with hardware that actually matches that description, but none thus far have managed this.
I don't think the average consumer is thinking about junkware or physical design; it's just that most people have iPhones, especially young adults and people in tech, and thus more people want to be on iPhone to share messages, AirDrop, get AirPods support, etc. They've created a network effect.
> I don't think the average consumer is thinking about junkware nor physical design
Probably not, but a zero junkware/zero carrier meddling policy is a major contributor to the brand's premium image, which makes the whole lineup more desirable. The iPhone is an invariable, singular product no matter how it's obtained, even if it has different price points.
By contrast Samsung, etc undermine themselves by trying to squeeze out pennies anywhere they can. That's the behavior of a commodity, not a premium brand.
I haven't followed OnePlus closely but as I remember, when they had their first burst of popularity they were aiming to be a value play more than anything else, operating mostly in the midrange space.
What other services does Apple have that people would be paying for? The ones they have today are either iCloud storage, which does not need much compute, or merely an alibi so they can claim with an almost straight face that Apple's "Services" revenue isn't basically just the App Store 30% tax. That also explains why they are constantly shoving ads for News or Fitness down our throats in the Settings app.
Taken another way, given Apple's enormous market reach, this could be seen as perhaps the most solid metric of actual consumer interest in AI features, hype aside.
Not sure. I'm a heavy AI user at this point, and also a heavy Apple user, and I've never once used an Apple AI feature since they released them. I don't even know what they released. It is a complete failure of execution on their part.
I think the claim is that in 5 years an iPhone will have enough ultra-fast RAM to run 300B-1T models on-device.
It isn't speed you want; it's storage. A faster CPU doesn't mean you can store a TB-scale model. That needs raw storage, which famously is through the roof.
So unless the iPhone 20 Pro Max has 100GB of unified memory, all of this is just a pipe dream. I mean, it won't even have 32GB of unified memory.
Benchmarks: https://huggingface.co/Qwen/Qwen3.5-4B
Sure, many local models can do all that today already, as they have vision and tool calling support.
If all of the storage is used up by models, users will need to buy proprietary cloud storage for their own content.
Five years ago, the line on LLMs was "beyond minimal conversation, intelligence isn't happening."
I'm pretty sure that in five years, local LLMs will be a thing.
>Apple is sitting out the AI race
Then why does my M4 run models at tokens-per-second rates that similarly priced GPUs cannot?
iPhones can run the Uber app, but nobody would claim Apple is in the ride-sharing business.
No, but they are in the "device that runs apps" business, right? Just like they're looking to corner the "device that runs models locally" business by focusing on onboard inference.
Gains in model performance aren't exactly cheap, and once one frontier model figures something out, the rest seem to copy it quickly. Let them figure out what works and what doesn't, then put the "Apple" touch on it, all while putting your devices in everyone's hands. That's been their business model for years.
The insistence/assumption that LLMs will consistently get better, smaller, and cheaper is so annoying. These things fundamentally require lots of data and lots of processing power. Moore's Law is dead; devices aren't getting exponentially faster anymore. RAM and SSDs are getting more expensive (thanks to this insane bubble).
> RAM and SSDs are getting more expensive (thanks to this insane bubble).
That's not a matter of Moore's Law failing, but short-term capacity constraints being hit. It's actually what you want if Moore's Law is to keep going. It's a blessing in disguise for the industry as a whole.
Computing power still has practically flatlined. Memory density is decelerating in its improvement. My point still stands despite the temporary pricing situation.
> Computing power still has practically flatlined
Single-threaded compute, maybe - but that's increasingly a niche. Highly parallel workloads are still going strong in the latest device nodes, and power use for any given workload is decreasing significantly.
For vibe coding? Sure. For "Hey Siri, send Grandma an e-mail summarizing my schedule this afternoon."? No.
> Nope, Apple is sitting out the AI race.
That's why I use an iPhone. I don't need and I don't want any "AI" in my phone. The claim that people want it comes from CEOs, marketers, and influencers at GenAI companies, not from users.
Got some bad news for you. Apple sitting out the "AI" race is just a skill issue. In fact they're buying their way back in:
https://www.reuters.com/business/google-apple-enter-into-mul...
Streaming video is almost exclusively pull. The only data you're sending up to the server is what you're watching, when you seek, pause, etc.
Useful LLM usage involves pushing a lot of private data into them. There's a pretty big difference between sending up some metadata about your viewing of an MKBHD video and asking an LLM to read a text message about your STD test results to decide whether it merits a priority notification. A lot of people will not be comfortable with sending the latter off to The Cloud.
If the goal was as much to establish a trademark for "Apple Intelligence" as anything else then it wasn't a failure.
> and Apple is the only company that stands to benefit from it.
And that is exactly why it won't happen (like that).
How do you do on-device inference while preserving battery life?
Using something like Taalas' hardcoded model as opposed to running one on general purpose GPUs, flexible but power-hungry.
https://www.cnx-software.com/2026/02/22/taalas-hc1-hardwired...
It's not limited to just the mobile device. You could have a MacBook/mini/studio that is part of your local "cluster" and the inference runs across all of them and optimized based on power source.
The rest of FAANG has invested very heavily in cloud while Apple seems to be a laggard. GCP, AWS, and Azure are all publicly available products, and the cloud infrastructure at Netflix and Meta seems very mature for a private offering.
This is not a huge disadvantage in my opinion. Let the rest of big tech fight each other to death over cloud, while controlling a very profitable differentiated offering (devices+services). Apple keeps the M series HW out of data centers, even though it presents some very attractive performance/w and per-core numbers.
I think you're correct that it's not a disadvantage. Apple's competitors are the Android OEMs, Microsoft, and Dell. Apple Intelligence is a failure only in the sense that we hold Apple to a higher standard. Can anyone argue that Apple's AI implementation is more flawed than Microsoft's? I don't think so.
Just like Siri, it’s completely useless. I don’t need Apple Intelligence to summarize my text messages. I can skim them nearly as fast.
Being able to search photos with queries like "show me photos of me and teeray" is pretty useful.
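For what it's worth, that kind of photo search is typically embedding-based: images and the text query are projected into a shared vector space (e.g. by a CLIP-style encoder) and ranked by cosine similarity. A minimal sketch, with hand-made stand-in vectors instead of a real encoder (the file names and vectors are invented for illustration):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: angle between vectors, ignoring their magnitude.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-computed embeddings for three photos in a library.
photo_embeddings = {
    "beach_2023.jpg":    np.array([0.9, 0.1, 0.0]),
    "me_and_teeray.jpg": np.array([0.1, 0.9, 0.2]),
    "receipt_scan.jpg":  np.array([0.0, 0.1, 0.9]),
}

# A real system would encode the query text with the same model; here we
# hand-pick a vector near the photo we expect "me and teeray" to match.
query = np.array([0.2, 0.8, 0.1])

ranked = sorted(photo_embeddings,
                key=lambda p: cosine(query, photo_embeddings[p]),
                reverse=True)
print(ranked[0])  # me_and_teeray.jpg
```

The on-device part is what makes this attractive: the embeddings can be computed once per photo and the query-time work is just a dot product per image.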
What I really want is my phone to transcribe all of my phone calls to a Notes document. Since it isn't recording an audio conversation, I don't think the consent laws come into play.
A decade ago this didn't require LLMs and cutting edge hardware and a trillion dollars of GPUs. This was a Facebook feature in like 2012.
>What I really want is my phone to transcribe all of my phone calls to a Notes document
This has been doable for decades. Why haven't you done it? My Pixel phones did this with voicemail before LLMs.
Windows Vista shipped with full-featured dictation functionality, and it works better than you would expect: all local, all classical algorithms, all evaluated cheaply. If that wasn't accurate enough, Dragon's speech-to-text tools were the gold standard for most of modern computing history and greatly surpassed the accuracy of the built-in system.
BTW, on any Windows machine right now, you can access that built-in voice recognition, and with a constrained vocabulary (say, if you only want a few specific voice commands) it gets near-perfect accuracy consistently. You have to search for old documentation now, because Microsoft wants to hide the fact that you don't need an internet connection or an Azure account and monthly bill to ship accurate voice recognition with your app. It's trivial to use from both C++ and C#, and from anything else that can invoke native code, and the workflow is easy enough to understand.

I built an app on it rather than buying one of those $10 "voice control your game" apps to add voice control to ARMA, and implementing the voice recognition was easier than copying and pasting the Win32 API invocations to inject keystrokes. And I don't even write C# in general.
https://learn.microsoft.com/en-us/previous-versions/windows/...
There's tons of documentation about "Grammar" objects and configuration, but IIRC the default configuration simply turns speech input into text, with at least 85% accuracy even without the user training the recognizer to their voice. If you build context-specific grammars, or a hierarchical grammar to support a real UX that isn't just hoping some code knows how to interpret raw speech, you get dramatically better recognition performance.
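The gain from a constrained grammar is easy to see even without touching SAPI: when only a handful of outputs are legal, a noisy transcript can be snapped onto the nearest allowed phrase. A toy illustration (this is not the Windows API; the command list and cutoff are made up):

```python
import difflib

# A tiny command grammar, like the "constrained vocabulary" described above.
COMMANDS = ["open map", "lower landing gear", "raise landing gear", "request refuel"]

def snap_to_grammar(raw_transcript, commands=COMMANDS, cutoff=0.6):
    """Map a noisy transcript onto the closest allowed command, or None.

    With only a handful of legal outputs, even a sloppy transcript usually
    lands on the right command; that is the intuition behind
    grammar-constrained recognition.
    """
    matches = difflib.get_close_matches(raw_transcript, commands,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(snap_to_grammar("lower landin gear"))    # lower landing gear
print(snap_to_grammar("completely unrelated"))  # None
```

A real recognizer constrains the search over acoustic hypotheses rather than post-processing text, which is why the accuracy gain is so large, but the shrinking-the-output-space principle is the same.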
This is IMO a frequent pattern. Time and time again the people who keep saying "I want LLMs to do X" don't seem to be aware that "X" was a robust and mature area of research decades ago! They don't seem to be aware that you could already do X and even buy ready to go software for that purpose! Often enough the LLM version is an outright regression in functionality, as things that were doable with a single microchip in 1960 now require an internet connection.
>Since it isn't recording an audio conversation,
So, to be clear, you want this functionality explicitly to bypass the law? Federally, and in 39-ish states, you only need your own consent anyway.
I do love that feature. As a parent I'm part of multiple group chats for different things, and it's nice to have a single summary instead of reading 50+ unread messages.
I find there are more times when it summarizes a single message into something more convoluted than times it actually catches me up.
It would at least be nice if they could do some basic attempts to detect scam texts, because I frequently see AI message summaries about supposed $2000 purchases made on my Amazon account, usually worded more eloquently than the original message.
I often use Apple Intelligence to proofread emails before I send them. It's nice that it runs on device. I don't think I ever had a use case where it would have to use their Private Cloud though.
What are these servers actually used for?
The Siri+LLM features of Apple Intelligence aren’t launched yet, and the other features like notification summaries run on-device.
Well... you can write Apple Shortcuts that send AI requests to their cloud.
The next Siri is Siri by Gemini, running on Google servers with Apple Privacy requirements. (aiui)
https://www.macrumors.com/2026/01/30/apple-explains-how-gemi...
I'm a complete Apple ecosystem user-- I have a Mac, an iPhone, an Apple Watch, Apple earbuds, and an Apple TV, and I also pay reasonably close attention to their announcements and developments-- and I couldn't tell you a single Apple Intelligence feature. Nor do I ever use Siri except for setting kitchen timers.
Just a total failure of execution.
What do people even expect from these intelligence services? Apple is always said to have failed, yet I've seen nothing in Windows that I'd actually want to use with respect to intelligence services.
Siri being better at free form requests for actions and doing internet/knowledge searches is about all I can think of. But also, I use Kagi for that, and unless Siri has a pluggable backend for search I'm not sure being forced to use only Apple's search, if it ever exists, is a great design.
The gorgeous rainbow border is one of the Apple Intelligence features, unavailable to plain Siri :/
> Nor do I ever use Siri except for setting kitchen timers.
If it even works, it fails with "something went wrong" for me 3 out of 5 times
They are supposed to run Apple Intelligence for devices too old to do it themselves.
https://security.apple.com/blog/private-cloud-compute/
I was wondering the same thing. I turned notification summaries off as they were less than useful, and I don't think I've stumbled across any other Apple Intelligence features apart from the laughable Image Playground or whatever it's called.
I cringe whenever I see the Image Playground icon on my MacBook.
It somehow looks worse than most scammy image generation apps you see on half-page search ads on the App Store. I have no idea how Apple willingly released it like that.
It was updated on my iPhone to a bland, forgettable abstract icon that’s still fairly mediocre but no longer an ongoing embarrassment for their corporate brand standards.
I mean, I think a lot of it is that they're not _really_ forcing it upon people. I think I've declined it maybe twice over the last two years. Meanwhile, Google is trying to crowbar bloody Gemini in _everywhere_, and I gather Microsoft is doing ditto.
Yeah but this is how Apple has always done infrastructure/services. Their internal software teams are a mess. They constantly reinvent the wheel poorly, and then they charge a premium for exclusive access. Is anyone surprised by this?
they should just scalp the RAM on eBay, that's what people actually _want_ to buy, not AI
Can't, the RAM is soldered to the motherboard
Did they not just see crazy sales on Mac Minis the second users figured out it meant they could give an AI access to blue-bubble text messages?
Imagine launching such a shitty product that AI servers are sitting unused in 2026.
What's insane is that the market / users don't care; they're making more money than ever... It's quite sad to see that Vision Pro, Apple Intelligence, and Liquid Glass were all failures and no one cared... I hope Android makes a comeback against Apple in the US so they're forced to innovate.
I don’t see Android making big inroads until there’s more of a presence from Android manufacturers that fill Apple’s niche in smartphones and tablets.
Samsung desperately wants to be this but misses the part where iPhones don’t come with third party junkware even if they’re entry level models and don’t allow carrier junkware either. Google could be it but they’re too married to midrange hardware and underwhelming physical designs.
All it would take is for a manufacturer to commit to their whole lineup being built with reasonably capable hardware (no ancient or weak SoCs as seen in budget Android devices), to completely jettison third party junkware, and have top end flagships with hardware that actually matches that description, but none thus far have managed this.
I don't think the average consumer is thinking about junkware or physical design; it's just that most people have iPhones, especially in tech and among young adults, and thus more people want to be on iPhone for shared messages, AirDrop, AirPods support, etc. They've created a network effect.
> I don't think the average consumer is thinking about junkware nor physical design
Probably not, but a zero junkware/zero carrier meddling policy is a major contributor to the brand's premium image, which makes the whole lineup more desirable. The iPhone is an invariable, singular product no matter how it's obtained, even if it has different price points.
By contrast, Samsung etc. undermine themselves by trying to squeeze out pennies anywhere they can. That's the behavior of a commodity, not a premium brand.
is that not what oneplus started as?
I haven't followed OnePlus closely but as I remember, when they had their first burst of popularity they were aiming to be a value play more than anything else, operating mostly in the midrange space.
Microsoft CoPilot?
Microsoft is doing a "good" job slapping AI onto their products. Might not be the best use but I doubt they sit idle.
Their servers aren't sitting idle. Sam needs them all.
Apple will utilize them when needed and scoop up extra capacity after the bubble burst.
Did you read the article? Apple's servers are M2 Ultra class and not able to run modern models.
What does that matter? They can't be reused for other things? Nonsense.
What other services does Apple have that people would be paying for? The ones they have today are either iCloud storage, which does not need much compute, or merely an alibi so they can claim with an almost straight face that Apple's "Services" revenue isn't basically just the App Store 30% tax. That also explains why they are constantly shoving ads for News or Fitness down our throats in the Settings app.
Taken another way, given Apple's enormous market reach, this could be seen as perhaps the most solid metric of actual consumer interest in AI features once you ignore the hype.
Not sure. I'm a heavy AI user at this point, and also a heavy Apple user, and I've never once used an Apple AI feature since they released them. I don't even know what they released. It's a complete failure of execution on their part.