Running background processes might motivate the use of the NPU more, but doesn't exactly feel like a pressing need. Having it actively listen to you 24/7 and analyze the data isn't a use case I'm eager to explore, given the lack of control we have over our own devices.
The AI Edge Gallery app on Android (which is the officially recommended way to try out Gemma on phones) uses the GPU (it lacks NPU support) even on first-party Pixel phones. So it's less "they didn't want to interface with Apple's proprietary tensor blocks" and more that they just didn't give a f in general. A truly baffling decision.
The pattern "It's not mere X — it's Y" occurs like 4 times in the text :v
The problem with the article is the complete lack of detail. No benchmarks on the iPhone-capable models. No details whatsoever.
Human or LLM - the article is a whole lot of nothing.
My favorite: couldn't even prove the author is a real person. They all found no record!
I guess I found the millennial. I haven't seen that in so long!
At this point relying on their judgement is beyond folly.
https://old.reddit.com/r/ChatGPT/comments/13mft8s/apparently...
LLM output doesn't have the variety of human output, since LLMs operate in a fixed fashion: statistical inference followed by formulaic sampling.
Additionally, the statistics used by LLMs are going to be similar across different LLMs, since at scale it's just "the statistics of the internet".
Human output has much more variety, partly because we're individuals with our own reading/writing histories (which we're drawing upon when writing), and partly because we're not so formulaic in the way we generate. Individuals have their own writing styles and vocabulary, and one can identify specific authors to a reasonable degree of accuracy based on this.
It's a bit like detecting cheating in a chess tournament. If an unusually high percentage of a player's moves are optimal computer moves, then there is a high likelihood that they were computer generated. Computers and humans don't pick moves in the same way, and humans don't have the computational power to always find "optimal" moves.
Similarly with the "AI detectors" used to detect whether kids are using AI to write their homework essays, or whether blog posts are AI generated... if an unusually high percentage of words are predictable from what came before (the way LLMs work), and those statistics match those of an LLM, then there is an extremely high chance that it was written by an LLM.
Can you ever be 100% sure? Maybe not, but in reality human-written text is never going to have such statistical regularity, such an LLM statistical signature, that an AI detector gives it more than 10-20% confidence of being AI. So when the detector says it's 80%+ confident something was AI generated, that effectively means 100%. There is of course also content that is part human, part AI (a human used an LLM to fix up their writing), which may score somewhere in the middle.
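(For the curious, here's a minimal sketch of that perplexity idea, assuming the HuggingFace transformers library and using GPT-2 as a stand-in scoring model; real detectors are more sophisticated than this.)

```python
# Minimal sketch of perplexity-based AI-text detection.
# Assumptions: the `transformers` library is installed, and GPT-2
# stands in for whatever scoring model a real detector would use.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under the scoring model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids makes the model score each token given the ones before it
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Lower perplexity = more predictable = more "LLM-like".
# The threshold below is made up purely for illustration.
if perplexity("Some essay text to check.") < 20.0:
    print("Suspiciously predictable; possibly LLM-generated.")
```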
This is the wrong thing to look at; your chess analogy is much stronger, and the detection method is similar (if you can figure out a prompt that generates something close to the content, it almost certainly isn't of human origin).
But as to why the thing I'm quoting doesn't work: if you took, say, web comic author Darren Gav Bleuel, put him in a sci-fi mass-duplication incident that makes 950 million of him, and had them all talking and writing all over the internet, people would very quickly learn to recognise the style, which would have very little variety because they'd all be forks of the same person.
Indeed, LLMs are very good at presenting styles other than their defaults, better at this than most humans, and what gives LLMs away is that (1) very few people bother to ask them to act other than their defaults, and (2) all the different models, being trained in similar ways on similar data with similar architectures, are inherently similar to each other.
https://github.com/blixt/pucky
It writes a single TypeScript file (I tried multiple files but embedded Gemma 4 is just not smart enough) and compiles the code with oxc.
You need to build it yourself in Xcode because this probably wouldn't survive the App Store review process. Once you run it, there are two starting points included (React Native and Three.js); the UX is a bit obscure, but you can edge-swipe left/right to switch between views.
I think React Native could be swapped out for Swift.
qwen3-coder-next uses a lot less since it seems to only activate ~3B parameters at a time.
My guess is that this is still close to a tech demo, and a lot of performance is left on the table.
At a glance, I see they do gather analytics about how much the app is used (model downloads, model invocations, etc.) without message content; pretty much just which model was used.
The funny thing is that a lot of Google's internal training content uses an imaginary product "gShoe", and discusses the privacy implications of data that such a shoe might collect :D
Apple doesn’t care about revenue from a random TODO app.
A kid playing Roblox can spend more than that in a good weekend.
Where can I get this amazing technology?
I’m sure there are things on my phone it could replace (though I struggle to think of them) but there are plenty it can’t. My black magic camera app, web browsers, local send, libby/hoopla…
I can't really think of any apps I use every day, or every week, that an LLM would replace. I'm not coding on my smartphone, and aside from that an LLM is basically a more complex, somewhat inconsistent search engine experience right now for most people. Siri didn't replace any of my apps, for instance. Why would ChatGPT?
TL;DR: what apps would an LLM replace on my iPhone?
See Anywhere and Replit. Anywhere was the #1 or #2 app and was taken off the App Store entirely, before being put back on and then taken off again.
Last I checked, Replit hadn't received an update on the iOS App Store in over two months due to review denials.
But it's more likely it's just walled garden + security theatre that'll keep them from allowing outside apps.
With a canonical source of truth, and set input/output expectations, the potential blast radius is quite small.
Basically, a "toy" app to showcase where we are with coding agents on-device.
Come on folks, their IT hardware may be nice but supporting them is not worth it.
Apple is getting a base Gemini model (not a Gemma), and it will run on Apple's Private Cloud Compute. Apple's foundation models will remain the on-device models.
What are the possibilities of an Android or iOS device where the OS is centered around a locally running LLM with an API for accessing it from apps, along with tools the LLM can call to access data from locally running apps? What’s the equivalent of the original Mac OS?
Do apps disappear, with just a running dialog with the LLM generating graphical displays on demand?
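To make the question concrete, here's a rough sketch (purely hypothetical; every name in it is invented) of the shape such an OS-level API might take:

```python
# Purely hypothetical sketch of an OS-level LLM service, as imagined above.
# None of these names exist; they only illustrate the general shape.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[[dict], dict]  # apps register handlers the LLM can call

class SystemLLM:
    def __init__(self):
        self.tools: dict[str, Tool] = {}

    def register_tool(self, tool: Tool) -> None:
        # An app exposes a capability ("read my calendar", "send a message").
        self.tools[tool.name] = tool

    def chat(self, prompt: str) -> str:
        # The on-device model would decide which registered tools to call
        # and assemble a reply (or render UI) from their results.
        raise NotImplementedError("placeholder for the on-device model")

# A calendar app registering itself with the system model:
llm = SystemLLM()
llm.register_tool(Tool(
    name="calendar.next_event",
    description="Return the user's next calendar event",
    handler=lambda args: {"title": "Dentist", "time": "15:00"},
))
```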
Threat found: "This web page may contain dangerous content that can provide remote access to an infected device, leak sensitive data from the device or harm the targeted device." Threat: JS/Agent.RDW trojan
I remember being excited when Apple got widgets because then I could add my 'Next Alarm time' to my home screen. Made my company work phone usable on trips.
I wonder when they are going to get NVIDIA cards or CUDA? Then they could actually run LLMs, and not just trick people into buying under the 30-year-old idea of "Unified Memory".
They've had to be dragged kicking and screaming away from the NPU model only to admit that GPGPU tech was the right choice.
'Cool demo' -> Doesn't convert to tangible things.
Won't attempt to compete with companies better than them, but go their own route. "Oh look, it consumes low power!" (things no one cared about).
They are the Nintendo of tech.
Isn't the "edge" meant to be computing near the user, but not on their devices?
In a general sense, edge just means moving the computation to the user, rather than keeping it in a central cloud (although the two aren't mutually exclusive, e.g. Cloudflare Workers).
For those that have lost their marbles: sure, people use words incorrectly, but that doesn't mean we all have to use those words incorrectly.
In compute vernacular, "edge" means it's distributed in a way that the compute is close to the user (the "user" here is the device, not a person); "on device" means the compute is on the device. They do not mean the same thing.
Can't wait until AI companies go from mimicking human thought to figuring out how to license those thoughts. ;)
I want to test a hypothesis for "uploading" neural network knowledge to a user's brain, by a reaction-speed game.
You don't need a neural network. Traditional NLP is far better at this task. The keyword you're looking for is "phonemizer".
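For example, a minimal grapheme-to-phoneme snippet with the Python phonemizer package, which wraps espeak-ng (assuming both are installed):

```python
# Minimal grapheme-to-phoneme example using the `phonemizer` package,
# which wraps espeak-ng (both must be installed for this to run).
from phonemizer import phonemize

ipa = phonemize(
    "the quick brown fox",
    language="en-us",
    backend="espeak",  # non-neural: rule/dictionary based
    strip=True,
)
print(ipa)  # IPA transcription, e.g. something like "ðə kwɪk bɹaʊn fɑːks"
```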
I'm surprised that traditional NLP is better than ML models for this task; can you point me to a benchmark analysis showing that the non-neural Espeak-ng beats ML models?
Also, I asked for a neural model for another reason as well: I still want semantic knowledge present; I want more than pronunciation. But before I use myself as a test subject, I want to make sure I get the proper pronunciation in case the highly speculative "uploading game" works... I don't want to systematically mis-train myself on pronunciation early on...
Seems pretty good to me!
The model itself works absolutely fine, though the iPhone thermally throttles at some point, which really reduces the token generation speed. When I asked it to write me a business plan for a fish farm in the Nevada desert, it slowed down after a couple thousand tokens, whereas the Pixel seems to just keep going.
It's a 100% replacement for free ChatGPT/Gemini.
Compared to the paid pro/thinking models... Gemma does have reasoning, and I have used the reasoning mode for some tax & legal/accounting advice recently as well as other misc problems. It's worked well for that, but I haven't tried any real difficult tasks. From what I've heard re. agentic coding, the open weight models are ~18-24 months behind Anthropic & Google's SOTA.
Qwen 3.5 122B-A10B should just fit into 128 GB with a Q4/Q5 quant, and may be a bit smarter. There's apparently also a similar-sized Gemma 4 model, but they haven't released it yet; the 26B was the largest released.
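Back-of-the-envelope math for the fit (rough numbers of my own; ignores KV cache and runtime overhead):

```python
# Rough weight-memory estimate for a 122B-parameter model at ~4.5 bits
# per weight (a mid Q4/Q5 quant). KV cache and overhead come on top.
params = 122e9
bits_per_weight = 4.5
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"{weights_gb:.0f} GB")  # ~69 GB, leaving headroom within 128 GB
```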
You need a relatively beefy phone to run this stuff on large amounts of text, though, and you can't have every app run it because your battery wouldn't last more than an hour.
I think the real use case for apps is going to be something like tiny, purpose-trained models, like the 270M models Google wants people to train and use: https://developers.googleblog.com/on-device-function-calling... With these, you can set up somewhat intelligent situational automation without having to work out logic trees and edge cases beforehand, as in the sketch below.
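A hedged sketch of what that could look like (the model stub and tool format below are invented for illustration; the linked Google post describes their actual function-calling setup):

```python
# Hypothetical sketch of situational automation with a tiny on-device model.
import json

TOOLS = {
    "set_dnd": lambda on: print(f"Do Not Disturb -> {on}"),
    "dim_screen": lambda level: print(f"Screen brightness -> {level}"),
}

def tiny_model(situation: str) -> str:
    # Stand-in for a ~270M purpose-trained model that maps a described
    # situation to a tool call, instead of a hand-written logic tree.
    return json.dumps({"tool": "set_dnd", "args": {"on": True}})

call = json.loads(tiny_model("user started a movie in a dark room"))
TOOLS[call["tool"]](**call["args"])
```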
Never paid an LLM provider and I have no reason to ever start.
The only downside is that I suspect the Framework would be a decent bit quieter under load (not that this thing is abnormally loud). You're also limited to a single M.2 2230 internal SSD slot (I believe Micron recently launched a 4 TB model, but generally you'll max out at 2 TB without using an external enclosure).
I don't have anything against the Framework, I'm sure it's a great machine, but the Z13 is an incredible portable all-in-one device that can handle everything from general PC use to gaming to tablet/entertainment to LLMs and high-perf workloads.
[0] https://frame.work/products/desktop-diy-amd-aimax300/configu...
Disappointing if you compare it to anything else from 2026, but fairly impressive for something that can run locally at an OK speed.