That means you can use AI on a flight with no wifi. In a country with internet censorship. In a hospital where cloud services are a compliance nightmare. Or just because you'd rather not have your journal entries sitting in someone's training data.
The tech: llama.cpp for text (15-30 tok/s, any GGUF model), Stable Diffusion for images (5-10s on Snapdragon NPU), Whisper for voice, SmolVLM/Qwen3-VL for vision. Hardware-accelerated on both Android (QNN, OpenCL) and iOS (Core ML, ANE, Metal).
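For the curious, the text path is plain llama.cpp under the hood. A minimal desktop-side sketch using the llama-cpp-python bindings (not the app's actual code, and the model filename is just an example):

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Any quantized GGUF works; this filename is illustrative only.
    llm = Llama(model_path="Llama-3.2-1B-Instruct-Q4_K_M.gguf",
                n_ctx=2048, n_threads=4)

    out = llm("Summarize today's journal entry in two sentences.", max_tokens=128)
    print(out["choices"][0]["text"])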
MIT licensed. Android APK on GitHub Releases. Build from source for iOS.
Markdown rendering would be great. Letting me download a model, wait for it, and then telling me I can't run it safely in my RAM feels like something it could've told me at the beginning.
Bar that, nice! Was it annoying integrating with Apple's AI stuff?
Yeah, it wasn't straightforward figuring out the speed + getting it to work for both Android and iOS. Fair bit of complexity. But I'm so happy I got it done.
Reminds me a lot of https://github.com/google-ai-edge/gallery which is a proof-of-concept app by Google themselves for their AI libraries. However, your app supports more and larger models without having to manually import anything, which is very useful.
I'm still working on a few things, but let me get a backlog in place and push it so people know what the roadmap looks like
Really awesome idea though. I want this to work.
Thanks for spotting and reporting this.
Decently performing models for day-to-day use cases need at least 6GB of VRAM each, though, and even then they don't come very close to what the cheapest AI websites offer.
So the latest release is at https://github.com/alichherawalla/off-grid-mobile/releases/l...
And the clone would be: git clone https://github.com/alichherawalla/off-grid-mobile.git
- Android SDK (API 34)
- Android NDK r26

vs

    compileSdkVersion = 36
    targetSdkVersion = 36
    ndkVersion = "27.1.12297006"
Thanks for pointing this out
I found a guide for VirtualBox macOS which failed on Intel, then another for Hyper-V, but I haven't tried that one yet.
The dash in "off-grid" is missing.
There are basically no useful models that run on phone hardware.
> Results vary by model size and quantization.
I bet they do.
Look, if you can't run models on your desktop, there's no way in hell they'll run on your phone.
The problem with all of these self hosting solutions is that the actual models you can run on them aren't any good.
Not like, “ChatGPT a year ago” not good.
Like, "it's a potato, pop pop" no good.
Unsloth has a good guide on running Qwen3 (1), and the TL;DR is basically: it's not really good unless you run a big version.
The iPhone 17 Pro has 12GB of RAM.
That is, to be fair, enough to run some small Stable Diffusion models, but it isn't enough to run a decent quant of Qwen3.
You need about 64 GB for that.
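Rough sketch of where those numbers come from (weights only, quant sizes and overhead are napkin estimates, not measurements):

    # Ballpark GGUF footprint: params * bits/8, plus ~10% runtime/KV overhead.
    def est_gb(params_billion, bits, overhead=1.10):
        return params_billion * bits / 8 * overhead

    print(est_gb(1.7, 4))   # Qwen3 1.7B @ Q4    -> ~0.9 GB, fits a phone
    print(est_gb(32, 8))    # Qwen3 32B  @ Q8    -> ~35 GB
    print(est_gb(235, 2))   # Qwen3 235B @ ~2-bit -> ~65 GB, the "64 GB" class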
So… I dunno. This feels like a bunch of empty promises; yes, technically it can run some models, but how useful is it actually?
Self hosting needs next gen hardware.
This gen of desktop hardware isn't good enough, even remotely, to compare to server API options.
Running on mobile devices is probably still a ways away.
(1) - https://unsloth.ai/docs/models/qwen3-how-to-run-and-fine-tun...
We have on-prem AI for my microgrid community, but it’s a nascent effort and we can only run <100B models. At least that size is extremely useful for most stuff, and we have a selection of models to choose from on OpenAI/Ollama-compatible API endpoints.
I'm surprised Apple is still cheaping out on RAM on their phones, especially with the effort they've been putting into running AI locally and all of their NPU marketing.
If there's a similar app for desktop that can set up the stronger models for me, I'd love to hear about it.
Any random thing might happen in the future.
That doesn't have any bearing on how useful this is right now.
All we can do is judge right now how this compares to what it promises.
No, it’s not.
Trust me, I don't write this from a position of vague hand waving.
I've tried a lot of self-hosted models at a lot of sizes; those small models are not good enough, and do not have a context long enough to be useful for most everyday operations.
Curious about a couple things: what GGUF model sizes are practical on a mid-range phone (say 8GB RAM)? And how's the battery impact during sustained inference — does it drain noticeably faster than, say, a video call?
The privacy angle is the real killer feature here IMO. There are so many use cases (journaling, health tracking, sensitive work notes) where people self-censor because they know it's going to a server somewhere. Removing that barrier entirely changes what people are willing to use AI for.
I'd recommend going for any quantized 1B parameter model. So you can look at Llama 3.2 1B, Gemma 3 1B, or Qwen3-VL 2B (if you'd like vision).
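As a napkin check for the 8GB question upthread (the sizes are rough Q4-quant estimates and the headroom figure is a guess, not anything the app enforces):

    # Approximate on-disk sizes for Q4-ish quants; treat as estimates.
    models_gb = {
        "Llama 3.2 1B": 0.8,
        "Gemma 3 1B": 0.8,
        "Qwen3-VL 2B": 1.5,  # includes a vision encoder, so a bit larger
    }

    def fits(model_gb, device_ram_gb, headroom_gb=3.0):
        # Leave headroom for the OS, the app itself, and the KV cache.
        return model_gb + headroom_gb <= device_ram_gb

    for name, size in models_gb.items():
        print(name, "ok on an 8GB phone:", fits(size, 8))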
Appreciate the kind words!
That's using the word "real" very loosely.