ollama launch claude --model gemma4:26b

OLLAMA_CONTEXT_LENGTH=64000 ollama serve

or if you're using the app, open the Ollama app's Settings dialog and adjust there.

Codex also works:

ollama launch codex --model gemma4:26b
ollama launch claude --model gemma4:26b-a4b-it-q8_0

UPD: tried ollama-vulkan. It works, gemma4:31b-it-q8_0 with 64k context!
I mean, yeah, true, but it depends on how big the model is. The example I gave (Qwen 3.5 35B-A3B) was fitting a 35B Q4_K_M model (roughly 20 GB in size) into 12 GB of VRAM. With a 4070 Ti + high-speed 32 GB DDR5 RAM you can easily get 700 tokens/sec prompt processing and 55-60 tokens/sec generation, which is quite fast.
On the other hand, if I try to fit a 120B model in 96 GB of DDR5 + the same 12 GB of VRAM, I get 2-5 tokens/sec generation.
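That gap follows from a back-of-the-envelope rule: decode is memory-bandwidth-bound, so tokens/sec is roughly bandwidth divided by bytes of active weights streamed per token. A sketch of that arithmetic (the bandwidth and active-parameter figures are illustrative assumptions, not measurements of these exact setups):

```python
# Rough decode-speed estimate: generation is memory-bandwidth-bound,
# so tokens/sec ~= effective bandwidth / bytes touched per token.
# All hardware numbers below are illustrative assumptions.

def tokens_per_sec(active_params_b: float, bytes_per_param: float,
                   bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed when the active weights must be
    streamed from memory once per generated token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# MoE with ~3B active params at Q4 (~0.5 bytes/param), weights mostly
# resident in GPU VRAM (~500 GB/s assumed):
moe = tokens_per_sec(active_params_b=3, bytes_per_param=0.5,
                     bandwidth_gb_s=500)

# Dense 120B at Q4, spilled to dual-channel DDR5 (~80 GB/s assumed):
dense = tokens_per_sec(active_params_b=120, bytes_per_param=0.5,
                       bandwidth_gb_s=80)

print(f"MoE estimate:   ~{moe:.0f} tok/s upper bound")
print(f"dense estimate: ~{dense:.1f} tok/s upper bound")
```

The estimates are ceilings (real numbers land lower once attention, KV-cache reads, and scheduling overhead are counted), but they show why a 3B-active MoE decodes an order of magnitude faster than a dense 120B spilled to system RAM.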
Using Ollama's API doesn't have the same issue, so I've stuck with Ollama for local development work.
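For reference, hitting Ollama's local HTTP API directly is only a few lines. A minimal non-streaming sketch with the stdlib (the model tag is just an example, and it assumes `ollama serve` is listening on the default port 11434):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

# Assumed model tag; swap in whatever `ollama list` shows locally.
payload = {
    "model": "gemma4:26b",
    "prompt": "Write a haiku about VRAM.",
    "stream": False,  # one JSON reply instead of an NDJSON stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.load(resp)["response"])
except OSError as e:
    # No local `ollama serve` running (or the model isn't pulled).
    print(f"could not reach Ollama: {e}")
```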
And even if you somehow manage to open up a big enough VRAM playground, the open-weights models are not quite as good at wrangling such large context windows (even Opus is hardly capable) without basically getting confused about what they were doing before they finish parsing it.
I'd rate their coding-agent harness as slightly to significantly less capable than Claude Code, but it also plays better with alternate models.
Why/why not?
There are benefits too. Some developers might learn to use Claude Code outside of work with cheaper models and then advocate for using Claude Code at work (where their companies will just buy access from Anthropic, Bedrock, etc.). It's similar to how free ESXi licenses for personal use helped infrastructure folks gain skills with that product, which created a healthy supply of labor and VMware evangelists eager to spread the gospel. Anthropic can't just give away access to Claude models because of cost, so there is value in allowing alternative ways for developers to learn Claude Code and develop a workflow with it.
And is running a local model with Claude Code actually usable for any practical work compared to the hosted Anthropic models?
It's an okay-enough tool, but I don't see much point in using it when open-source tools like Pi and OpenCode exist (or octofriend, or forge, or droid, etc.).
It's so janky; there are far superior CLI coding harnesses out there.