This is how every LLM API has worked for years; the API is a stateless token machine, and the prompts + turns are managed by the client application. If anything it's interesting how standard it is; no inside baseball, they just use the normal public API.
I wanted to use local LLMs (~30B) on my M1 Macbook Pro Max, with Claude Code for a privacy-sensitive project. I spun up Qwen3-30B-A3B via llama-server and hooked it up to Claude Code, and after using it for an hour or so, found that my network connectivity got totally borked: browser not loading any web-pages at all.
Some investigation showed that Claude Code assumes it's talking to the Anthropic API and sends event logging requests (/api/event_logging/batch) to the llama-server endpoint. The local server doesn't implement that route and returns 404s, but Claude Code retries aggressively. These failed requests pile up as TCP connections in TIME_WAIT state, and on macOS this can exhaust the ephemeral port range. So my browser stopped loading pages, my CLI tools couldn't reach the internet, and the only option was to reboot my macbook.
After some more digging (with Claude Code's help of course) I found that the fix was to add this setting in my ~/.claude/settings.json
{
// ... other settings ...
"env": {
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
}
// ... other settings ...
}
I added this to my local-LLM + Claude Code/ Codex-CLI guide here:https://github.com/pchalasani/claude-code-tools/blob/main/do...
I don't know if others faced this issue; hopefully this is helpful, or maybe there are other fixes I'm not aware of.
I use both Claude Code and Xcode with a local LLM (running with LM Studio) and I noticed they both have system prompts that make it work like magic.
If anyone reading this interested in setting up Claude Code to run offline, I followed these instructions:
https://medium.com/@luongnv89/setting-up-claude-code-locally...
My personal LLM preference is for Qwen3-Next-80B with 4bit quantization, about ~45GB in ram.
gitStatus: This is the git status at the start of the conversation...
Current branch: main
Main branch: main
Status: (clean)
Recent commits:
6578431 chore: Update security contact email (#417)
0dc71cd chore: Open source readiness fixes (#416)
...
Enough for Claude to understand what you've been working on without sending your entire repo history.