That's why I'm using eurouter.ai with the following routing rule for all my requests:
{
"model": "glm-5.2",
"models": [
"deepseek-v4-pro",
"deepseek-v4-flash"
],
"provider": {
"allow_fallbacks": true,
"data_collection": "deny",
"data_residency": "EU",
"max_retention_days": 0,
"eu_owned": true
}
}
Sure, it's quite expensive, but at least on a legal side data privacy is ensured. I trust them more than e.g. Anthropic, OpenAI or OpenRouter.Personally, I find it morally unacceptable to use U.S. AI tools, because I do not want to support them financially and thus support the crimes they are involved in[1].
This seems tautological because Europe is pretty weak on the values that people in the US might care about (freedom of speech, limited govt, etc).
What values specifically are you optimizing for here?
The US federal government forced Paramount to take Colbert off the air. Seems that people in the US don’t actually value these things.
> What values specifically are you optimizing for here?
Probably not being fascist.
Maybe it was funny to you, but designing data platforms that respect GDPR and involve LLMs is a thing.
I know LLMs move at the speed of light (especially these past few quarters), but if Opus and GPT "a few months ago" were really like open weight models, then there's really no reason to not switch, especially for those who were using these models a few months ago.
Your codebase didn't change, so use the open weight model. Don't move the goalposts.
So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.
Maybe the truth is the newest models aren't actually as impressive as we thought. Maybe our perception of progress is being manipulated via months of gradual, silent and unverifiable degradation.
Let’s say I’m a bad faith LLM operator, and I want to degrade my model so the next release looks better and people want to switch to the more expensive one. How would I do that?
They wouldn't even need to do this uniformly, quantized versions of the model could be routed only a subset of the requests. They could do this to nerf the old model, or more likely just to give themselves more hardware to run the new one on by handling more requests on less hardware. Or to handle increased request volume as traffic ramps up faster than hardware can be provisioned.
Playing with local models at various quants, the degradation can be hard to spot. Sometimes it's only noticeable in aggregate. And even then, you never really know if you just got unlucky with a bad response due to RNG.
I've had Opus 4.6 fall into some weirdly incoherent loops that I rarely see from even Sonnet, that felt like the kind of thing I got frequently with Qwen3.5 9B on local. And the above applies... Was that just bad RNG? Or was my request to Opus routed to some lower quality variant? There's no great way for me to tell for any given request, nor any way to guarantee Anthropic _didn't_ do that.
At least it's going to be usable as a very high end gaming PC.
There is also a low probability that someone enters peace negotiations solely to threaten the negotiators with death, yet here we are. With these guys it is: Better safe than sorry.
I didn't appreciate this until I started down that road myself.
Long term predictability ought to far outweigh a few more cycles of performance.
The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.
Even with minor automation I feel like I can watch OpenAI and Anthropic engineers fiddling in real-time. Tuesdays behaviour changes by Thursday, 10AMs production isn’t possible at 11:30AM. Nutty.
https://marginlab.ai/trackers/claude-code-historical-perform...
There were at least a couple of these degradation trackers.
I experiment a lot with the open models and I’m getting tired of this trope. I’m not yet convinced that even the best open weight models are equal to Opus from “a few months” ago.
I know what the benchmarks say. I had higher hopes. My real experience just doesn’t match the benchmarks.
I also do a lot of work that even Opus 4.8 struggles with. When even the cutting edge LLMs aren’t all the way there yet, my motivation to switch to something even further behind just isn’t there.
5.2 lives up to the hype. I don't find it to be the best at anything except coding. But for coding... yeah, it lives up to the hype. Not quite Opus 4.8-level, but I would feel comfortable comparing it to 4.5, at least if it had vision capabilities.
That's exactly the problem I have... with Anthropic and "Open""AI"
The moat is so flat, it only gives +1 food and +1 production. +1 gold with a road.
The really interesting thing is that it's typically those very same accounts who were explaining, a few months ago, that thanks to their commercial model they were gaining so much time and producing so much fantastic code.
A few months passes and suddenly the open-source model have caught up with the models that were gaining them so much time and that produced amazing code (in production everywhere for sure btw) but... It's impossible to work with these models.
Rinse and repeat.
The current models, according to them, are basically AGI and they can go fishing while paid subscriptions solve the world's problems.
But when it six months there shall be new closed, pricey, models and when the open ones shall have reach the level of Fable, we'll hear how it's impossible to work in late 2026 on a model that is "only at the level of Fable".
These people should have been snake-oil salesmen (and it could be what they actually are).
Not unusual in the tech space, but this has been basically constantly happening for two years now? I can't imagine the improvements are more than incremental at this point.
Just like the OS ecosystem I think we'll see a similar trajectory with OAI, Anthropic and Google but on a much accelerated time scale. I think the lobbying has begun to lock in their fate for revenue - because none of them give a shit about their users. I do hope, however, that Anthropic continues to over rotate and continue to gimp their models into uselessness. I just asked Opus 4.8 the other day to look at some code as an adversary and summarize areas that should be addressed. Nothing specific and it shut down the conversation. However starting a new prompt and prodding the model from a different angle yielded the results I asked for directly. Pick a lane. Or, don't and continue to lose industry respect and consideration.
not all of us are doing noob shit lol
> I’m hoping it’s going to be minimal.
I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.
I just can’t agree yet. The models from Anthropic and OpenAI really are that much better than anything else. The open weight models must be universally benchmaxxed across the board because my real world experience with them is very different than what the benchmarks imply. I get downvoted a lot for speaking about my experience because I don’t think it’s the reality that people want to hear right now, but it’s true for complex work.
I do think there are a lot of easier tasks that can be handled appropriately by the open weight models in the hands of a skilled operator. If an entire job is simple enough that you wouldn’t hesitate to hand it off to a junior with a little supervision then any model will do. However for a lot of the work I do, even Opus 4.8 on Max requires a lot of attention and extra steering and review to keep it on track. Fable did, too, though to a lesser degree. When I try to use the big open weight models (hosted, because they’re not running at reasonable speeds locally at a quantization I can tolerate) it feels like I spend more time waiting while they burn tokens for output that I probably have to reject anyway, at least for the bigger tasks. I wish they were there, but that’s not the case yet.
I like the Linux analogy, I struggled with Linux way back.
Having played a bit with Fable, reinforced the above.
I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.
There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...
TL;DR: Running GLM 5.2 is going to cost about $20K minimum, and that's going to be painfully slow compared to the cloud hosted versions. Even the estimates where the server is computing tokens 24/7 you can't break even for several years.
The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.
Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.
$10 a month gets you generous usage with the best open weight models and they claim to have zero retention and not to train on your usage.
It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.
Personally I haven't seen any productivity gain since Opus 4.5 times.
But: I can't fully get behind the opinion that (so called) "open source models" are simply superior and will be in the future, because when I asked some models who they are, they answered with "I am Claude from Anthropic", which could mean they have been trained by exfiltrating Claude.
I have NO moral objection to this, as Anthropic and "Open""AI".also trained their models on anything they could get their hands on.
It's more about the question: can and will these models be updated, even if Anthropic et al fail. Who's gonna pay for training then? What's their incentive? Have we reached a plateau?
Right now, due to profound shortfalls in both data and hardware compared to the US labs, the OSS models are IMO basically technology demonstrators that in practise are even more jagged than the US labs' efforts. The number of happy paths is many times fewer, and their behaviour inside the harness is far less refined. Barring some incredible breakthrough I don't think that is changing without a much higher level of resources - which seems impossible given the current hardware environment.
I have no reason to think that Anthropic or OpenAI are in possession of some secret sauce that the Chinese labs can't duplicate given the right resources, but the fact remains that absent those resources they'll remain behind. Barring some incredible bombshell reveal from Huawei I don't think this asymmetry resolves in a year. In three years it may well be a different story.
Sure, there may be some cases and reasons for local models and industry is so large they will continue to make progress and gather economic value and users for specific use case; but frontier will command vast majority of the economic value distinct from Linux and open source where the model created better than proriatary economic incentives around development
Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.
Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.
I enjoyed the first part though
and what hardware are you using?
Not only does Apple's unified memory give the GPU more RAM to use, but it also eliminates copying things between CPU RAM and GPU RAM.
A Mac Mini with 48 GB RAM costs $1799. A Mac Studio with 96 GB RAM is $3999 — until March you could get a Mac Studio with 512 GB RAM for $3999, all of which could be used for your AI model.
https://www.tomshardware.com/tech-industry/apple-pulls-512-m...
Some are coming up used at silly prices.
https://www.trademe.co.nz/a/marketplace/computers/desktops/a...
NB NZ$44,999 is "only" US$25,772.
For a while during this era, I used to port my laptops windows installation into a virtual machine that can run on Linux. It took a bit of hacking away but I could usually do it in a day or two. Then its all Linux with the windows vm being used for the microsoft stuff.
I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".
I have given up on making Opus actually retrieve online information for me. At this point I only query it side by side with qwen to laugh at how it didn't even attempt to search properly, and how a small local model is beating it every time. Gemini is very fast for searching, but somehow miss-sources all the time.
First time I did this I realized in 5 seconds that the big players weren’t going to be carving up the market between them.
The things you describe are just tool calling, they're a feature of whatever harness you use. Use OpenCode, pi.dev, or maki.sh with any of the open models.
> I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".
You can do most of this with some system prompts added to whatever agent you're using. You can do it from the settings on the claude/chatgpt websites too. (minus the no-guardrails thing)
A $10,000 RTX 6000 Blackwell card will pay for 500 months of Claude or Codex, which is 40 years worth of compute. Obviously they are going to raise their prices, my prediction being to $200-500/month, but that still makes them at least years of compute and they scale very well with more traffic. Single GPUs do not, they are pegged at 100% and good luck getting it to answer multiple queries at the same time.