I've set up a small rig, mostly settled on Qwen3.6 and I'm slowly adding features myself. It probably can't compete with Claude. I don't even know, I've stopped checking. It's providing a ton of value to me as is, and it only keeps getting better. All it takes is to realize that it doesn't actually matter if the grass is (maybe even objectively) greener somewhere else. Feels so good to know that it won't change under my feet. I've got this amazing, highly extensible tool, and it's mine.
I tried out a few models and ended up going with either Qwen3-Coder-Next (no think, just do) or Qwen3.6-35B (thinking, w/ llama.cpp token budget). Created a customized prompt that works fairly well up to around 60k tokens; past that it's a toss-up whether it has poisoned itself or I've steered it wrong myself. When it's clear that's happened and it's important to continue, ask it to write a doc, then start fresh.
I don't know how anyone could have witnessed the last 2 decades of American VC-funded tech startups and told themselves, "you know, this will be a reliable technology with no hidden problems".
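A minimal sketch of that "write a doc, then start fresh" habit, for anyone who wants to automate the check. Everything here is an assumption for illustration: the 4-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and `Session`, `needs_handoff` and `start_fresh` are hypothetical names, not part of llama.cpp or any harness.

```python
from dataclasses import dataclass, field

# Assumption: quality gets unreliable around 60k tokens, per the comment above.
CONTEXT_BUDGET_TOKENS = 60_000

# Hypothetical wording for the handoff request; adjust to taste.
HANDOFF_PROMPT = "Write a doc summarizing the goal, decisions made, and open tasks."


def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


@dataclass
class Session:
    messages: list = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    def used_tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def needs_handoff(self) -> bool:
        # True once the running estimate reaches the budget.
        return self.used_tokens() >= CONTEXT_BUDGET_TOKENS


def start_fresh(handoff_doc: str) -> Session:
    """Begin a new session seeded only with the handoff doc."""
    return Session(messages=[handoff_doc])
```

In use, you would check `needs_handoff()` after each turn, send `HANDOFF_PROMPT` when it trips, and pass the model's reply to `start_fresh`.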
Even a sober technical evaluation is just two steps:
1. You're proposing to build an app on a non-deterministic model.
2. That model is hosted behind a non-deterministic system (model alignment, model guardrails, system context subterfuge, cost/token pricing)
---
So you want to build your app and you think you're going to keep up with both #1 and #2?
LLMs are, as far as the nastiness of the Real World goes, really fucking benign. Future models outperform past models, both in open weight land and at the big frontier labs. Performance per $ only ever goes up. That's just nice.
So piracy of an AI model that was itself trained by piracy...
Alibaba didn't steal Opus weights, they used Opus output to train their model.
If this is piracy, then so are the reverse engineering efforts powering a bunch of Linux drivers.
However, last month they introduced a new pricing model (I know the old pricing was not sustainable), and my USD 10 was exhausted within days. Because of that, I switched to Claude Code and Codex and have never looked back. Yes, tokens on Claude Code and Codex are heavily subsidized, but let's just enjoy good things while they last.
I do feel there is a difference between using Claude via Copilot versus using Claude directly in Claude Code. I'm not sure what Microsoft is doing behind the scenes.
Anthropic seems to have a modest lead on their harness and models, so it’s a best-of-both-worlds scenario.
> I'm not sure what Microsoft is doing behind the scenes
It’s probably the exact same model, but the tools and the prompts around it are worse, so you get worse results.
The new pricing model where I got banned from using Opus entirely and half a day of work (with weaker models) consumed the $10 plan was.
I'm now using a Claude Max subscription and I can get close to the daily limits but I'm fairly happy with the overall plan consumption.
ACP is just a standard that bridges harnesses easily into IDEs, Text Editors, or whatever consumes it (I wrote a TUI that consumes them)
The registry for all the agents (tool harnesses) is here https://github.com/agentclientprotocol/registry if you're ever curious what Zed or IntelliJ are really hooking into
When using Zed with the Copilot integration I use Claude Opus and never had this issue.
I paid $6 yesterday for DeepSeek V4 Flash on OpenRouter. That's like $120 a month, and it's not even a good model.
I'll try that.
Might very well be that a better model is cheaper if it gets things right the first try.
Maybe I should route to a better model when V4 Flash hasn't solved the task after a specific number of tokens.
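That escalation idea could be sketched roughly like this. It's only an illustration under stated assumptions: `Attempt`, `solve_with_escalation` and the budget numbers are made up, and each "model" is just a callable you would wrap around whatever client you actually use (nothing here is an OpenRouter API).

```python
from typing import Callable, NamedTuple, Tuple


class Attempt(NamedTuple):
    solved: bool       # did the model report/verify a working solution?
    tokens_used: int   # tokens spent on this attempt
    answer: str


# Hypothetical interface: a model takes (task, max_tokens) and returns an Attempt.
Model = Callable[[str, int], Attempt]


def solve_with_escalation(
    task: str,
    cheap: Model,
    strong: Model,
    cheap_budget: int = 20_000,    # assumed cutoff, tune per workload
    strong_budget: int = 100_000,  # assumed cap for the fallback model
) -> Tuple[str, Attempt, int]:
    """Try the cheap model within a token budget, then fall back to the strong one.

    Returns (tier_used, final_attempt, total_tokens_spent).
    """
    first = cheap(task, cheap_budget)
    if first.solved:
        return "cheap", first, first.tokens_used
    # The cheap attempt failed or ran out of budget: escalate.
    second = strong(task, strong_budget)
    return "strong", second, first.tokens_used + second.tokens_used
```

The total token count is returned so you can compare what escalation really costs against just starting on the stronger model.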
Unfortunately the June pricing change for Copilot forced me personally, as well as my entire department at work, to switch to Claude Code. With Copilot we were hitting a few dollars of extra spend over the included credits in April and May, then in June we started chewing through the monthly budget every 2-3 days.
Just a completely insane price hike from the customer's perspective, I don't know what MS were thinking there.
Even if that is the price they need to be sustainable they should have waited until the competition changed their prices first. I wouldn't be surprised if Copilot lost 50% or more of their customer base last month.
Eventually this could be where all the major players set their prices, so the thought occurs to me that nations should run some form of "public access AI", just like they did for TV. Use the free open models and use tax money to finance a few datacenters. Geo-lock the use and set strict throttles to manage load, but let school children and citizens use that AI freely otherwise.
If Copilot's pricing is the level for all AI in a few years, only the unicorn companies can afford to use them, and everybody else has no chance of competing with a company that can use AI.
They did...
They're literally just passing on the costs https://platform.claude.com/docs/en/about-claude/pricing
Anthropic just provides a subscription - which Enterprise usually doesn't want you to use because everything you're submitting through that will be trained on / becomes part of their model.
So if you use it without explicit permission from your employer you may be committing a contract violation which can have serious consequences - up to jail time - as they can sue you for that.
Honest question, can you elaborate? If given the option, I use OpenCode, but what do you find in Copilot CLI that makes you prefer it to Claude Code?
I also use the Copilot ACP server inside PyCharm and that works decently well too, although it has some annoying bugs, but if you're a JetBrains user you're used to annoying bugs.
I've swapped to the 20x Claude plan for a month or two to knock out two ideas I need to get to MVP - expecting Claude to go token-priced soon.
The performance, if we trust the benchmarks, puts it at Sonnet 4.6.
Let’s see if it’s worth it with GitHub's pricing.
I'm going to be called a shill again, but at this point I don't care as it is relevant. Synthetic runs their own models for a reasonable price, GLM5.2 & Kimi K2.7-Code included.
Cache hit (most important): $0.19
Output: $4.00
This is the same as how much Moonshot charges for it, and it puts it at roughly the price of GPT 5.4 mini, not a bad option.
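For a rough feel of what those two prices mean per request, here is a small sketch. It assumes the quoted figures are USD per million tokens (the usual convention, but not stated above), and it leaves out uncached input because that price isn't quoted here; `request_cost_usd` is a made-up helper name.

```python
def request_cost_usd(
    cached_input_tokens: int,
    output_tokens: int,
    cache_hit_per_m: float = 0.19,  # assumed: USD per 1M cached input tokens
    output_per_m: float = 4.00,     # assumed: USD per 1M output tokens
) -> float:
    """Estimate one request's cost from token counts and per-million prices."""
    total = cached_input_tokens * cache_hit_per_m + output_tokens * output_per_m
    return total / 1_000_000


# Example token counts are invented for illustration.
agent_turn = request_cost_usd(cached_input_tokens=50_000, output_tokens=1_000)
print(f"${agent_turn:.4f}")
```

With a long cached context and short replies, the cache-hit price dominates the bill, which is presumably why it's flagged as the most important number.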
For some context here is a stupid prompt that wastes tokens: "Play a game of tic tac toe against yourself on a 5x5 board, you need 5 in a row to win."
It costs $0.006 on Kimi K2.7, and you get to see the whole raw reasoning trace.
GPT-5.4 mini costs $0.016 and it's summarized.
And in case you are wondering, both play incredibly stupidly.
Kimi:
  A B C D E
1 . . . . .
2 . . . . .
3 X X X X X
4 . O O O O
5 . . . . .
GPT 5.4 mini:
1: X X X X X
2: O O . . .
3: . . O . .
4: . . . O .
5: . . . . O
Fable manages to make a reasonable game, at a cost of 40 cents.
X X O O O
O O X X X
X X X O O
X O O X O
X O X X O
Saw in a discussion on Reddit that the team is evaluating glm5.2 so hopefully more to come!
https://fireworks.ai/blog/kimi-k2p7-code
I don’t know much about them but they did a deal with Microsoft in March:
https://azure.microsoft.com/en-us/blog/introducing-fireworks...
Says that they are run by Moonshot
From your link: https://docs.github.com/en/copilot/reference/ai-models/model....
Has their reputation tanked so much that the alternatives get all the buzz? Or is it that non-enterprise users are priced out by the usage costs, so no free marketing?
This comes up all the time at work because the vendor management people don’t understand the LLM ecosystem and think Claude through Copilot is the same as Claude through Claude Code.
When I’m asked to explain the difference, a simple side-by-side comparison shows dramatic underperformance three or four times out of five.
I work at GitHub but even then I often use OpenRouter models in the CLI and Copilot App
I tried adding a Foundry LLM as a GitHub Copilot custom model and failed miserably. But with VS Code BYOK (and GitHub Copilot as the interface) I did get it working, and I can now use DeepSeek V4 Flash with Copilot.
The company does need to integrate the new AI-human-machine interface into its application development SDKs.
https://docs.github.com/en/copilot/reference/ai-models/model...
They are run by Moonshot itself, so probably China
> These models are hosted on US-based Azure AI Foundry infrastructure managed by GitHub and Microsoft. Customer prompts and responses are not sent to the original model developers.
So not in China.