Don't make us shame you into disclosure
Don’t you know there’s no honor among thieves?
GPT series models are more thorough and better, but I'm not sure if the difference is enormous. It seems to depend on the workflow, but in my opinion, if you are thorough enough, I wonder if there really is a big difference
I've had some success turning my macbook M1 pro into a heating pad with Qwen 3.6 35B A3B MTP. Trying to use Gemini models "locally" resulted in a similar "short shrift" of effort resulting in mistakes and lots of turns. The reports of Fable being relentlessly "proactive" shows you can go the other direction as well, if you have strong enough branding and effective invoicing.
Once you have a coherent design (the hard part), you can feed it to a pretty small model and get basically the same quality.
They'll not one-shot, but they're faster and cheaper, so it still works out in your favor.
Plus you can do it locally...
Only that it’s a fairly meaningless grouping. When Japan first entered the car market in North America there might have been some commonality, but now what characteristics do they share that some American cars don’t have? They’re not even imported a lot of the time.
Given that, it does start to feel tinged with racism if someone insists on grouping things together that don’t really belong together.
As for Chinese LLMs, the term doesn’t “feel” pejorative to me - but I also don’t see a totally clear set of attributes they share. Not all are open-weight. Some are small and can be run on consumer hardware, some are huge. They even have a variety of answers to what happened June 3rd, 1989.
Typically the answer is "reliability", which is a positive trait, which makes the original callout about negative connotations very odd to me.
The term seems to have the connotation of "competitive at 1/10 the price of Claude", so I don't see the problem.
It's not Harbor Freight Chinese (and heck even they have decent stuff sometimes now too).
You don't think people still talk about Japanese cars as a distinction in quality from US or European ones?
Edit: Downvoting something doesn't make it false.
And even if the Chinese Communist Party provided funding, the result is still transparently released. So even if it is some kind of propaganda, I don't see what the problem is.
Is the monopolistic greed of American companies 'good', and China's greed 'bad'? I do have that question.
China is a communist country with elements of capitalistic markets baked in. But the capitalistic elements are mostly a facade. Underneath, the state retains full ownership and control of all business. The CCP runs all aspects of the government (including the courts/judges), and is the single entity that decides what directions the country (and its businesses) will move in.
The CCP, which de facto owns everything and has ultimate final say on everything, has one leader that has the ultimate final say on _everything_, Xi Jinping.
So while the waters of CCP models feel warm and free, understand it's not organically like that.
While I get the point you're making (it should be pretty obvious to anyone who's held a newspaper), I think it's important regardless to point out that Chinese companies AFAIK aren't worker-owned or -controlled, so you can't exactly call it communism, either. And they obviously do not have a "free market capitalism", as you just discussed.
It's simply a highly authoritarian state then, I guess?
I have a feeling you'd be slightly salty at people saying "Google and Tesla are making CIA models"
Since its development, IQT has invested in over 750 startups spanning diverse technological sectors, including:
- Artificial Intelligence
- Space Technologies
- Microelectronics and Quantum Computing
- Life Sciences
- Cybersecurity
- Hardware
- Energy
This broad portfolio has enabled IQT to address a wide array of national security challenges while supporting the growth of innovative startups…
https://en.wikipedia.org/wiki/In-Q-Tel
https://www.npr.org/sections/alltechconsidered/2012/07/16/15...
In China it's all one entity with these mock facades of privatization. Trump cannot instruct Google to put pictures of dogs on their homepage. If Xi wakes up and wants dogs on Alibaba's homepage, give it 30 minutes.
It's wholly ignorant or dishonest to make the comparison.
Sundar Pichai would personally be barking on a livestream on the homepage.
Trump is quite literally the one president showing that the US has zero rules or anything to hold power back from the white house, really not the example you want.
Sundar can do whatever he wants, but he has no legal obligation to do any of it.
I'm sorry, but that was a horrible example. Corporations have no obligation to donate money to the ballroom yet Google has donated millions.
Claude Code is better. But Opencode + kimi 2.6 is workable, which is big. For bare code writing, if you know what exactly you want, most popular models are fine (deepseek, kimi, etc), it feels more or less the same as anthropic models.
At the same time, Opus seems to understand my intent way better than e.g. deepseek. I need to be much more precise with my prompts when using deepseek - it often goes in a wrong direction if I'm lazy. This results in a workflow which feels quite a lot different from Claude Code.
Kimi is in between - for me it brings back "lazy prompting" workflow, and I can trust its plans more than deepseek. It enables a workflow similar to Claude Code, it's workable, but it is a bit worse everywhere. Smaller context, a bit more errors, decisions are a bit worse, recommendations are a bit worse, debugging capabilities are a bit worse, etc.
On the usage side, the $100 Claude plan is actually great value. On paper, per-token Kimi is way cheaper, but Claude subscriptions are heavily subsidized - you get many more tokens than $100 can buy you. So, in the end, OpenCode + Kimi vs Claude Code could be of a similar cost, for similar usage patterns. DeepSeek can be cheaper, and it has insanely cheap cached tokens, but experience may vary - depending on your habits, you may need to adjust how you work, coming from Claude Code.
I'd say for side projects something like $10 Opencode Go plan + $10 of extra DeepSeek v4 credits (e.g. on OpenRouter) can be very workable.
how much of that is Opus injecting prior conversations from memory?
I almost never use the desktop app, I have maybe 2-3 conversations over the last year that have nothing to do with my job. Opus (and now Fable) genuinely do seem to "understand" what you intend based off what you're explaining a lot better than other models I've tried.
Gemini gets close in some cases, but it falls over in the actual implementation sometimes. I haven't tried Kimi yet but MiMo isn't too shabby either.
DeepSeek-V4-Pro is adequate plus use DS4-Flash for tasks or other small activity you’d use Haiku or Sonnet for. Go sign up with $10 prepaid.
OpenCode Go - go sign up with $5 for a month and use Qwen-3.7-Max for design/plan/architecture or difficult troubleshooting. Feels closer to Opus 3.6 or 3.7 than DeepSeek, closest I’ve found.
OpenAI Codex, $20 a month plan, use GPT-5.5 via API for the same design/plan/architecture/troubleshooting/author commits. (You can also pay $100 and cut and paste really difficult problems into chat with the GPT-5.5-Pro model.)
Xiaomi MiMo-2.5-Pro, find a friend to give you a $2 referral code, you get 72 cents free. Same pricing as DeepSeek. Somewhere between Sonnet and Opus, quite capable. Apply for the UltraSpeed beta too.
You can switch in and out from these models on the fly in OpenCode or ohmypi and simply find the one that feels best to you. I use CodexBar to watch consumption in near real time.
For a casual user or someone new to programming, Cursor’s $20 plan is an excellent start with Composer-2.5 and Composer-2.5-Fast. You get an API allowance too you can use to access Opus-4.x or GPT-5.5-Pro from OpenCode or ohmypi in addition to Cursor itself.
Finally, if you use Grok or Twitter, SuperGrok at $30 a month has a good vision model, which I used for automated testing of front ends. I’m migrating to locally-run Qwen-3-VL on a commodity Mac, though. If you’re less technical unreach makes hosting local models on a Mac easy.
If you have a powerful GPU like an RTX 5090, try Qwen-3.6 locally on that too. Use ollama or llama-swap which is fairly easy to use.
I have not tried new Kimi yet but we have been able to keep our costs at or below $200 a month per employee with a team of 3 professional developers, 1 graphic designer who uses a lot of Midjourney and Grok Imagine now driven from workflows she made herself in ohmypi, and 1 nontechnical user (account manager / project manager) who uses ohmypi to help her gather requirements and track implementation of them. With a tiny bit of effort we could get that number closer to $75 per employee per month.
What's the benefit of using OMP over OpenCode?
Just the sheer amount of options in OMP overwhelmed me. But I also use both via ACP in Zed so the CLI itself doesn't matter much.
It's good, does most tasks well that I throw at it, but will fail at anything cognitive/complex. It gets stuck often. It costs ~$6 a month though.
I use the oh-my-openagent planning system and haven’t used vanilla OpenCode enough to know how much that is contributing.
On the other hand, Opencode, Pi agent and other open source tools offer much better support for all models, including open source.
I use DeepSeek V4 Pro now, which works pretty well.
Other than that it’s pretty decent (for the price).
That's a funny anecdote, but I'm not able to reproduce. Where/how/when did you get this, or hear about it? It might've been patched by now, at least that's the feel I get from my limited testing.
Using bare aichat [1] with no system prompt and no temperature nor top_p (and I'm truncating the response after the first line that contains the name the model gave, because the point has been made clear by then), and with the same prompt (approx. "Introduce yourself!") every time:
Claude Sonnet 4.5:
> 请做个自我介绍!
你好!我是Claude,一个由Anthropic公司开发的AI助手。 […]
Claude Haiku 4.5:
> 请做个自我介绍!
# 你好!
我是 *Claude*,一个由 Anthropic 公司开发的 AI 助手。
Claude Opus 4.5:
> 请做个自我介绍!
# 你好!
我是 *Claude*,由 Anthropic 公司开发的 AI 助手。
Claude Opus 4.6:
> 请做个自我介绍!
# 你好! 我是 Claude
Claude Opus 4.7:
> 请做个自我介绍!
你好!我是 Claude,由 Anthropic 公司开发的人工智能助手。很高兴认识你!
Claude Opus 4.8:
> 请做个自我介绍!
你好!我是 Claude,由 Anthropic 公司开发的人工智能助手。
Claude Fable 5:
> 请做个自我介绍!
# 自我介绍
你好!很高兴认识你!
我是 *Claude*,由 Anthropic 开发的 AI 助手。 [2]
I don't see a Kimi mention, unfortunately. :-)
[1] https://github.com/sigoden/aichat
[2] This model really is noticeably more verbose even with supposed-to-be-brief responses huh, lol
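The truncation rule described above (cut each response after the first line that contains the name the model gave) can be sketched roughly like this. It is only an illustration: `truncate_at_name` is a hypothetical helper, not part of aichat, and it assumes the name shows up as the literal substring "Claude".

```python
def truncate_at_name(response: str, name: str = "Claude") -> str:
    """Keep lines up to and including the first one that mentions `name`.

    Hypothetical helper for illustration; if the name never appears,
    the whole response is returned unchanged.
    """
    kept = []
    for line in response.splitlines():
        kept.append(line)
        if name in line:
            break  # the point has been made, drop the rest
    return "\n".join(kept)


# Made-up sample shaped like the outputs quoted above, plus an extra trailing line.
sample = (
    "# 你好!\n"
    "我是 *Claude*,一个由 Anthropic 公司开发的 AI 助手。\n"
    "我可以帮你写作、编程和分析问题。"
)
truncated = truncate_at_name(sample)
```

Here `truncated` keeps only the heading and the line naming the model; the trailing line is dropped.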
I said that about Opus 4.5 at the time, thinking "this is so good, in 6-12 months the Chinese models will be as good and cheap, I will use them", but I was wrong. I pay a premium for Opus 4.7/4.8 and Fable.
But at some point, it will just do the thing you want it to do, and then the race to the bottom will start.
Now that Chinese companies have access to some very good Fable tokens, I hope it speeds up the race.
so better models may still be cheaper even if the price per token is higher.
My theory is that US enterprises just can't send data to Chinese providers, and that's understandable, but is that "the moat"?
I say this as a relatively frequent user of Kimi models and generally a big fan. But on not-yet-gamed benchmarks like DeepSWE, Kimi K2.6 is beaten soundly by Claude Sonnet 4.6 ($3 / $15) and even slightly by GPT 5.4 Mini ($0.75 / $4.50).
There's no question Kimi models are very good for a lot of code tasks. They're the best quality open weight model. But to get similar overall outcomes as on Sonnet/Opus, on average you'll spend many more tokens and will have to do more managing of the model. You shouldn't look at price per token, you should look at how much you pay for the entire process.
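To make the "price per token vs. price of the whole process" point concrete, here is a rough sketch. The $3 / $15 Sonnet 4.6 prices are the ones quoted above (assumed to be USD per million input / output tokens); the cheaper model's prices, all token counts, and the attempt counts are made-up numbers purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


def task_cost(p: Pricing, input_tokens: int, output_tokens: int, attempts: int = 1) -> float:
    """Total USD spent to finish one task, counting every attempt."""
    per_attempt = (
        input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok
    ) / 1_000_000
    return per_attempt * attempts


sonnet = Pricing(3.00, 15.00)   # prices quoted in the comment above
cheaper = Pricing(0.60, 2.50)   # hypothetical cheaper-per-token model

# Hypothetical usage: the cheaper model needs more tokens and more retries.
sonnet_total = task_cost(sonnet, input_tokens=200_000, output_tokens=20_000, attempts=1)
cheaper_total = task_cost(cheaper, input_tokens=300_000, output_tokens=40_000, attempts=4)
```

With these made-up numbers the cheaper-per-token model ends up at about $1.12 per finished task against about $0.90, so the ordering flips. With fewer retries it would not, which is why the whole process is the thing to measure.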
DeepSWE has one major thing going for it that all the other benchmarks (including those quoted by MoonshotAI on this page) don't. The other benchmarks are completely gamed: the answers are public and part of each model's training data. This benchmark may still be iffy, but at least it's not gamed.
Everybody has incentives to manipulate benchmark results to show their models in the best light.
I also wonder if Enterprises have deals for other API pricing that is not posted publicly, so all we see is a high API sticker price.
I'd further say that there are probably enough rational actors running evals out there that the marginally better is not pure vibes for the cases where people are spending lots of money, but I only have direct line of sight to some of those eval suites. Maybe everyone is irrational and anthropic is exploiting that!
But if AI doesn't lead quickly to vast large-scale replacement of workers as promised, I could definitely see the C-suites and their gaggle of consultants starting to ask questions about token pricing.
Lots of US providers are hosting these “open source” models so doubt that’s the problem.
- twice as expensive on the output (1.52 vs 0.87)
- six times as expensive on the input (0.33 vs 0.05)
https://openrouter.ai/deepseek/deepseek-v4-pro?sort=price#pr...
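As a quick sanity check on those multipliers, and on what they mean for a real bill, here is a small sketch. It assumes the four figures are USD per million tokens; the provider labels "a" and "b" and the 10:1 input-to-output mix are hypothetical.

```python
# Figures from the list above, assumed to be USD per million tokens.
OUTPUT_A, OUTPUT_B = 1.52, 0.87
INPUT_A, INPUT_B = 0.33, 0.05

output_ratio = OUTPUT_A / OUTPUT_B   # roughly 1.75x
input_ratio = INPUT_A / INPUT_B      # roughly 6.6x


def blended_ratio(input_per_output: float) -> float:
    """Overall cost ratio (a vs. b) when a workload sends
    `input_per_output` input tokens for every output token."""
    cost_a = input_per_output * INPUT_A + OUTPUT_A
    cost_b = input_per_output * INPUT_B + OUTPUT_B
    return cost_a / cost_b


# Agentic coding tends to be input-heavy; 10:1 is a made-up mix.
agentic = blended_ratio(10)
```

At a 10:1 mix the overall gap works out to roughly 3.5x, between the two headline multipliers. The more input-heavy the workload, the closer it moves toward the input ratio.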
For personal stuff I use forgecode with openrouter. Firstly, forgecode is a much better harness than Claude Code (IMHO).
Anyway, regarding the models, my experience is that there is not much difference in terms of quality, but the cost difference is insane. At least for how I use agents. Yesterday's example is the following: I am developing a small DSL for search across complex technical documents. I wanted to add a small operator to it and thought I'd give Fable a spin. It burned through 13 USD, and while it delivered the solution, it wasn't objectively better than what DeepSeek v4 did for 1.7 dollars (the exact same task, because I was curious).
For full disclosure, I ask agents for piecemeal stuff. Like in the DSL case, I designed the operators and then asked agents to implement them one by one. Probably if I asked it to design the whole thing starting from these complex documents, Fable would shine, but every time I try to give agents broader-scope tasks they burn through millions of tokens and generate questionable code, which I then have to spend time familiarizing myself with.
If you look at a file like:
https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...
you can see that I attribute the models used. What I found was 4.7 was not very good at `go` code which was why you started to see `Gemini 3 Flash` in the attributions.
4.7 is what Cerebras provides, and for me speed in iterations is a lot more important. Having played around with MiMo v2.5.0-Pro, I am 100% sure it could have done what Gemini 3 Flash did.
There were a few points where I was stuck and needed Sonnet to explain things to me, but I think the dirty secret that Anthropic and OpenAI won't tell you is, if you know how to code, the models are honestly good enough.
Based on my experience with MiMo and what others are saying about GLM 5.1, we are now in a hardware race. The Chinese models are a 100% drop-in replacement for Claude if you know how to program but want AI to help amplify what you know. What I will consider now is which provider can provide the fastest inference.
MiMo-v2.5.0-Pro-Ultraspeed is really good at generating good results quickly and burning your money as fast.
I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.
(Maybe fixable with prompting. I tried and it helped the Chinese ones a bit. Just tell them to be elegant, like in the old image AI days "+good -bad"!)
For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.
But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)
--
[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)
When I tried GLM I found it way, way slower (omlx as runtime).
Use DCP or Magic Context plugin in OpenCode to keep the context below 160k and you're fine.
The UIs it's generating are pretty good, not without problems, but certainly better than other models at this price point.
- GPT-5.5: 62.7%
- Opus 4.8: 62.2%
- Kimi K2.7 Code: 56.3%
- Kimi K2.6: 48.2%
https://platform.kimi.ai/docs/guide/kimi-k2-7-code-quickstar...