Really nice to see they're giving credit to the company, and I am optimistic that Kimi K open models will soon outperform Opus models.
For a professional tool it’s getting egregious how little respect they have for my workflows and flow state, the way they keep moving things, changing iconography and flipping switches in the UI.
It’s clearly being run by someone who comes from a social app or sales app growth hacking background.
Should try their CLI!
- lags constantly,
- if you type while it's generating you'll get missed inputs,
- 'plan mode' doesn't clear context before starting work,
- you can't directly edit the plan, you can only ask the bot to do it,
- you can't immediately whitelist commands, only accept once or allow all.
It’s a near constant regression in my workflows. “Multiple agents” got destroyed recently, and the new interface for it (some sort of command) isn’t as good or reliable. Then you’ve got modals everywhere[1] and truncated bits (like long branch names) that make it insanely frustrating to use.
They’re constantly changing the UI without actually improving it at all. I’ll likely cancel it and use opencode for personal stuff with Deepseek and only use it at work because I have to. There was a time when I appreciated the harness but it’s becoming less useful, or at least noticeable, over time… all the while the actual UI becomes substantially more painful and awkward to use (like @ in the “agents” window being completely unable to find a file because it’s some sort of “global” scope).
One thing that surprises me about this whole segment is that JetBrains haven’t eaten these folks’ lunch. Their IDEs are leagues better than VSCode but their AI integration is awful by comparison (and the bar is low). I can’t even see how much of the context window I have left.
[1] It’s insane I have to answer questions in a tiny input box I cannot resize. Let alone the fact that the text area I type prompts into cannot be resized either. Truly feels like the UI/UX is done by people without any experience.
To me it feels like it's done entirely by an LLM, starting from the product vision.
One of the things I've come to appreciate about CLI tools like Codex or Claude is that the interface is so limited that every feature they release is still constrained by the same UX limitations, whereas those "funkier" IDEs change from month to month, giving me further fatigue.
Wouldn't this compress AI revenue like 15x quickly?
If they really have a 4.7 Opus high equivalent at 1/16 the cost, wouldn't this significantly affect all the current capex and planning?
Maybe they are getting Elon to cover the cost.
One of the surprisingly hard problems to solve is getting a model to use the tools you give it access to.
The real money furnace is the training, not just of models that get released, but also experimental training runs that fail to move benchmarks and are quietly thrown away. E.g. Cursor claims that 85% of the compute for Composer 2.5 comes from additional training on top of Kimi K2.5. I'm not sure how they determined that, but it can't have been cheap. Then they say "Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute."
So yes, they're probably attempting to replicate the Anthropic playbook of paying a large upfront cost for a very good model, and then rapidly acquiring paying customers, hoping that the inference margin will be enough to cover the training cost.
i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha
that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token
Every model release has been a straight price increase since, what, GPT-4? When was the last time a new flagship model decreased prices compared to the previous one?
2. We are not interested in how different model naming schemes relate to prices, we are interested in the capabilities. So if you want to learn something about price development you need comparable levels of capability, and then look at the prices. 4o is not comparable to 5.5 in that regard. It is (according to the benchmarks) maybe more comparable to the current 5 nano, which is 98% cheaper.
Apart from that, I'm not sure if focusing on tokens is even a good idea, because they are so different from model to model. I'd almost consider them a red herring now.
We could look at tasks instead. Is there anything even remotely suggesting that your typical task you give an LLM now costs less in inference than before?
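To make the tokens-versus-tasks distinction concrete, here is a minimal Python sketch of the arithmetic: cost per task is tokens used times price per token, summed over input and output. Every number below is a made-up placeholder, not a real price or token count for any model; the point is only that a lower per-token price does not by itself imply a lower per-task cost if the newer model burns far more tokens on the same task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical USD prices per million tokens (placeholders, not real quotes)."""
    input_per_mtok: float
    output_per_mtok: float


def task_cost(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task given its token usage."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Assumed scenario: the newer model is cheaper per token but, being agentic,
# reads and writes far more tokens to finish the same task.
older = ModelPricing(input_per_mtok=10.0, output_per_mtok=30.0)
newer = ModelPricing(input_per_mtok=2.0, output_per_mtok=8.0)

older_cost = task_cost(older, input_tokens=4_000, output_tokens=1_000)
newer_cost = task_cost(newer, input_tokens=60_000, output_tokens=20_000)

print(f"older: ${older_cost:.2f} per task, newer: ${newer_cost:.2f} per task")
```

With these assumed inputs the per-token price drops by well over half while the per-task cost goes up, which is why comparing price sheets alone can mislead.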
Impressive, yes. But they still don't have a moat...
The ironic thing is that half a year ago, after trying factory.ai, I thought a chat-first interface was a stupid idea that would never work.
I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.
Anyway, would love to see a comparison from someone who has used a recent version of each.
In my setup I use multiple agents like Claude Code and Codex, and Zed’s ACP support makes it pretty nice to manage them all as “threads” in one place. Worktree switching also feels much smoother.
Overall the experience was pretty good, but the way the agent and editor are integrated still feels a bit lacking, and tab completion is the big one for me. Cursor’s tab completion is still the best I’ve used.
So now I’m using both. For work that needs a lot of focus and careful iteration, I use Cursor. For things that are easy to split into worktrees and hand off to agents, I use Zed with Claude/Codex.
Every MAG 7 / FAANG company already has more users and more data...
That's not a moat.
That's traction.
That's Y.
What's wrong with using very short sentences like 'That's not X. That's Y.'?
Plus you are always running the risk of being rude and insulting when incorrectly labeling text actually written by humans as slop — making a jackass of yourself — and opening yourself up to being trolled by humans purposefully inserting em-dashes and catch phrases just to trigger you. That's not clever. That's gullible.
How much cognitive and physical effort and time do you put into trying to figure out if everything you read is slop, then complaining about it? If that's your job or calling in life, you could be easily replaced with AI. Find something more creative to do with your time.
If you really object to low effort slop, and not just relish it as an opportunity to whine, then how about instead of posting low effort whines about slop, you put in the actual effort to do something about it, and rewrite the slop in a way that won't trigger your slop detector, then post that instead, to train AI not to write slop.
Is your problem that it's slop, or that it's AI generated? Because your whining about low effort AI generated slop without contributing to the conversation or addressing the point of the comment you're replying to is just low effort human generated slop.
Please don't post slop while complaining about slop.
Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.
With so much money and compute from SpaceX, it's not so impressive.
& now they're still losing all of their users to Claude Code and Codex.
Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code?
It's not like Cursor harness is the best out there.
And even if I want to edit the code, I don't need to run the agent harness in an IDE.
I won't debate that it turns out none of this mattered when it came to being a successful company, though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.
How I see this is that it's so important to bundle the model with the right tooling.
Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics etc).
So for Cursor, IMO, they put themselves in a strong position by having both a solid IDE __and__ a solid, cost-efficient model. Those two working great in combination for the task they are designed to solve (coding) is more important than benchmarks.
So now 2.5 is supposed to compete with opus 4.7? Sure…
The fast version of Composer is the default now (which costs ~3x as much).
I think 300 million is too low. For reference, before I could do more than 1 billion under the same conditions.
There's nothing to mess up. The license is MIT w/ attribution, and the attribution clause can be easily sidestepped w/o any legal repercussions. The "drama" was simply content creators going nuts over some misunderstandings and poor comms from some Kimi-related devs.
I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.
As for the typo, s's are cheap and I've added one :)
That said, I am a pretty old-fashioned coder and use LLMs mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task.
People who are aiming to not read code any more might find this $20 plan lacking; for my needs, however, it fits perfectly.