This means we're going to need $1T+ per year in spending on tokens. There are 200M knowledge workers in the world and 30M developers. We're talking about a world where 5% of every knowledge worker's salary goes into tokens, or 20% if you're a developer.
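The per-head arithmetic behind that claim can be sketched in a few lines. The $1T total and the 200M/30M headcounts come from the comment; the $100k average salary is a hypothetical assumption chosen so the shares are easy to read, and real salaries vary enormously by country.

```python
# Back-of-the-envelope check of the "$1T+ per year on tokens" claim.
# TOTAL_SPEND and the headcounts come from the comment; ASSUMED_SALARY
# is an illustrative assumption, not a figure from the thread.

TOTAL_SPEND = 1e12          # USD per year spent on tokens
KNOWLEDGE_WORKERS = 200e6   # knowledge workers worldwide
DEVELOPERS = 30e6           # developers worldwide
ASSUMED_SALARY = 100_000    # hypothetical average annual salary, USD
DEV_SHARE = 0.20            # the comment's 20% figure for developers


def spend_per_head(total: float, heads: float) -> float:
    """Annual token spend per person if the total is spread evenly."""
    return total / heads


def share_of_salary(per_head: float, salary: float) -> float:
    """Fraction of an annual salary consumed by per-head token spend."""
    return per_head / salary


per_worker = spend_per_head(TOTAL_SPEND, KNOWLEDGE_WORKERS)
# Per-developer spend at 20% of the same assumed salary.
per_developer = DEV_SHARE * ASSUMED_SALARY

print(f"per knowledge worker: ${per_worker:,.0f}/yr "
      f"({share_of_salary(per_worker, ASSUMED_SALARY):.0%} of ${ASSUMED_SALARY:,})")
print(f"per developer at 20%: ${per_developer:,.0f}/yr")
print(f"all developers combined: ${DEVELOPERS * per_developer / 1e9:,.0f}B/yr")
```

Under that assumed salary, $5,000 per knowledge worker is the 5% figure, and developers paying 20% would account for roughly $600B per year on their own.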
That's a _huge_ shift. Most people I know cite +20%-40% velocity with these tools, against the actual work their company cares about doing. +20% speed for +20% spend isn't going to motivate a trillion dollars a year in spending.
We're not there yet. This is still the upswing of the hype cycle, and unless we figure out how to make developers 2x, 5x, 10x as productive on stuff that matters, this isn't going to play out well.
- The publicly available information about how inference costs compare to training costs is contradictory. EEs involved in datacenters talk about power usage spikes during training runs as if they were a major factor in the designs, but academic papers discussing cost-optimal scaling confidently treat inference-time compute as a major factor.
- On the side suggesting that, after amortization, training is more compute-intensive than inference: Chinese providers, constrained primarily by access to compute, offer nearly unlimited tokens at a lower price than US providers (inference), but with poorer model capabilities (training). That would only make sense if US providers' inference prices are inflated 20-30x by amortized training costs that overseas providers were not able to take on.
- If training >> inference, they're in a prisoner's dilemma that far exceeds the ordinary zero-marginal-cost model of competition between firms, because each training run is such a huge, discrete step. If inference >> training, on the other hand, the high-level analysis popularized by certain thought leaders, that it's like a utility, would be true. You'd tend to count their statements as a vote for inference >> training, but the CEOs saying it have a huge incentive to say so, because the alternative, the prisoner's dilemma, would stop investment very fast.
- The only voice in the story I just told you that has anything to do with fact (as opposed to high-level analysis and ivory-tower armchair management of a secretive business) was the rumors from facilities engineers. That shows you the state of our understanding...
- If we don't even know the ratio between amortized capital expenses and operational costs, outside investor analysis is impossible. It doesn't matter how finely they divide the accounting buckets for office ferns and indoor ferns if the single biggest part of their business is obscured for trade secret reasons.
Yes I know there's no evidence and this is lazy reasoning. But there's probably a bit of truth to this line of thought.
Speaking to your point, inference being dramatically less costly than training would not be seen as a delta from the norm. What thought leaders are saying, that they provide inference at something near operational cost (like a utility would), is the delta from the norm.
That's the game. One view you could take is that this just grows the pie: with those cost dynamics, a lot more "small businesses" get a vast amount of leverage, so the overall economy grows without replacing the knowledge workers. I'm not sure I trust the MBA class to take that view.
I would argue that that's been the case for quite some time before AI. As an example, what innovative, amazing, world-changing products have Google or Meta launched in the past decade with their very large numbers of very talented and highly compensated engineers? The issue with most big tech companies is leadership, strategy, and product direction. I'm not saying that they don't make any profits, just that they probably aren't "building [the right thing]".
AI for product development and management would be far more impactful than automating rote coding tasks / building React UIs that mirror API structures IMO.
I don't think there is any shortage of great ideas at these companies; they are just extremely bloated. And I don't think it's something like indecision or bad PMs. It's "we have a finite amount of time and resources, so we need to be conservative, but also not too conservative."
If you have AI systems that can simply build out POCs in days, backtest on real data, show reliable results and numbers, you get a suite of product options you were never able to get before. If you have coding agents that can speed up implementation, you can build more stuff and choose the things that stick.
It changes the cost/benefit calculus of the entire business. I think you are exactly right: PMs and leadership are by nature orchestration machines. Other roles are as well, but I think PMs have a particular advantage here. I'd expect it to be quite a while before core product decisions and creativity can be delegated to an AI, but not long before virtually everything they're blocked on (legal approvals, POCs, wireframes, etc.) becomes less and less of a blocker.
Yeah, if this stuff actually worked that well already, OpenAI et al. would just run AI CEOs and engineers. Why get some other company to pay you at all when you can automate every other company out of existence and take all the money they make?
The fact of the matter is that while the tech has some uses, it sure as hell isn't a full scale replacement and you almost always actually have to massage the input into LLMs to get anything decent back out in practice. Some CEOs and managers can learn to do this, of course, and some already are... but that quickly turns into a second full time job. A "programmer" is still needed. The job might change from mostly hand-writing C++/JS/Python to prompt engineering + some manual coding to fix all the stupid fuck-ups that the bots can't solve themselves, but you still need someone to actually prompt the bot.
When that changes, it won't just be engineers losing work; there will be no reason to even have a human CEO any more.
I suspect that AI will fail to pan out to the same extent for the same reason why outsourcing hasn't fully panned out (even though every company tries it after getting big enough).
The problems that come up will be, and always have been, ongoing maintenance. AI is great at writing new code without a brain behind it, but once you get to the point where you need to refactor code, you really start needing someone with coding experience to guide the AI or veto its mistakes.
I don't think that's really fixable even with much better AI. It's not something that ultimately comes out of the likes of GitHub data.
I'm not saying that AI isn't going to make things better, btw, I just don't think we'll see a 20x improvement. Probably more like 1.5 or 2x.
The determinant of success was only whether the task needed American-tier labor or could make do with sub-American quality labor.
It sounds like the economy would largely reduce to the small minority class of independently wealthy people.
It takes a skilled knowledge worker to use these things.
Not completely, but compared to the Middle Ages we 50x'd their output, which is a great illustration of what it means to make a job 50 times more productive. We went from almost the entire population being required to make enough food to survive, to 4% of the population producing such an abundance that consuming too much food has become a systemic health issue.
They do not care unless these companies can get a bailout.
UBI only exists for companies that are too big to fail. Case in point, 2008 and SVB when there was too much money on the line.
One of the AI companies attempted to guarantee themselves a way for the government to bail them out if they were close to defaulting on the debt from the data center build out.
Arguably, the main impact of securing SVB depositors above the $250k limit is that it prevented thousands of people from being laid off that week, as their employers wouldn't have had the money to make payroll the following Wednesday.
What makes you think the people who used to build (or would have built) software will switch into the industry of "knowing that the thing was the right thing to build", as opposed to something cooler like surgery, city planning or experimental physics? The roles within a tech company are not the only jobs in the world.
“There’s more capital than good ideas to fund” has been a complaint from the likes of a16z and other VCs for a long time now. It’s why we ended up with stuff like NFTs getting funded.
No one I know feels richer than they did a decade back. I've not been able to meaningfully put up my prices for a decade. People are tired and stressed and scared, particularly scared of a technology everyone keeps telling them will make them redundant.
There is no rising tide lifting all boats, just most of us drowning whilst a few whizz past in their yachts.
I honestly hope these guys faceplant ASAP. Couldn't happen to a nicer bunch of people.
Consumption has risen, inflation adjusted wages have risen for blue collar and white collar alike. Most social mobility has been the middle class moving into the upper middle class, not moving to the lower class.
The main thing holding people back is the housing crisis. This is orthogonal to the value creation of businesses.
Value creation is growth. If it didn’t exist, the S&P would still be $42.55.
What sort of new value, and why will people pay for it from someone else rather than prompting for it themselves?
They are assuming ~10% global GDP growth instead of ~3%. You probably don't need the same %s if the pie grows a ton.
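To see why the assumed growth rate matters so much, here is a minimal compounding sketch. The ~10% and ~3% rates come from the comment; the 5-year horizon is an assumption chosen only for illustration.

```python
# Compare cumulative growth at the two annual rates mentioned in the comment.
# HORIZON_YEARS is an illustrative assumption, not from the thread.

def cumulative_growth(rate: float, years: int) -> float:
    """Growth multiple after compounding `rate` once per year for `years` years."""
    return (1 + rate) ** years


HORIZON_YEARS = 5

for label, rate in [("~3% (baseline)", 0.03), ("~10% (AI scenario)", 0.10)]:
    multiple = cumulative_growth(rate, HORIZON_YEARS)
    print(f"{label}: economy is {multiple:.2f}x its starting size "
          f"after {HORIZON_YEARS} years")
```

At 10% the economy is about 61% larger after five years, versus about 16% at 3%, which is the "pie grows a ton" case the comment describes.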
I'm highly skeptical we get that growth, but if you aren't, it makes it easier to digest.
The more AI increases productivity, the fewer workers will be needed. This will make the job market even more competitive and bring salaries down.
Net effect of this productivity increase: less consumption by the masses, even though you may be producing more goods, and much more efficiently.
A third effect also comes into play: once all this starts to happen, ordinary people, who generally live paycheck to paycheck, will start hesitating to make any long-term investment, housing included. That will indirectly hit the financial and banking sector, which will then affect existing savings, bond yields, and retirement funds, and the recession-like cycle starts.
This productivity increase only makes sense if it is capped at a small number, like 20% max. Beyond that, who will these companies even be selling to?
Am I overthinking all this?
Secondarily, reducing the cost of making a thing doesn't always mean you get less of that thing. For me, certainly, what happened is that I write way more software than I originally did. When we built compilers, the amount of human engineering effort required to do things plunged, but the number of software engineering jobs didn't go down.
This is as bad as models will ever be. That part is true. And it's entirely possible we go foom. But it's also possible we don't, and then it depends on where the asymptote lands.
0: https://www.slowboring.com/p/this-economic-myth-needs-to-go-...
> Net effect of this productivity increase: less consumption by the masses, even though you may be producing more goods and much more efficiently.
Big tech companies can't even create login flows and account recovery flows that work for everyone yet. There are countless stories of folks losing access to business Instagram accounts that get hacked, Google support from a human to fix a problem that is outside of their help articles is non-existent, etc etc. There's still so much "low-hanging fruit" IMO that isn't particularly fun or exciting to fix, but ask your average non-tech friend or family member what they think of the Facebook + Instagram security settings pages / sites / desktop-only settings.
Who is going to pay for all of these subscriptions that will power this GDP increase when average purchasing power of those outside of the top ~10% of earners is decreasing YoY? We're headed toward food and water shortages next to sprawling datacenters, not shared societal prosperity and a healthy middle class.
That only holds if companies have a fixed need for "productivity" which is met by their current employees, such that their employees becoming more productive means they need less of them.
Every company I've ever worked for has wanted to achieve way more than they are able to get done with current resources.
But generally yes, the biggest open question about all of this is how the impact will play out on the economy, job opportunities etc. I've not seen anyone come close to a confident prediction about how this will play out.
I mean sure. Every company wants an infinite addressable market. But that doesn't mean it exists.
It might not be possible to sell 10x the software we sell today. It might not even be possible to sell 2x.
Except that if your company goes 20% faster than other companies, you win market share. But then everyone will use the same tools and companies will be back at even speed, while the tools stay.
Now, if the market is saturated, it's useless to try to do things faster. Cheaper, yes, but not faster.
Uber was basically only ever software to help people use their own cars, so a very small part of their valuation was physical stuff to upkeep; it was just deals and obligations they had.
Not sure how it shakes out for Anthropic and OpenAI. There’s a lot of physical capacity that needs to be built out and can depreciate. But there’s also a lot of network effects and dependencies being built in with enterprise users.
I don’t know how swappable the tooling is either. I think over the long term the UI, model training and documentation, and infrastructure are going to end up being run by different parties and I’m not sure which leg of that chain ends up in a position to skim most of the profit off. My guess is that Apple and Google end up raking in all the money since they control the OS and app stores while the rest of the stack gets driven down to being generic commodities. At least where mass market consumer adoption is concerned.
> But then you sometimes go and talk to your senior engineering leaders and you’re saying, OK, how many projects that were on the cutting room floor got moved above the line because of the productivity gains because 25% of our code commits were via Claude Code last quarter?
> That link is not there yet, right? I think maybe implicitly there’s more that is getting shipped. But it’s very hard to draw a line between one of those stats and, OK, now we’re actually producing like 25% more useful consumer features, right? And that line is hard to draw.
That's pretty weak sauce. I don't think that justifies the headlines that came out of it, personally.
He also said in that article that what prompted the discussion was the Uber CTO's public statement that he had already burnt through his organisation's yearly AI budget by April. Please stop this shilling, mate, and stop trying to hide the overall perspective behind this or that word.
source: https://isaiprofitable.com/
I am willing to bet a Twix we'll look back on that stuff in 2 years with a lot of embarrassment
It really does have a particular lane for each chore, and it’s reproducible.
I have a few live websites built using LLMs and they will just go for default generic templates and colours if there's no vision.
Let's put it in context. Google's annual revenue seems to be north of $400B. So if OpenAI suddenly had Google's revenue, it would still be insufficient to recover their investment.
and it's a ticking time bomb because $1T in servers, CPUs, GPUs and memory is going to be worth $200B in 5 years. You can say they can keep using what they've got. Sure. But they're also not going to stop spending on new hardware. And the competitor that comes along in 5 years and spends $1T doing the exact same thing is going to have a huge advantage.
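The comment's write-down ($1T of hardware worth $200B after 5 years) implies a specific loss rate. This sketch computes it two ways, straight-line and constant-percentage decline; both are standard simplifications, not a claim about how any particular company actually books its assets.

```python
# Two simple models of the hardware write-down described in the comment:
# $1T of servers, GPUs and memory worth $200B after 5 years.

def straight_line_loss_per_year(start: float, end: float, years: int) -> float:
    """Value lost each year if the decline is spread evenly."""
    return (start - end) / years


def constant_rate_decline(start: float, end: float, years: int) -> float:
    """Annual percentage decline that takes `start` to `end` in `years` years."""
    return 1 - (end / start) ** (1 / years)


START, END, YEARS = 1e12, 200e9, 5

print(f"straight-line: ${straight_line_loss_per_year(START, END, YEARS) / 1e9:.0f}B lost per year")
print(f"constant rate: {constant_rate_decline(START, END, YEARS):.1%} "
      f"of remaining value lost per year")
```

Under the constant-rate model about 27.5% of the remaining value disappears each year, so the hardware is worth roughly half its purchase price after about two years.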
OpenAI at this point reminds me very much of the Russ Hanneman pre-money hype cycle.
And in that sense, if Anthropic and OpenAI can project that they can be profitable despite finances that seem bubbly at best, I think what happens is that these companies spew so much content that people like Simon get into it too.
There is a deeper problem of people falling into AI psychosis in general; I am not sure whether Simon has fallen into it.
I think the most important point to be made here is not to offload your thinking to others, and to think about the situation yourself. Sounds familiar (it looks like we are all offloading our thinking itself to machines).
Side-note: As humans, we have a tendency to quickly judge or make quick decisions which stems from our times foraging and scavenging in jungles.
Another side-note: at a certain point, I am unsure how much to think about AI at all. Discussions about it from 2 years ago certainly weren't helpful in the contexts it is used in now (in no way is a person who was discussing and getting into the weeds of AI 2 years ago better off than a person who got into it, say, 2-3 months ago).
With the industry moving so fast, I don't see the case for FOMO if you aren't using AI already; I find that notion naive.
People might be forming strong opinions (AI psychosis) and building skills around the tools available at the moment, the same as was done 2 years ago. We don't really understand the tech, since these are still black boxes, nor how they will progress or which of these "AI skills" will survive. Heck, we aren't even sure these tools will survive, or won't be made orders of magnitude more expensive simply to break even, since they are being given to us for the first time at a fraction of the price.
I don't know if I should form strong opinions yet, or whether it's worth so much thinking effort in the first place. I'm probably just gonna do my own thing (the way I want to), which at the moment includes learning C, because learning is fun.
Wait, what? They spent 2 orders of magnitude less on hardware.
> Gartner forecasts that large AI companies would need to earn cumulatively close to $7 trillion in AI-driven revenue through 2029, which is close to $2 trillion per year by the end of the period. In order to achieve “historic returns,” the providers would need to earn nearly $8.2 trillion in the same period.
Everyone's agency is 100% captured by belief in Wall Street. Too few <50 have any meaningful labor skills to blink.
We'll continue to have consent manufactured via media platforms and in 3 years no one will bat an eye at these companies being worth $12 trillion as Altman and Musk climb two ladders holding a "mission accomplished" banner.
I'm not even sure that 1 in 8 people I know would qualify as a knowledge worker, let alone a knowledge worker that might profoundly benefit from on-the-horizon AI. And I'm in a highly skewed population.
27% of the world's workforce is in agriculture (in contrast to the US, where it is 1-2%), and 15% in manufacturing.
A lot of people work in "services" (especially in high income nations, where it's roughly three quarters) and some of those are knowledge workers... but a huge number of them are nail technicians or hairdressers or bartenders (etc etc).
Basically if you're not doing manual labor, it's probably knowledge work.
Roughly 1/3rd of the working population.
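Turning these shares into headcounts needs a total workforce figure, which the thread doesn't give. The sketch below assumes a round 3.5B global workforce (an assumption, not a sourced number) and applies the shares quoted above.

```python
# Rough headcounts from the workforce shares quoted in this thread.
# GLOBAL_WORKFORCE is an assumed round figure, not a number from the thread.

GLOBAL_WORKFORCE = 3.5e9

SHARES = {
    "agriculture": 0.27,       # from the comment above
    "manufacturing": 0.15,     # from the comment above
    "knowledge work": 1 / 3,   # "roughly 1/3rd of the working population"
}


def headcount(share: float, total: float = GLOBAL_WORKFORCE) -> float:
    """Number of workers represented by a share of the workforce."""
    return share * total


for sector, share in SHARES.items():
    print(f"{sector:>15}: ~{headcount(share) / 1e9:.2f}B workers")
```

With that assumption, a third of the workforce comes out to roughly 1.2B knowledge workers, in the same range as the ~1B figure used elsewhere in the thread.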
Some data tucked in here: https://gist.github.com/danielmiessler/2dc039762a202b083753b...
How do you know this? I'm certainly open to recalibrating my numbers, which is why I asked for the source.
[1]: Berg, Janine and Gmyrek, Pawel, Automation Hits the Knowledge Worker: ChatGPT and the Future of Work (April 21, 2023). UN Multi-Stakeholder Forum on Science, Technology and Innovation for the SDGs (STI Forum) 2023, Available at SSRN: https://ssrn.com/abstract=4458221
https://www.gartner.com/en/newsroom/press-releases/09-24-201...
> "...with more than four-fifths of that growth coming from the emerging world."
If anyone thinks this is a part of the global TAM that's got $1000 a month to blow, well then I've got a stable of flying unicorns to sell you.
To simplify, break that 1B up into 3 levels of purchasing power:
1) High-tier (US, Western EU, ANZ, Japan, South Korea, Singapore, UAE, etc) - 200-250M knowledge workers.
2) Mid-tier (Eastern EU, Latin America, urban China, India tech sector, etc) - 300-400M
3) Low-tier (Rest of the world) - 300-400M
Low-tier users are mostly on free tiers or heavily subsidized pricing.
Mid-tier users will account for the sub-$100 (USD) tiers, probably averaging less than $50/seat.
High-tier users are what you are assuming the whole 1B looks like. Users in that knowledge worker count are not equal, so there aren't 1B knowledge workers to charge money.
And when you consider Low-tier users, a majority of those are free users who need to be subsidized by High-tier users. So either free tiers get much more restrictive or the providers lose additional training data. The bulk of Low-tier users cost money and provide little to no revenue.
Edit: And think about Mid-tier and Low-tier for 5 seconds. Why would they pay Anthropic or OAI when they can get 100x+ inference from DeepSeek or Xiaomi? Mid-tier may be the only segment willing to spend money on a US provider, but I would wager significantly that users in the Low-tier almost universally do not care.
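The tier argument can be turned into a break-even question: how much would High-tier seats have to pay, on average, for the total to reach the $1T/year from the top of the thread? This is a sketch under loud assumptions: seat counts are midpoints of the ranges above, every High- and Mid-tier worker is assumed to pay, Mid-tier is capped at the comment's $50/seat, and Low-tier contributes nothing. `Tier` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    seats: float          # knowledge workers in the tier
    monthly_price: float  # assumed average USD paid per seat per month


def annual_revenue(tier: Tier) -> float:
    """Yearly revenue if every seat in the tier pays the tier's price."""
    return tier.seats * tier.monthly_price * 12


def required_monthly_price(target: float, unknown: Tier, others: list[Tier]) -> float:
    """Average monthly price `unknown` must pay for all tiers to reach `target`."""
    remaining = target - sum(annual_revenue(t) for t in others)
    return remaining / (unknown.seats * 12)


# Seat counts are midpoints of the comment's ranges; prices are assumptions.
high = Tier("High", 225e6, 0.0)  # price is what we solve for
mid = Tier("Mid", 350e6, 50.0)   # the comment's "<$50/seat", used as an upper bound
low = Tier("Low", 350e6, 0.0)    # mostly free or heavily subsidized

price = required_monthly_price(1e12, high, [mid, low])
print(f"High-tier seats would need to average ${price:,.0f}/month to reach $1T/yr")
```

Under those generous assumptions, every High-tier knowledge worker would need to pay close to $300 a month, and any shortfall in paid conversion pushes that number higher.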
Simple - you make them work 2x, 5x, or 10x more hours.
Of course it will. The value of an employee is a multiple of what they get paid.
If you pay an employee $500k and they make $2M for your company (like Meta), then of course a 20% increase for the salary is justified if the velocity is increased 20% as well.
Imagine an employer with 10 employees paying $500k per employee and making $2M per employee in revenue (to use your numbers). They could hire two more employees and spend an extra $1M (+20%), but make an extra $4M in revenue (+20%). Alternatively, they could buy all ten employees a $100k AI subscription, for a total of $1M extra spending (+20%) but an extra $4M in revenue (+20%). You'll notice both scenarios are identical, so an employer optimizing for profit would have no reason to prefer one over the other.
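The two scenarios in this comment can be checked directly. The sketch takes the comment's assumption at face value, that a 20% velocity gain becomes 20% more revenue per employee; `annual_profit` is a hypothetical helper, not anything from the thread.

```python
def annual_profit(employees: int, salary: float, revenue_per_employee: float,
                  tool_cost_per_employee: float = 0.0,
                  productivity_gain: float = 0.0) -> float:
    """Revenue minus labor and tooling cost.

    Assumes revenue scales 1:1 with productivity, as the comment does.
    """
    revenue = employees * revenue_per_employee * (1 + productivity_gain)
    cost = employees * (salary + tool_cost_per_employee)
    return revenue - cost


baseline = annual_profit(10, 500_000, 2_000_000)
hire_two = annual_profit(12, 500_000, 2_000_000)
buy_ai = annual_profit(10, 500_000, 2_000_000,
                       tool_cost_per_employee=100_000, productivity_gain=0.20)

print(f"baseline: ${baseline:,.0f}")
print(f"hire two: ${hire_two:,.0f}")
print(f"buy AI:   ${buy_ai:,.0f}")
```

Both options add $3M of profit over the $15M baseline, which is why a profit-maximizing employer has no reason to prefer one over the other.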
The market is shrinking and saturated already and it’s not because of AI gains but geopolitical instability and supply chain issues, some of which are caused by AI spending and stupid ass PE firms refocusing on AI supply chains.
Only our pensions and futures burning.
My take is the product has been very useful for coding (PMF) for months. But it’s certainly not useful at any cost…
And that's just one inflection point. We've had several and there are many more on the horizon. So while I could be convinced that ROI is maybe not even positive today despite the ridiculous enterprise spend, it's perfectly rational to pave the way today for what's coming over the next few months let alone years down the line.
I think it was clearly useful for months to people who had tried it and taken the time to understand it, but now that knowledge has spread to the point where wallet holders are convinced it's not just a passing fad or hype, so now PMF can be "claimed".
I agree it's weird to say "those people have PMF", though; usually it's something you define for yourself.
52 on AI misuse: https://simonwillison.net/tags/ai-misuse/
149 on the unsolved challenge of prompt injection: https://simonwillison.net/tags/prompt-injection/
40 on slop: https://simonwillison.net/tags/slop/
If you want an "LLM evangelism blog that rarely, if ever, has any critical analysis that isn’t pro-industry" there are plenty out there. I'm not one of them.
No, it's more like their own leak to the WSJ, and according to Ed Zitron it seems to be heavily engineered via non-GAAP practices, such as counting potential but unrealised revenue as actual revenue: the kind of thing I would be arrested for if I did it at my company.
Also, according to Ed's analysis, they strangely seem to be projecting only that one quarter as profitable, potentially to calm investors ahead of the IPO. Investor fraud, anyone?
It's a funny metric, considering depreciation is a huge cost for them.
"We are profitable when we don't count our expenses"
Back in 2024 their CEO claimed training costs would rise to $10-100B in the next years.
https://www.tomshardware.com/tech-industry/artificial-intell...
In contrast, imagine if we had had the same AI 20 years or so ago. Could AI really have written Jersey? I guess not, as people were still trying to understand JAX-RS. Could AI really have answered all the questions about React? I guess not, as React had just been invented. Would we have used 10x fewer people to build out infra on the public cloud, or the entire so-called Big Data platforms? I guess not, as they were still rapidly evolving and we needed many engineers to explore many different possibilities. Could we have used AI to build our ML ecosystem with 10x fewer people? I highly doubt it. Heck, 20 years ago R was all the rage and Python's ecosystem was not mature at all. Oh, and mobile computing: could AI have let 10x fewer people build all the mobile apps and the underlying infra?
“Tokens” don’t have an intrinsic cost or value. Saying that I used $2,180.16 worth of tokens is like relying on the salesperson to convince me I’m getting a billion dollars’ worth of pots and pans for $19.99.
I think it’s funny how we are throwing critical thinking out the window when it comes to evaluating biased sources of info.
I spent $200. If I had been paying API pricing it would have been $2,180.16. The article is about how enterprise customers get charged API pricing, which means if I had been employed by one of those companies I would have cost them $2,180.16.
What am I missing?
We have no market convergence on tokens yet (and it'll differ between LLMs), so it's impossible to say what value you got for your $200.
The point being made above is that API pricing is calculated... somehow... seemingly arbitrarily, possibly untethered from infrastructure costs entirely. Those costs would be the basis of any 'value', though that assumes the labor theory of value, which isn't accurate either. So how do you accurately price these tokens at all (other than through price discovery, which is slow, messy and fuzzy)?
Like anything else in the economy: at the point where enough customers can pay you, and not enough will go to the cheaper competition.
Also, to color in the picture here, as I haven't seen it mentioned elsewhere: there is a very large SaaS company at the moment that has given everyone unlimited Claude tokens. And they have a dashboard showing who spends the most. So the "budget" went from about USD 500 per person (split between Claude and Cursor) in January to... well, a soft limit of USD 100k... per month... per person.
People can still see the top-line sticker price of their spend, but honestly I can't believe the SaaS company is paying that full price when the invoice comes in.
That said, there are some finance reports which are probably dropping soon where we will find out!
I shared that assumption until yesterday, when I found out that it wasn't holding for LLM pricing from OpenAI and Anthropic. That's what inspired me to write this piece.
I think those token leaderboards are an obviously terrible idea and will go extinct very quickly now that people are paying attention to costs.
Could be fantastic for small shops while it lasts. The big guys have to pay 10x for precious tokens.
Your point is that large players won't pay those prices at massive volume. OK.
As with pretty much anything priced on volume/usage.
Enterprise deals are negotiated ad-hoc, the listed pricing is simply a jumping off point for the final negotiated discount.
If you’re going to give 20,000 employees Claude code you are not going to be spending $1B per year on Anthropic tokens as if you gave everyone an individual API key. Just as Anthropic isn’t paying AWS SES $10,000,000 to send 1 email update to their massive user base when the next Claude version drops.
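The figures quoted in this exchange are easy to put side by side. The sketch only divides the numbers given in the thread; it does not model any negotiated enterprise discount, since no actual discount figures are given.

```python
def list_price_multiple(list_price: float, amount_paid: float) -> float:
    """How many times larger the list (API) price is than what was actually paid."""
    return list_price / amount_paid


def per_seat_annual(total_annual: float, seats: int) -> float:
    """Average yearly spend per seat if a total budget is spread evenly."""
    return total_annual / seats


# Figures quoted in this exchange.
multiple = list_price_multiple(2180.16, 200.00)
per_seat = per_seat_annual(1e9, 20_000)

print(f"API-priced usage vs. subscription: {multiple:.1f}x")
print(f"$1B across 20,000 employees: ${per_seat:,.0f}/yr, "
      f"about ${per_seat / 12:,.0f}/month each")
```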
It's going to be interesting to see which metrics we give engineers for determining whether the spend is worth it: PRs, lines of code committed, commits fully generated by agentic workflows, etc.
Do you have any numbers or reports to back that up?
edit: I missed the "enterprise" feature matrix with the usual audit/compliance stuff to force the biggest enterprise customers onto enterprise plans. Otherwise the "teams" plan is much better value for any business.
orig-continued:
https://claude.com/pricing/team
Teams premium is "Everything in standard, plus more usage*"
And from my experience, it's a very generous usage, I've only hit the limits once or twice, and both times required multi-boxing agents.
I could single-window agentic development all day on opus-4.7 auto-mode without hitting limits.
If you're a business using Claude, then that seems like the right plan. The enterprise/API plan seems more suited to cases where your product is built on top of the agents themselves, so seats/limits aren't really meaningful?
Yes, value is hard to calculate, but luckily market pricing mechanisms exist exactly for this purpose. There isn't a better number to use than what people are willing to pay for them.
So he's saying that on an enterprise plan, he'd be spending $2,180.16. He's not paying that much, but enterprises are.
A single 3D CAD license pack for the guys in our R&D group costs multiple thousands of dollars per seat, per month.
It's about time software seats get some love too.
[0] https://winchdesign.com/ [1] https://www.superyachts.com/directory/1516/winch-design/flee... [2] https://www.autodesk.com/design-make/articles/naval-architec...
I might agree "AutoCAD" is the level LLMs are currently at, but wait until your design department discovers "Revit": it's another ballpark (in wasted costs, and engineers on site still get "clashes").
Revit costs are high and the end results are only marginally better, but local LLMs' tokens are cheaper, 24/7, at the "AutoCAD" level. "Revit"-level tokens will make Uber's CTO/COO weep harder than they already do, while producing results no better than "Revit" does (engineers still face "clashes").
For a pretty funny comment about pricing:
https://www.reddit.com/r/chipdesign/comments/1ajrli2/cadence...
What does ICP mean?
I don't see the business model working. My closest friend actually does automation software for large companies.
He does not use Claude or OpenAI at all. He primarily uses gpt 120b on Cerebras and glm-5.1 for heavy thinking work, plus some other small models for various tasks. All open source.
And these systems are extremely useful for the businesses and are able to run fully automated pipelines that are very stable and fast.
We discuss this a lot, and we both think any business doing heavy agentic work on Claude and OpenAI just isn't aware of how good and cheap open source has gotten in the last year.
So... once the legacy businesses and developers catch up, won't Claude and OpenAI be unable to recoup their costs?
Same. It's a nightmare from a Porter's Five Forces perspective.
There will be a ton of businesses competing in this space, and there will be something of a moat due to how capital intensive the business can be, but there will still basically be infinite competitors.
Great for consumers.
I agree with the common trope that open models lag behind by about a year, but something magical happened just around a year ago when the state of the art models became extremely useful. By this reasoning we're about to see open models perform well, but I'm afraid there is more to it than just waiting for another revolution around the sun.
Note, my application is coding assistance. Open models can be great for other purposes.
In my latest experiment, I used Opus for the implementation plan, then used Cursor Composer 2.5 for execution.
I must say that combo is really good. The main drawback of Claude Code is that it is super slow, so when it's paired with Composer, which is super fast, it flies.
Most of the money right now is in coding. Openai and Anthropic just have to be 6 months ahead of SOTA open source models and they'll capture most of the enterprise and dev market
Will this always be true? There will never be an event horizon/point of diminishing returns where something not-bleeding-edge is "good enough" for 51%+ of users?
I highly doubt I'll ever use Claude again.
I think you are wrong about Claude being significantly better.
Oh, hey, I recognize you. Thank you for the very forward and thorough orbital sander recommendation at Home Depot. That's exactly what I wanted to deal with on my holiday weekend. You just know so much about this and the rest of us are simple passersby.
Unless, of course, there was an actual speed difference. The only reason I'd be willing to go with a model a couple of percent worse than the current best is if it were at least 5x faster. Looking forward to Kimi K2.6 being offered publicly by Cerebras.
That's fine. Other people may not want to pay 300 more and would rather make do with last year's SOTA.
> For coding you always want to go with the best model
Maybe you meant "For coding I always want to go with the best model"?
And also, people have it wrong: their models are not the main problem anymore. It's the RAG.
Anthropic and OpenAI have shown people want a tool for task offloading, driving predictable token consumption and justifying the math, so long as users stay in that dynamic.
However, knowledge workers using these tools daily are getting exhausted with them. Outputs come out polished but hollow. Talking to a frictionless, frame-completing model all day drains you.
If user behavior drifts away from assistant usage because of that, the per-token math implodes. The valuations we're hearing about all the time rely on usage compounding daily. The fatigue is a timer running against that compounding.
Anthropic's Constitution is the closest hedge out there, I think. Installing an identity structure into the model through training. But it's still assistant-first, so the fix there is only partial.
I've spent the last year running a product that flips the architecture so identity is primary and the assistant role is secondary. Same frontier models, completely different conversational quality. The fatigue property doesn't really show up.
Whichever labs figure out how to install real identity natively in the weights are going to be the ones with PMF in the next phase.
The assumption here is that this is a positive thing.
But this very well could end up being a major negative long term by increasing the cost per user, reducing margins.
More usage = more cost = less profit.
It's not obvious that more usage is good. It's only good if revenue per user increases more than cost does. I'm skeptical about that.
> Stories are circulating of companies surprised at how expensive their LLM bills are becoming from usage by their staff
> Enterprise customers are now paying API prices
How long before enterprise customers start to question the bill? Anthropic goes from not making money to a pricing shakeup, and now they are making money and the biggest spenders are shocked at the prices.
Seems like things are still very uncertain.
But memory costs are going way up. And both OpenAI and Anthropic bumped up the price of their frontier models in April.
Ahhh, the classic startup term whose definition is nebulous. But also, since when does any definition of product/market fit mean a product is profitable? And profitable in what sense? Unit economics? Overall company?
It's a great hook to build an article around. My core point is more that April 2026 was the point when Anthropic and OpenAI finally appeared to have figured out a credible business model.
So many startups trying to automate sales, but somehow the two biggest frontier labs have decided that the best GTM strategy is firmly human-in-the-loop.
Agreed. But it's only a great deal because it is heavily subsidized, as you said yourself. Enjoy it while it lasts, but in my book, product-market fit means something along the lines of "a product which enjoys a loyal customer base, is sold at a price customers perceive as fair, and generates profit." How many of these does your definition of product-market fit hit here?
I've been calling that out for a couple of years now. LLMs' best and most viable use case is still as a dev tool. Even for non-programming tasks, I still get better results from the LLM if I instruct it to write code to do the task... look at Claude Cowork, for example: it's everything I used to do with Python myself. It's not really a novel capability; it's just using Python & bash for automations that any sysadmin has been doing for decades. Yeah, that's valuable for a non-technical audience, but is it $1T valuable? I don't think so.
When has an IDE or other dev tool ever commanded a $1T valuation?
These things get lost in discussions because people conflate "overvalued" with "not useful." LLMs are useful, particularly as a dev tool, but Anthropic & OpenAI are definitely way overvalued.
Operating profit is both post depreciation and fees paid to third parties for hire. So aside from shenanigans like RSUs and financing interest that's already somewhat close to actual economics.
Meanwhile we've got commenters here talking of 5-10 trillion with a T revenue shortfall.
Those are very different takes on reality
How many tokens is that, input/output-wise?
(a) I'm curious if you feel like you got $2000 worth of value out of them in the last month?
(b) I'm also curious if you would have gotten similar quality out of a slightly lower-cost provider of an open-weight model? (e.g. Kimi K2.6 and DeepSeek v4 Pro) and what the spend would have been for that.
I myself have managed to spend not quite $4 on OpenRouter and have felt it was very worth it; I just have much smaller, or more targeted requests I guess. (Lately, adding features to a static site generator in Python, or setting up log forwarding via a docker compose file)
Claude:
Input tokens: 52,545,485
Output tokens: 5,767,253
Cache create tokens: 5,112,029
Cache read tokens: 1,475,069,465
Total tokens: 1,538,494,232
Total cost: $1,199.79
OpenAI Codex:
Input tokens: 52,598,013
Output tokens: 4,681,867
Reasoning output: 2,091,063
Cached input tokens: 1,153,844,864
Total tokens: 1,211,124,744
Total cost: $980.37
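A quick sanity check on the two breakdowns above: do the categories add up to the reported totals, and how much of the volume is cache reads? A minimal sketch, assuming the first breakdown is Claude's (inferred from the discussion that follows), that "Total tokens" is the plain sum of the listed categories, and that Codex's "Reasoning output" is already counted inside its output tokens (adding it separately would overshoot the reported total).

```python
# Token breakdowns copied from the comment above.
# Assumptions: the first breakdown is Claude's (inferred from context),
# "Total tokens" is the plain sum of the listed categories, and Codex's
# "Reasoning output" is already included in its output tokens.
breakdowns = {
    "Claude": ({"input": 52_545_485, "output": 5_767_253,
                "cache_create": 5_112_029, "cache_read": 1_475_069_465},
               1_538_494_232),
    "OpenAI Codex": ({"input": 52_598_013, "output": 4_681_867,
                      "cache_read": 1_153_844_864},
                     1_211_124_744),
}


def check(parts: dict[str, int], reported_total: int) -> tuple[bool, float]:
    """Return (does the sum match the reported total?, share of cache reads)."""
    total = sum(parts.values())
    return total == reported_total, parts["cache_read"] / total


for name, (parts, reported_total) in breakdowns.items():
    matches, share = check(parts, reported_total)
    print(f"{name}: sum matches total={matches}, cache reads={share:.1%}")
```

Both sums reproduce the reported totals exactly, and cache reads come to roughly 95.9% and 95.3% of all tokens, in line with the ">90% cache read" point made further down the thread.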
I'm confident I got value out of OpenAI - I've been mainly on Codex for the last few weeks. Not so sure I got that value from Claude, just because I've been using it a lot less, and somehow the price came to about the same as OpenAI.
Given the code I've been able to build in the past month I genuinely do think I got value for the API price version, and (don't tell OpenAI or Anthropic) I think I'd have paid full price.
I've not spent nearly enough time with GLM-5.1 and co to compare, but I do know that the prompts I'm using with the agents are not prompts I would have expected to work just three months ago.
When I account for the amount of time it saved me there's no question $2,000 was worth it.
Personally, I've probably spent $60 or so on OpenRouter in the last month or so and got a working project out of it that it would probably have taken me a fortnight to knock together (which is inevitably an under-estimate because it covered things I'd have to learn but K2.5/6 already knew). There's an orders-of-magnitude gap there.
Many of us, especially at larger enterprises, are having our performance reviews openly tied to AI use, whether that's measured by sheer token count or just by "how many of your tasks are you using AI for these days" (combined with the implication that question carries at many orgs that are heavily invested in AI).
I don't think that's the case. I think the token leaderboard thing (which is clearly ridiculous) affects a tiny portion of companies and is already going out of fashion.
You may want to get one of them to check the math on that :p
You think this is a fantastic deal only because they use tricks like inflating the price: they tell you something is supposed to cost $1000, but today they have a promo for $100.
I was there too and paying for a while. A few weeks ago I tried DeepSeek V4 Pro. I expected it to be shit, but it's actually pretty good.
The deal: I pay ~$1 a day for DSV4-Pro for ~100M tokens of API usage. And they're probably not going broke, because in practice >90% of those tokens are cache reads, and they're very well optimized for that.
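To put the figures quoted in this thread on one scale, here is a rough $-per-million-tokens comparison. It is a back-of-the-envelope sketch only: the DeepSeek number is the commenter's approximation (~$1/day for ~100M tokens), the Claude and Codex numbers are the one-month totals posted earlier (the Claude attribution is inferred from context), and the models, workloads, and pricing tiers all differ.

```python
# Blended cost per million tokens, from figures quoted in this thread.
# The DeepSeek figure is a rough daily estimate; the other two are the
# one-month totals posted earlier (Claude attribution inferred from context).
figures = {
    "DeepSeek V4 Pro (approx., per day)": (1.00, 100_000_000),
    "Claude (one month)": (1_199.79, 1_538_494_232),
    "OpenAI Codex (one month)": (980.37, 1_211_124_744),
}


def usd_per_million(cost_usd: float, tokens: int) -> float:
    """Spend divided by token volume, expressed in $ per 1M tokens."""
    return cost_usd / tokens * 1_000_000


baseline = usd_per_million(*figures["DeepSeek V4 Pro (approx., per day)"])
for name, (cost, tokens) in figures.items():
    rate = usd_per_million(cost, tokens)
    print(f"{name}: ${rate:.3f}/M tokens ({rate / baseline:.0f}x the DeepSeek rate)")
```

This works out to about $0.01/M for the DeepSeek estimate versus roughly $0.78/M and $0.81/M for the two frontier-lab totals, i.e. close to two orders of magnitude, which matches the "orders-of-magnitude gap" another commenter describes. It says nothing about output quality per token.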
The impact of AI in other fields seems to be muted.
Software development has the huge advantage that mistakes and hallucinations are very easy to spot: the software works or it doesn't.
Spotting errors in a research report or legal brief is a whole lot harder!
But... non-software professionals spend a huge amount of their time on tasks that can be safely automated: reformatting documents, extracting numbers from PDFs, all flavors of data entry.
Learning how to use a tool like Claude Cowork can take a big dent out of those.
However, the valuations are still far, far away from actual sanity.
I use GLM-5.1 and occasionally DeepSeek V4.
They are as good or better than Claude's latest models.
And significantly cheaper. I've converted 3 of my engineer friends as well. All three have dropped the $200/month plans they had with Anthropic.
We've all been a bit shocked at just how good these models are now.
If you have tried GLM (I find it shockingly good for code specifically), did you not think it's competitive with Claude? And why?
It's good enough for personal stuff. It doesn't compare to the latest Opus I use at work. You can certainly argue I don't need Opus for work, but there is clearly a difference.
Also, at least with z.ai, GLM-5.1 is s l o w! After using Claude at work, I get really impatient with GLM-5.1 at home. When doing "true" vibe coding (i.e. not really examining the code), Opus is a ton faster (easily 5x).
But yeah, I'm not willing to personally pay for the frontier models. I won't even renew my annual Z.ai plan - it's become too expensive.
Also, and I know you may not want to answer, could you give me an idea of the type of thing you found GLM to be worse at?
I think I've been fairly unbiased in testing a bunch of different development tasks, but I'm curious whether it performs well for some stuff and not others. So could you share what you feel it's worse at?
Also, are you an experienced developer or less experienced?
When DeepSeek V4 Pro came out, I had been mostly coding with GLM-5.1 on a Z.ai coding plan.
I had a large analysis task on a relatively complex codebase. I decided to try the models out.
GLM-5.1 did acceptably but got a few things wrong (easily corrected) and took quite a while to get there.
Opus 4.6 burnt through the US$10 budget I had given it in about 10-15 min, without ever returning from the first prompt.
DeepSeek V4 returned a full analysis within 2-3 min, and I carried on all the way to implementing the feature I was after. Total cost less than US$1.00.
I now mostly alternate between GLM-5.1 and DeepSeek V4 Flash, with an occasional dip into V4 Pro for more complex analyses.
Right now everyone is using the latest and greatest to do dumb stuff like that. That would change fast if companies started caring about costs.
Orgs with more than 150 users aren't on $200/month plans; they're forced into API pricing plus $20/month/user.
For individuals and orgs small enough to use the subscription plans, that's all well and good until usage limits keep going down or the cost goes up. If you compare the usage you get on a maxed-out $200/month plan with what it would cost at API pricing, the $200/month plan is an absolute steal. I doubt it will last long.
On the plus side, I'm happy I'll have a nice hay barn when the local half-built AI data center is abandoned.
Recent conversation here on that topic: https://news.ycombinator.com/item?id=47062534#47063134
But that's the point of the article. Enterprise plans are starting to get API pricing, not the subsidized subscription pricing.
There's a whole bag of clever tricks you can play to juice short term results leading to an IPO that may not work longer term.
I'll believe they've found product-market fit when they have a product. Right now they're selling the infrastructure, in a highly subsidized and undifferentiated way (at least over a sufficiently long period, say a couple of years).
I notice this all over the place. Many people hate AI and want it to fail, and they're willing to invent misinformation if it supports that idea.
In hype-driven markets, you cannot be certain of that.
Let's take the view that the author is right: coding agents and their associated harnesses were the inflection point for some degree of profitability and widespread consumption, and these tools are now yet another SaaS subscription or API bucket expense to bake in for every single developer (or developer-adjacent role) in the organization, alongside your collab suite, HR seat, CRM seat, design seat, etc. To be fair, I honestly think that's a safe assumption to make for highly technical firms whose image is derived from remaining on the cutting edge.
That raises the following questions, which we won't be able to answer until IPOs start happening:
* Are subscriptions profitable, or just API consumption?
* What's the run rate when we just consider subscription-based usage like Claude Code and Codex? What about API calls?
* Is there any profitable pathway forward at which enterprises can get unlimited usage but at fixed rates via subscription?
* What does customer churn look like for subscription users versus API users?
We also have a number of questions for customers that I suspect we'll start seeing receipts for in the coming months, at least from the early adopters:
* What was the net gain (loss) from leveraging coding agents?
* What's the cost of a developer with or without access to a coding agent + harness? Is it cheaper to hire an outsourced worker with a coding agent subscription, or a domestic worker without one?
* At what point does further AI spend result in diminishing returns, i.e. where's the 'sweet spot' for spend?
* Did AI boost actual revenue and outcomes, or did it just gamify KPIs?
* What roles or work did AI actually replace, versus merely displace during the hype cycle?
Not to mention the questions regarding the technology itself:
* Will we develop the means to run foundational/frontier models at the edge using fewer resources, through some existing (e.g. distillation) or new technology, thus cutting off the profit centers of these firms?
* When the market mismatch between supply and demand is resolved, won't it be more affordable for consumers and companies to operate their own AI infrastructure rather than support further centralized buildouts?
* Will coding agents improve to the point of being able to bootstrap and self-orchestrate on edge/consumer hardware without substantial technical expertise, or at least improve to the point that traditional IT teams can securely operate them internally without an expensive subscription or API token bucket?
All of these will influence the long tail of this bubble, because it is a bubble at this point. Even if these companies are indeed profitable thanks to the coding agent inflection point, there are still so many unanswered questions about utility beyond coding that it's impossible to extrapolate a future. If coding agents are indeed the extent of profitable utility, then there's no possible way these entities will recoup the investment already sunk into their infrastructure buildouts. Even if more profitable uses are discovered, do they offset or replace the firms disappearing due to AI speculation and their associated contributions to the economy as a whole (RE: the consumer compute industry at present, higher energy costs due to datacenter builds, the opportunity cost of harm to local infrastructure from haphazard builds, etc.)? Should these firms indeed be runaway successes, profitable enough to pay off their investors and grow the larger economy, does this end up stifling innovation in a world where most new ideas are fed into LLMs for R&D that are controlled by only a handful of companies and immensely wealthy people, via systems that are easily surveilled and stolen from without recourse?
So many, many questions yet to be answered. Betting the farm because of coding agents is one hell of a gamble.
Is that quarter same as any other quarter in terms of infrastructure costs (e.g. are there any temporary discounts happening coincidentally)?