They have to keep getting better to stay ahead of each other and of the open-weight models.
Which means it's the opposite of a time bomb; the article has it completely backwards. Tokens at the current level of reasoning will continue to get cheaper.
Extended discussion on this topic:
https://corecursive.com/the-pre-training-wall-and-the-treadm...
One can at least hope.
GitHub Copilot moves to usage-based billing in two weeks.[1]
1. https://github.blog/news-insights/company-news/github-copilo...
LLMs are just parroting relevant documents they've assimilated.
They may be running at a loss after all the salaries and stock comp, but tokens are in profit now.
Yes, sure, right now it is ... but that's NOT how it got here.
There are trillions invested to recoup and at most billions in sales. It doesn't add up to tokens making a profit any time soon.
But if all the AI companies stopped training new models, they would all instantly become profitable (and stick around).
The thing that makes them unprofitable is having to compete (which means training models). If/when enough companies exit the market, the cost to compete goes down and you end up in an equilibrium.
Eh, the AI companies still have lots of datacentres. For the guys who funded with equity, they could collapse down to just running those as utilities. (For the guys who funded with debt, they'd have to restructure.)
From the customer's perspective, this situation shouldn't result in a cost spike. (Consolidation, on the other hand, would. But that's a separate argument from the one the article attempts to make.)
But once there's no more competition, there's no more incentive to keep prices low, and that will be reflected in pricing too.
But this isn't "a ticking time bomb for enterprise." It's an issue for the AI companies' investors.
It's like selling dope: once they're addicted, the dealer can turn the screws on them.
If things don't end up working out a lot of people have already been (and in the future will be) paid. It's the investors that will lose out, not the subscriber.
It’s unlikely that Claude is proportionally that much bigger and more expensive to serve, so profit margins on inference must be pretty decent.
Even if they are “profitable”: how many Uber drivers are “profitable” only because they aren’t correctly calculating asset depreciation? Maybe these guys are doing the same thing.
Maybe it’s a lot of people who already had GPUs for crypto mining and have moved over to this, so that if they needed to grow and buy new GPUs, costs would grow dramatically.
How many times bigger could Opus be than GLM or Kimi? It’s certainly not proportional to the price.
It’s highly unlikely that OpenAI/Anthropic are not making decent amounts of money from inference.
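A napkin version of that claim. Every number here is a hypothetical placeholder (GPU rental price, batched throughput, API price), not a vendor figure, but it shows why people suspect inference margins are healthy:

```python
# Napkin estimate of inference gross margin.
# All numbers below are assumed placeholders, not real vendor data.
gpu_cost_per_hour = 3.00          # assumed hourly rental for one accelerator
tokens_per_second = 1500          # assumed aggregate throughput with batching
price_per_million_tokens = 15.00  # assumed API output-token price

tokens_per_hour = tokens_per_second * 3600
revenue_per_hour = tokens_per_hour / 1_000_000 * price_per_million_tokens
margin = 1 - gpu_cost_per_hour / revenue_per_hour

print(f"revenue/hour: ${revenue_per_hour:.2f}, gross margin: {margin:.0%}")
```

Under these made-up inputs the hardware cost is a rounding error against per-token revenue; the real unknowns are utilization, free-tier traffic, and how the training bill gets amortized.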
Based on what? Why are we all whispering about how profitable all this is? It is the absolute last thing these firms would keep secret.
Nobody is whispering about anything. Everyone is loudly assuming what's convenient for their thesis. Even if you had access to the books, the accounting isn't straightforward; there is not yet enough data for a meaningful answer.
> It is the absolute last thing these firms would keep secret
If you find an optimisation strategy that you don't think your competitors have, you absolutely keep your margins secret for as long as possible. Knowing something is possible is the first step to making it so.
You can also do everything metered. There are multiple ways to buy.
--You lose control over their "salary"
--You lose control over their "schedule"
--Your company becomes reliant on another party that does not share your interests or values, and can stop working for you on a whim for any reason
But AI is definitely good and trade unions are definitely bad, apparently...
Perhaps OpenRouter can be used as a benchmark for the commodity cost to serve AI. I keep hearing it's better value than Claude, which suggests to me that either Anthropic is especially inefficient for some reason, or they're turning a profit on inference. They could be losing money on training, but I suspect that's just part of the cost of staying a leading lab. If any single one goes under due to debt etc., companies can just switch?
Many companies using models deployed on Azure/Bedrock etc. are already paying based on usage (often with discounts).
Remember that enthusiasts leaning on API keys and large enterprises are the exception, not the norm; even some large customers may lean on subscriptions for at-scale adoption and wait for teams to report hitting usage caps before buying more token buckets. Subscriptions are predictable, reliable, and above all else a contractable way to acquire the service.
Truth be told, this has been my red flag in orgs and with peers elsewhere for several years now. Those orgs leaning on subscriptions are in for a nasty surprise within a year or two (like the author, I predict sooner rather than later), especially if those subscriptions power internal processes instead of token buckets.
Hell, this is why I think there’s a sudden focus on the “Forward Deployed Engineer” nonsense role: helping organizations migrate from subscriptions to token buckets for processes so the bill shock doesn’t send them running away screaming.
The best course of action is to take advantage of the subsidy for a while, but not integrate it so deeply that one can’t retreat. You’ll still have full productivity; just be cognizant of the reality of the situation.
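The bill-shock argument above can be put in toy numbers. Everything here (seat price, per-token price, usage levels) is a made-up assumption purely to illustrate the gap between a flat subscription and metered billing for a heavy internal process:

```python
# Toy bill-shock comparison: flat subscription vs. metered billing.
# All prices and usage figures are hypothetical illustrations only.
seat_price = 30.00                # assumed flat monthly subscription per seat
metered_price_per_million = 10.0  # assumed blended per-token API price

def monthly_cost(seats, tokens_per_seat, metered):
    """Return the monthly bill under either billing model."""
    if metered:
        return seats * tokens_per_seat / 1_000_000 * metered_price_per_million
    return seats * seat_price

# A heavy internal process: 50 seats, each driving 20M tokens/month.
flat = monthly_cost(50, 20_000_000, metered=False)
usage = monthly_cost(50, 20_000_000, metered=True)
print(f"flat: ${flat:,.0f}  metered: ${usage:,.0f}  ({usage/flat:.1f}x)")
```

The point isn't the specific ratio; it's that a process tuned to feel free under a flat fee can cost a multiple of the subscription the day billing moves to usage.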
Hopefully the market eventually collapses to the point where companies host their own inference, and you simply lease a model package to run on your own (or rented) specialty hardware.
If you increase the price, the value is still astronomical in comparison.
Companies need to find a way to leverage local models in tandem with frontier models to offset the costs.
It’s all about targeting specific workloads with the appropriate AI. These tools are not sentient beings; they are tools that need to be properly configured to match the job at hand.
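A minimal sketch of what "matching the AI to the workload" could look like: a crude router that sends simple prompts to a local model and complex ones to a frontier API. The heuristic, threshold, and model labels are all assumptions for illustration, not a real product configuration:

```python
# Hypothetical workload router: cheap local model for easy tasks,
# expensive frontier model for hard ones. Heuristic is illustrative only.
def estimate_complexity(prompt: str) -> float:
    """Crude score: longer prompts and agent-ish keywords score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if any(marker in prompt for marker in ("refactor", "prove", "multi-step")):
        score = max(score, 0.8)
    return score

def route(prompt: str, threshold: float = 0.7) -> str:
    """Pick a backend name based on estimated task complexity."""
    if estimate_complexity(prompt) >= threshold:
        return "frontier-api"   # hypothetical hosted endpoint
    return "local-model"        # hypothetical self-hosted model

print(route("Summarize this paragraph."))           # -> local-model
print(route("refactor this service into modules"))  # -> frontier-api
```

In practice the classifier would itself be a small model or a per-task config table, but the cost structure is the same: only pay frontier prices for the work that needs frontier capability.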
What? Anthropic's costs aren't the API rate. The article never attempts to estimate that cost, which renders its thesis a tautology.
1. Training is expensive: not just compute, but getting the data, researcher salaries, etc.
2. You have to keep producing new models to ensure people use your inference, and there seems to be no end to this. So they have to pour in more billions to keep the cycle going.
3. Salaries and other admin costs are not that high compared to 1 and 2.
The article's point is that if you're relying on flat fee subscriptions, a rude awakening may be coming. That seems plausible to me. Issues around token quotas are a frequent topic on HN.
Nobody is going to charge "inference price" for model usage.
Not necessarily. Many factors go into what models are available at enterprise level. If you look around, not many companies (everywhere around the world) use DeepSeek models even though they are significantly cheaper.
Think what you want, but even when hosted in the US, at the enterprise level going all-in on that would be a legal and/or political death sentence.
We need better open-source/cheap but highly intelligent Western models that are proven to work well in agentic tooling and have strong legal agreements before enterprises will even consider it.
* People keep finding ways of cramming more intelligence into smaller models, meaning that a given hardware spec delivers more model capability over time. I remember not that long ago when cutting-edge 70B-parameter models could kinda-sorta-sometimes write code that worked. Versus today, when Qwen3 30B-A3B (3B active parameters, 1/23 as many!) is actually *fun* to vibe code with in a good harness. It’s not Opus smart, but the point is you don’t need a trillion parameters to do useful things.
* Hardware will continue to improve and supply will catch up to demand, meaning that a dollar will deliver more hardware spec over time. Right now the industry is massively supply constrained, but I don’t see any reason that has to continue forever. Every vendor knows that memory capacity and memory bandwidth are the new metrics of note, and I expect to start seeing products that reflect that in a few years.
I hope that one day we’ll look back on the current model of “accessing AI through provider APIs” the same way we now look back on “everyone connecting to the company mainframe.”
As the AI labs become more reliant on enterprise adoption, it makes sense to push capabilities at a cost that makes sense for businesses. Even if it prices out consumers or hobbyists.
Between more efficient models tuned for the task at hand, the ability to run those models in-house or even at the edge, and Google and Microsoft being well positioned to stay ambivalent (they’ve got lots of products to sell, and whether or not LLMs are part of the portfolio mix is completely dependent on enterprise customer demand), Anthropic/OpenAI face a number of aggressive downward pressures on their pricing.
We all know every frontier AI lab is heavily subsidizing usage, and so do all of the VCs & CEOs funding them.
1. GenAI companies are making a loss in order to gain adoption and later lock-in
2. ???
3. They're going to cash-in soon and start milking you now that business critical systems rely on GenAI
The "???" denotes a complete failure to offer compelling arguments that link 1 and 3.
But also... is this shit AI written? I'm so tired of this.
Who said it was?
> Pull out the napkin. This matters.
The article wouldn't exist if you didn't think it mattered, just tell us why.
> the question is not whether they got a good deal. The question is
Who said that was the question?
> This Is Not One Company's Problem
Who said it was?
Stop telling us what things aren't; just speak like a normal human and convey your own thoughts. It's an insult to your audience to throw constant AI slop at them.
> thousands of companies have woven AI subscriptions deep into their operations. Marketing teams draft copy through ChatGPT Plus.
Yea I bet you do..