If you compute the MFU the author gets, it's 1.44 million input tokens per second * 37 billion active params * 2 (FMA) / 8 [GPUs per instance] = 13 petaflops per second per GPU. That's approximately 7x the absolute peak FLOPS of the hardware. Obviously, that's impossible.
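A quick sanity check of that arithmetic; the peak-FLOPS figure here is my assumption for an H100-class part, not something from the article:

```python
# Back-of-envelope MFU check using the numbers from the comment above.
tokens_per_sec = 1.44e6              # claimed input tokens/sec per instance
active_params = 37e9                 # active params per token (MoE)
flops_per_token = 2 * active_params  # multiply-accumulate = 2 FLOPs per param
gpus = 8                             # GPUs per instance

required_per_gpu = tokens_per_sec * flops_per_token / gpus
print(f"{required_per_gpu / 1e15:.1f} PFLOP/s required per GPU")  # ~13.3

peak_fp8 = 2e15  # rough H100 FP8 peak FLOP/s -- my assumption
print(f"{required_per_gpu / peak_fp8:.1f}x over peak")            # ~6.7x
```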
There are many other issues with this article, such as assuming only 32 concurrent requests(?), only 8 GPUs per instance as opposed to the more efficient/standard prefill-decode disaggregated setups, assuming that attention computation is the main thing that makes models compute-bound, etc. It's a bit of an indictment of HN's understanding of LLMs that most people are bringing up issues with the article that aren't any of the fundamental misunderstandings here.
https://lmsys.org/blog/2025-05-05-large-scale-ep/
This has gotten significantly cheaper yet with additional code hacks since then, and with using the B200s.
Even rerunning the math on my use cases with way higher input token cost doesn't change much though.
The component about requiring long context lengths to be compute-bound for attention is also quite misleading.
Now, when he said that, his CFO corrected him and said they aren't profitable, but said "it's close".
Take that with a grain of salt, but thats a conversation from one of the big AI companies that is only a few weeks old. I suspect that it is pretty accurate that pricing is currently reasonable if you ignore training. But training is very expensive and the reason most AI companies are losing money right now.
They sure have a lot of training to do between now and whenever that happens. Rolling back from 5 to whatever was before it is their own admission of this fact.
for their investors, however, they are promising a revolution
which is completely "normal" at this point, """right"""? if you have billions of VC money chasing returns there's no time to sit around, it's all in, the hype train doesn't wait for bootstrapping profitability. and of course with these gargantuan valuations and mandatory YoY growth numbers, there is no way they are not fucking with the unit economy numbers too. (biases are hard to beat, especially if there's not much conscious effort to do so.)
If you understand there are multiple models from multiple providers, some of those models are better at certain things than others, and how you can get those models to complete your tasks, you are in the top 1% (probably less) of LLM users.
This is almost surely wrong but my point was about GPT5 level models in general not GPT5 specifically...
But at some point, model improvement will saturate (perhaps it already has). At that point, model architecture could be frozen, and the only purpose of additional training would be to bake new knowledge into existing models. It's unclear if this would require retraining the model from scratch, or simply fine-tuning existing pre-trained weights on a new training corpus. If the former, AI companies are dead in the water, barring a breakthrough in dramatically reducing training costs. If the latter, assuming the cost of fine-tuning is a fraction of the cost of training from scratch, the low cost of inference does indeed make a bullish case for these companies.
On the other hand, this may also turn into cost effective methods such as model distillation and spot training of large companies (similarly to Deepseek). This would erode the comparative advantage of Anthropic and OpenAI, and result in a pure value-add play for integration with data sources and features such as SSO.
It isn't clear to me that a slowing of retraining will result in advantages to incumbents if model quality cannot be readily distinguished by end-users.
I like to think this is the end of software moats. You can simply call a foundation model company's API enough times and distill their model.
It's like downloading a car.
Distribution still matters, of course.
TBH I don't take anyone seriously unless they are talking about cash flows (FCFF or FCFE specifically).
Who cares about expense classification - show me the money!
For others, I think the picture is different. When we ran benchmarks on DeepSeek-R1 on 8x H200 SXM using vLLM, we got up to 12K total tok/s (concurrency 200, input:output ratio of 6:1). If you're spiking up to 100-200K tok/s, you need a lot of GPUs for that. Then the GPUs sit idle most of the time.
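Rough sizing for that spike, using the benchmark figure from above; the peak-demand number is hypothetical:

```python
# How many GPUs a demand spike needs at the measured per-node throughput.
toks_per_node = 12_000   # measured: 8x H200 node running DeepSeek-R1 on vLLM
gpus_per_node = 8
peak_demand = 150_000    # tok/s at peak -- hypothetical, middle of 100-200K

nodes = -(-peak_demand // toks_per_node)  # ceiling division
print(nodes, "nodes =", nodes * gpus_per_node, "GPUs")  # 13 nodes = 104 GPUs
```

All of that hardware still has to be paid for during the idle hours.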
I'll read the blog post in more detail, but I don't think the following assumptions hold outside of AI labs.
* 100% utilization (no spikes, balanced usage between day/night or weekdays)
* Input processing is free (~$0.001 per million tokens)
* DeepSeek fits into H100 cards in a way that network isn't the bottleneck
Whether they flow through COGS/COR or elsewhere on the income statement, they've gotta be recognized. In which case, either you have low gross margins or low operating profit (low net income??). Right?
That said, I just can't conceive of a way that training costs are not hitting gross margins. Be it IFRS/GAAP etc., training is 1) directly attributable to the production of the service sold, 2) is not SG&A, financing, or abnormal cost, and thus 3) only makes sense to match to revenue.
Can anyone explain why it's not allowed to compensate the creators of the data?
Another similar example is R&D and development by engineers aren't considered in margin either.
Back of the envelope: $25k GPU amortized over 5 years is $5k/year. A 500W GPU run at full power uses 4.5MWh; at $0.15/kWh the electricity costs $650/year.
The other operating costs you suggest have to be even smaller.
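Redoing that back-of-envelope in a couple of lines, with the same assumed figures:

```python
# Amortized GPU capex vs electricity, per the estimate above.
gpu_price = 25_000     # $
amort_years = 5
power_kw = 0.5         # 500 W at full load
price_per_kwh = 0.15   # $

capex_per_year = gpu_price / amort_years          # $5,000/yr
kwh_per_year = power_kw * 24 * 365                # ~4,380 kWh (~4.4 MWh)
elec_per_year = kwh_per_year * price_per_kwh      # ~$657/yr

print(f"capex ${capex_per_year:,.0f}/yr, electricity ${elec_per_year:,.0f}/yr")
```

So electricity is roughly an eighth of the amortized hardware cost, which is the point: capex dominates.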
Are you saying that the operating costs for inference exceed the costs of training?
Say training costs C_T = $10,000,000 and each query costs C_I = $0.002. Break-even:

N > C_T / C_I = $10,000,000 / $0.002 = 5,000,000,000 inferences

So after 5 billion queries, inference costs surpass the training cost.
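The same break-even, spelled out:

```python
# Break-even query count: training cost divided by per-query inference cost.
training_cost = 10_000_000   # $, C_T
cost_per_query = 0.002       # $, C_I

break_even = training_cost / cost_per_query
print(f"{break_even:,.0f} queries")  # 5,000,000,000 queries
```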
OpenAI claims it has 100 million users; multiply that by queries per user and I'll let you judge.
> Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.
"If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.
What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."
https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop...
Also, in Nike's case, as they grow they get better at making more shoes for cheaper. LLM model providers tell us that every new model (shoe) costs multiples more than the last one to develop. If they make 2x revenue on training, like he's said, to be profitable they have to either double prices or double users every year, or stop making new models.
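A toy version of that squeeze, with hypothetical numbers (the $100M/$200M generation-one figures echo Amodei's cartoon example quoted elsewhere in the thread): if each model costs 2x the last but revenue grows slower than 2x, the revenue-to-training-cost multiple collapses within a few generations.

```python
# Each generation: training cost doubles, revenue grows only 1.5x.
train_cost, revenue = 100.0, 200.0  # $M, generation 1 -- hypothetical
growth = 1.5                        # revenue growth without doubling users

for gen in range(1, 5):
    print(f"gen {gen}: train ${train_cost:.0f}M, revenue ${revenue:.0f}M, "
          f"multiple {revenue / train_cost:.2f}x")
    train_cost *= 2
    revenue *= growth
```

By generation 4 the model no longer covers its own training cost, which is the "double users, double prices, or stop" trilemma above.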
A better metaphor would be oil and gas production, where existing oil and gas fields are either already finished (i.e. model is no longer SOTA -- no longer making a return on investment) or currently producing (SOTA inference -- making a return on investment). The key similarity with AI is new oil and gas fields are increasingly expensive to bring online because they are harder to make economical than the first ones we stumbled across bubbling up in the desert, and that's even with technological innovation. That is to say, the low hanging fruit is long gone.
This largely was the case in software in the '80s-'10s (when versions largely disappeared) and still is the case in hardware. iPhone 17 will certainly cost far more to develop than did iPhone 10 or 5. iPhone 5 cost far more than 3G, etc.
You could see here: https://www.reddit.com/r/dataisbeautiful/comments/16dr1kb/oc...
New ones are generally cheaper when adjusted for inflation. That's sale price, but assuming margins stay the same it should reflect the manufacturing price. And from what I remember of Apple's earnings, their margins increased over time, so the new phones are even cheaper to make. Which kind of makes sense.
Recent iPhones use Apple's own custom silicon for a number of components, and are generally vastly more complex. The estimates I have seen for iPhone 1 development range from $150 million to $2.5 billion. Even adjusting for inflation, a current iPhone generation costs more than the older versions.
And it absolutely makes sense for Apple to spend more in total to develop successive generations, because they have less overall product risk and larger scale to recoup.
If you don't like "model as company," how about "model as making a movie?" Any given movie could be profitable or not. It's not necessarily the case that movie budgets always get bigger or that an increased budget is what you need to attract an audience.
This is clearly the case for models as well. Training and serving inference for GPT4 level models is probably > 100x cheaper than they used to be. Nike has been making Jordan 1's for 40+ years! OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model!
>>OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model!
If gpt4 was basically free money at this point it's real weird that their first instinct was to cut it off after gpt5
People find the UX of choosing a model very confusing, the idea with 5 is that it would route things appropriately and so eliminate this confusion. That was the motivation for removing 4. But people were upset enough that they decided to bring it back for a while, at least.
Model as a product is the reality, but each model competes with previous models and is only successful if it's both more cost effective, and also more effective in general at its tasks. By the time you get to model Z, you'll never use model A for any task as the model lineage cannibalizes sales of itself.
each node is much more expensive to design for, but when you finally have it you basically print money.
and of course you always have to develop next more powerful and power efficient CPU to keep competitive
In other words, its possible this story is correct and true for Anthropic, but not true for OpenAI.
However, at the same time, I was using ChatGPT much less, really preferring Claude's answers most of the time, and constantly being hit with Claude's limits. So guess what I did: I cancelled my OpenAI subscription and moved to Anthropic. Not only do I get Claude Code, which OpenAI really has no serious competitor for.
I still use both models but never run into problems with OpenAI, so I see no reason to pay for it.
Thats a moat, albeit one that is slow to build.
You'd think maybe the CEO might be able to give a ballpark on the profit made off that 2023 model.
ETA: "You paid $100 million... There's some cost to inference with the model, but let's just assume ... that even if you add those two up, you're kind of in a good state."
You see this right? He literally says that if you assume revenue exceeds costs then it's profitable. He doesn't actually say that it does though.
> ICYMI, Amodei said the same
No. He says that even after paying for training, a model is profitable. It makes more revenue than it costs, all things considered. A much stronger claim.
Basically each new company puts competitive pressure on the previous company, and together they compress margins.
They are racing themselves to the bottom. I imagine they know this and bet on AGI primacy.
Just like Uber and Tesla are betting on self driving cars. I think it's been 10 years now ("any minute now").
https://www.reuters.com/legal/government/anthropics-surprise...
However this does not work as well if your fixed (non-unit) cost is growing exponentially. You can’t get out of this unless your user base grows exponentially or the customer value (and price) per user grows exponentially.
I think this is what Altman is saying: this is an unusual situation where unit economics are positive but fixed costs are exploding faster than economies of scale can absorb them.
You can say it's splitting hairs, but insightful perspective often requires teasing things apart.
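A minimal sketch of that dynamic, with entirely hypothetical numbers: per-user margin stays positive throughout, yet total profit goes negative because the fixed (training/R&D) cost grows faster than the user base.

```python
# Positive unit economics, exploding fixed costs -- all figures hypothetical.
margin_per_user = 120.0  # $/yr gross profit per user (positive every year)
users = 10e6
fixed_cost = 1e9         # $/yr training + R&D

for year in range(4):
    profit = users * margin_per_user - fixed_cost
    print(f"year {year}: profit ${profit / 1e9:+.1f}B")
    users *= 1.5         # user base grows 50%/yr
    fixed_cost *= 2.5    # fixed costs grow 150%/yr
```

Profitable in year 0, underwater every year after, without the unit economics ever changing.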
They have to build the next model, or else people will go to someone else.
Our software house spends a lot on R&D sure, but we're still incredibly profitable all the same. If OpenAI is in a position where they effectively have to stop iterating the product to be profitable, I wouldn't call that a very good place to be when you're on the verge of having several hundred billion in debt.
We know that businesses with tight network effects can grow to about 2 trillion in valuation.
IE OpenAI invests in Cursor/Windsurf/Startups that give away credits to users and make heavy use of inference API. Money flows back to OpenAI then OpenAI sends it back to those companies via credits/investment $.
It's even more circular in this case because nvidia is also funding companies that generate significant inference.
It'll be quite difficult to figure out whether it's actually profitable until the new investment dollars start to dry up.
OpenAI's fund is ~$250-300mm; Nvidia reportedly invested $1b last year. Still way less than OpenAI's revenue.
That is an OpenAI skeptic. His research, if correct, says not only is OpenAI unprofitable, but it likely never will be. It can't be: its various finance ratios make early Uber, Amazon, etc. look downright fiscally frugal by comparison.
He is not a tech person, for whatever that means to you.
https://bsky.app/profile/davidcrespo.bsky.social/post/3lxale...
https://bsky.app/profile/davidcrespo.bsky.social/post/3lo22k...
https://bsky.app/profile/davidcrespo.bsky.social/post/3lwhhz...
https://bsky.app/profile/davidcrespo.bsky.social/post/3lv2dx...
It's not responsive at all to Zitron's point. Zitron's broader contention is that AI tools are not profitable because the cost of AI use is too high for users to justify spending money on the output, given the quality of output. And furthermore, he argues that this basic fact is being obscured by lots of shell games around numbers to hide the basic cash flow issue. For example, focusing on cost in terms of cost per token rather than cost per task. And finally, there's an implicit assumption that the AI just isn't getting tremendously better, as might be exemplified by... burning twice as many tokens on the task in the hopes the quality goes up.
And in that context, the response is "Aha, he admits that there is a knob to trade off cost and quality! Entire argument debunked!" The existence of a cost-quality tradeoff doesn't speak to whether or not that line will intersect the quality-value tradeoff. I grant that a lot turns on how good you think AI is and/or will shortly be, and Zitron is definitely a pessimist there.
Ed doesn’t really make that argument anymore. The more recent form of the point is: yes, clearly people are willing to pay for it, but only because the providers are burning VC money to sell it below cost. If sold at a profit, customers would no longer find it worth it. But that’s completely different from what you’re saying. And I also think that’s not true, for a few reasons: mostly that selling near cost is the simplest explanation for the similarity of prices between providers. And now recently we have both Altman and Amodei saying their companies are selling inference at a profit.
The link you posted: I think it is very plausible that it will be hard for OpenAI to become profitable
He is not wrong about everything. For example, after Sam Altman said in January that OpenAI would introduce a model picker, Zitron was able to predict in March that OpenAI would introduce a model picker. And he was right about that.
Uber burnt through a lot of money and even now I'm not sure their lifetime revenue is positive (it's possible that since their foundation they've lost more money than they've made).
They aren't yet profitable even just on inference, and its possible Sam didn't know that until very recently.
[1] https://www.nytimes.com/2025/08/22/podcasts/is-this-an-ai-bu...
and
“Inference revenue significantly exceeds inference costs.”
are not incompatible statements.
So maybe only the first part of Sam’s comment was correct.
All of these alternatives mean different things when you say it takes 20+ seconds for a full response.
I feel oddly skeptical about this article; I can't specifically argue the numbers, since I have no idea, but... there are some decent open source models; they're not state of the art, but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices?
The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
Surely if its this cheap, and we're talking massive margins according to this, I should be able to get a cheap / run my own 600B param model.
Am I missing something?
It seems that reality (ie. the absence of people actually doing things this cheap) is the biggest critic of this set of calculations.
There are multiple API providers offering models at dirt cheap prices, enough so that there is at least one well-known API provider that is an aggregator of other API providers and offers lots of models at $0.
> The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
There are. Basically every provider's R1 prices are cheaper than estimated by this article.
The article estimates $0.003 per million input tokens; the cheapest on the list is $0.46 per million. The ratio is roughly 150x.
OTOH, all of the providers are far below the estimated $3.08 cost per million output tokens
if the margins on hosted inference are 80%, then you need > 20% utilization of whatever you build for yourself for this to be less costly to you (on margin).
i self-host open weight models (please: deepseek et al aren't open _source_) on whatever $300 GPU i bought a few years ago, but if it outputs 2 tokens/sec then i'm waiting 10 minutes for most results. if i want results in 10s instead of 10m, i'll be paying $30000 instead. if i'm prompting it 100 times during the day, then it's idle 99% of the time.
coordinating a group buy for that $30000 GPU and sharing that across 100 people probably makes more sense than either arrangement in the previous paragraph. for now, that's a big component of what model providers, uh, provide.
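The 80%-margin utilization argument above in numbers, assuming (hypothetically) that the provider's underlying unit cost equals yours: self-hosting only wins if your hardware is busy more than roughly (1 - margin) of the time.

```python
# Self-host vs API break-even at an assumed 80% provider margin.
provider_margin = 0.80
api_price = 10.0                              # $ for one GPU-hour's worth of tokens, hypothetical
own_cost_per_hour = api_price * (1 - provider_margin)  # assume same unit cost: $2/hr

for utilization in (0.05, 0.20, 0.50):
    effective = own_cost_per_hour / utilization  # $ per *useful* hour
    winner = "self-host" if effective < api_price else "API"
    print(f"{utilization:.0%} busy: ${effective:.0f}/useful-hour -> {winner}")
```

At the 1%-utilization personal setup described above, the API wins by a wide margin; the group buy is exactly an attempt to push utilization past the break-even point.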
What I'm trying to say is that hosting your own model is in an entirely different league than the pros.
Even if accounting for the errors in the article implies higher costs, I would argue it would still come back to profit, given how advanced inference optimization has become.
If actual model intelligence is not a moat (looking likely this is true) the real sauce of profitable AI companies is advanced optimizations across the entire stack.
OpenAI is NEVER going to release their specialized kernels, routing algos, quantizations, or model compilation methods. These are all really hard and really specific.
Afaik openai doesn't enforce a daily quota even on the $20 plans unless the platform is under pressure.
Since I often consume 20M token per day, one can assume many would use far more than the 1M tokens assumed in the article's calculations.
It is very likely that you are in the top 10% of users.
I somewhat doubt my usage is so close to the edge of the curve since I don't even pay for any plan. It could be that I'm very frugal with money and fat on consumption while most are more balanced, but 1M token per day in any case sounds slim for any user who pays for the service.
Deepseek R1 for free.
But these companies also have very expensive R&D development and large upfront costs.
They are dirt cheap. Same model architecture for the comparison: $0.30/M $1.00/M. Or even $0.20-$0.80 from another provider.
> But compute becomes the bottleneck in certain scenarios. With long context sequences, attention computation scales quadratically with sequence length.
Even if the statement about quadratic scaling is right, the bottleneck we are talking about is somewhere north by a factor of 1000. If 10k cores each do only simple matrix operations, each needs new data (up to 64k) available every ~500 cycles, which works out to something like 100+ GByte/s per core. Even with 2+ TByte/s of HBM, the bottleneck is the memory transfer rate, by something like 500 times. With collisions, we're talking about an additional factor like 5000 (last time I ran some tests on a 4090).
GPU MMUs can handle multiple lines in parallel, but not for 10k cores at the same time. The HBM is not able to transfer 3.5 TByte/s sequentially.
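A cruder version of the same point, as a roofline-style check: at small batch sizes, every generated token has to stream all active weights from HBM, so bandwidth, not FLOPS, caps decode throughput. The bandwidth and precision figures here are my assumptions, not from the comment above.

```python
# Memory-bandwidth ceiling on decode throughput at batch size 1.
active_params = 37e9      # active params per token (MoE)
bytes_per_param = 1       # assume FP8 weights
hbm_bandwidth = 3.35e12   # B/s -- assumed H100-class HBM3 peak

max_tok_per_sec = hbm_bandwidth / (active_params * bytes_per_param)
print(f"~{max_tok_per_sec:.0f} tok/s per GPU at batch 1 (bandwidth-bound)")
```

Batching amortizes the weight traffic across many requests, which is why providers push concurrency so hard before attention FLOPs ever become the limiter.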
https://www.wheresyoured.at/deep-impact/
Basically, DeepSeek is _very_ efficient at inference, and that was the whole reason why it shook the industry when it was released.
Given Gemini efficiency with long context I would bet their attention is very efficient too.
GPT OSS uses fp4, which DeepSeek doesn’t use yet btw.
So no, big labs aren’t behind DeepSeek in efficiency. Not by much at least.
We also don't know the per-token cost for OpenAI and Anthropic models, but I would be highly surprised if it was significantly more expensive than open models anyone can use and run themselves. It's not like they're also not investing in inference research.
Seriously, that claim was always completely disingenuous
And when you're using an actual AI model to "train" (copy), it's not even a shred of nonsense to realize the prior model is a core component of the training.
I remember seeing lots of videos at the time explaining the details, but basically it came down to the kind of hardware-aware programming that used to be very common. (Although they took it to the next level by using undocumented behavior to their advantage.)
they did reduce both though and mostly due to reduced precision
In any case, here is what Anthropic CEO Dario Amodei said about DeepSeek:
"DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)"
"DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese."
https://www.darioamodei.com/post/on-deepseek-and-export-cont...
We certainly don't have to take his word for it, but the claim is that DeepSeek's models are not much more efficient to train or inference than closed models of comparable quality. Furthermore, both Amodei and Sam Altman have recently claimed that inference is profitable:
Amodei: "If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.
What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."
https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop...
Altman: "If we didn’t pay for training, we’d be a very profitable company."
https://www.theverge.com/command-line-newsletter/759897/sam-...
The first statement is one about the present value of AI. The second statement is about their belief of the future value of AI.
"There is nothing else after generative AI. There are no other hypergrowth markets left in tech. SaaS companies are out of things to upsell. Google, Microsoft, Amazon and Meta do not have any other ways to continue showing growth, and when the market works that out, there will be hell to pay, hell that will reverberate through the valuations of, at the very least, every public software company, and many of the hardware ones too."
I am not doing some kind of sophisticated act of interpretation here. If AI is very little of big tech revenue, and big tech are posting massive record revenue and profits every quarter, then it cannot be the case that "there is nothing left after generative AI" and they “do not have any other ways to continue showing growth” — what is left is whatever is driving all that revenue and profit growth right now!
> $20/month ChatGPT Pro user: Heavy daily usage but token-limited
ChatGPT Pro is $200/month and Sam Altman already admitted that OpenAI is losing money from Pro subscriptions in January 2025:
"insane thing: we are currently losing money on openai pro subscriptions!
people use it much more than we expected."
- Sam Altman, January 6, 2025
> We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.
Source: https://www.axios.com/2025/08/15/sam-altman-gpt5-launch-chat...
His possible incentives and the fact OpenAI isn't a public company simply make it hard for us to gauge which of these statements is closer to the truth.
This sort of thing used to be called fraud, but there's zero chance of criminal prosecution.
Profitable on inference doesn't mean they aren't losing money on pro plans. What's not compatible?
The API requests are likely making more money.
The question is then whether SaaS companies paying for GPT API pricing are profitable if they charge their users a flat rate for a time period. If their users trigger inference too much, they would also lose money.
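That flat-rate risk is easy to quantify; all numbers here are hypothetical:

```python
# Flat-rate SaaS margin vs metered API cost paid to the model provider.
flat_price = 20.0        # $/month charged to the end user -- hypothetical
api_cost_per_mtok = 3.0  # $/M tokens paid to the provider -- hypothetical

for mtok_used in (1, 5, 10):  # millions of tokens a user burns per month
    margin = flat_price - mtok_used * api_cost_per_mtok
    print(f"{mtok_used}M tok/mo: margin ${margin:+.0f}")
```

Light users subsidize heavy ones, and a small tail of heavy users can flip the whole plan negative, which is the same dynamic as the Pro-subscription losses quoted above.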
it is comical that something like this was even uttered in the conversation. It really shows how disconnected the tech sector is from the real world.
Imagine Intel CEO saying "If we didn't have to pay for fabs, we'd be a very profitable company." Even in passing. He'd be ridiculed.
As a counterpoint, if OpenAI were actually profitable at this early stage that could be a bad financial decision - it might mean that they aren't investing enough in what is an incredibly fierce and capital-intensive market.
Saying that is the equivalent of him saying "our product is really valuable! use it!"
> The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage.
That'll be the Pro users. My wife uses her regular sub very lightly, most people will be like her...
I don’t buy the logic that he will “scam” his investors and run away at some point.
If OpenAI goes down tomorrow, he will be just fine. His incentive is to sell the stock, not actually build and run a profitable business.
Look at Adam Neumann as an example of how to lose billions of investor dollars and still walk out of the ensuing crash with over a billion.
https://en.wikipedia.org/wiki/Adam_Neumann
His strategy is to sell OpenAI stock like it was Bitcoin in 2020, and if for some reason the market decides that maybe a company that loses large amounts of cash isn't actually a good investment... he'll be fine, he's had plenty of time to turn some of his stock into money :)
Every o1-pro and o1-preview inference was a normal inference multiplied by however many replica paths they ran.
1. Your token count per day seems quite low ("2M input tokens, ~30k output tokens/day") - that's FAR less than I'd expect; for comparison I average 330M-850M combined tokens per day, and I'm on the higher side of my peers, who average 150M-600M combined tokens per day.
2. It doesn't seem you're taking prompt caching into account. This generally reduces the inference required for agentic coding by 85-95%.
3. It would be good if you added what quantisation you're running, for example 8.5-9bpw / (Q8 equivalent) (indistinguishable from fp32/bf16) for the model, and for the KV cache (Q8/(b)f16 etc..).
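On point 2, the cache effect is large; here is a sketch using the 85-95% figure from above, with the cached-read discount as my assumption (many providers price cached input at roughly 10% of the normal input rate):

```python
# Effect of prompt caching on agentic-coding input cost.
input_mtok = 300       # M input tokens/day, mostly re-sent context -- hypothetical
input_price = 3.0      # $/M input tokens -- assumed
cache_hit = 0.90       # fraction of input tokens served from cache
cache_discount = 0.10  # cached tokens billed at 10% of the normal rate -- assumed

naive = input_mtok * input_price
cached = naive * ((1 - cache_hit) + cache_hit * cache_discount)
print(f"no cache: ${naive:.0f}/day, with cache: ${cached:.0f}/day")
```

So ignoring caching overstates the input-side cost by roughly 5x under these assumptions, which materially changes the article's margins.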
Then, today almost every lab uses methods like speculative decoding and caching which reduce the cost and speed up things significantly.
The input numbers are far off. The assumption is 37B active parameters. Sonnet 4 is supposedly a 100B-200B param model. Opus is about 2T params. Neither of them (even if we assume MoE) will have exactly this number of active params. Then there is a cost to hosting and activating params at inference time (the article kind of assumes it would be the same constant 37B params).
For inference, gross margins are exactly: (what companies charge per 1M tokens to the user) - (direct cost to produce that 1M tokens which is GPU costs).
It could still be burning money for Microsoft/Amazon
[1]: https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJ...
- The compute requirements would be massive compared to the rest of the industry
- Not a single large open source lab has trained anything over 32B dense in the recent past
- There is considerable crosstalk between researchers at large labs; notice how all of them seem to be going in similar directions all the time. If dense models of this size actually provided benefit compared to MoE, the info would've spread like wildfire.
> “If we didn’t pay for training, we’d be a very profitable company.”
There are also a lot of comments in this thread from people who want LLM companies to fail for different reasons, so they're projecting that wish onto imagined unit economics.
I’m having flashbacks to all of the conversations about Uber and claims that it was going to collapse as soon as the investment money ran out. Then Uber gradually transitioned to profitability and the critics moved to using the same shtick on AI companies.
Uber (and Lyft) didn't starve the alternatives: they were already severely malnourished. Also, they found a loophole to get around the medallion system in several cities, which taxi owners used in an incredibly anticompetitive fashion to prevent new competition.
Just because Uber used a shitty business practice to deliver the killing blow doesn't mean their competition were undeserving of the loss, or that the traditional taxis weren't without a lot of shady practices.
And lifetime profits for Uber are still at best break even which means that unless you timed the market perfectly, Uber probably lost you money as a shareholder.
Uber is just distorted in valuation by its presence in big US metro areas (which basically have no realistic transportation alternative).
Advertising is now a very, very locked-in market, and it will take over a decade to shift even a significant minority of it into OpenAI's hands. This is not likely the first or even second monetization strategy imo.
But I’m happy to be wrong.
Can you elaborate? You’ve sparked my curiosity.
"No you're not, WE are digging a trench!"
Yes fine, but "I am as well".
Sheesh. Also I, personally, do and lead the work of taking the wallet share. So I will stick with "I" and would accept any of my team saying the same.
Sam Altman also said this:
He’s in the habit of lying, so it would be remiss to take his word for it.
Without having in depth knowledge of the industry, the margin difference between input and output tokens is very odd to me between your napkin math and the R1 prices. That's very important as any reasoning model explodes reasoning tokens, which means you'll encounter a lot more output tokens for fewer input tokens, and that's going to heavily cut into the high margin ("essentially free") input token cost profit.
Unless I'm reading the article wrong.
There were some oddities with the numbers themselves as well but I think it was all within rounding, though it would have been nice for the author to spell it out when he rounded some important numbers (~s don't tell me a whole lot).
TL;DR I totally agree, there are some napkin math issues going on here that make this pretty hard to see as a very useful stress test of cost.
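To make concrete why the input/output split matters once reasoning tokens enter the picture, here's a toy cost calculation (the per-token prices below are invented placeholders, not any provider's real rates):

```python
# Hypothetical illustration of how reasoning models shift spend toward
# output tokens. Prices are made-up placeholders.
PRICE_IN = 0.50   # $ per 1M input tokens (assumed)
PRICE_OUT = 2.00  # $ per 1M output tokens (assumed)

def request_cost(input_tokens, output_tokens):
    """Cost of one request in dollars."""
    return (input_tokens * PRICE_IN + output_tokens * PRICE_OUT) / 1e6

# A plain chat turn: long prompt, short answer.
plain = request_cost(input_tokens=4000, output_tokens=500)

# A reasoning turn: same prompt, but thousands of hidden reasoning
# tokens are billed as output tokens.
reasoning = request_cost(input_tokens=4000, output_tokens=8000)

print(f"plain: ${plain:.4f}, reasoning: ${reasoning:.4f}")
```

Even with identical prompts, the reasoning turn costs several times more because the "essentially free" input tokens stop dominating the bill.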
Another question is - will it ever become less costly to train?
I'd like to see opinions from someone in the know.
So to keep up with the times, the models have to be constantly retrained.
One thing though is that right now it's not just incremental training: the whole thing gets retrained, with different parameters and a different training procedure each time.
This might not be the case in the future where the training could become more efficient and switch to incremental updates where you don't have to re-feed all the training data but only the new things.
I am simplifying here for brevity, but I think the gist is still there.
They're training new models because the (software) technology keeps improving, (proprietary) data sets keep improving (through a lot of manual labelling but also synthetic data generation), and in general researchers have better understanding of what's important when it comes to LLMs.
In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.
I'm also not sure we should expect their costs to just be "in-line with, or cheaper than" what various hourly H100 providers charge. Those providers presumably don't have to run entire datacenters filled to the gills with these specialized GPUs. It may be a lot more expensive to do that than to run a handful of them spread among the same datacenter with your other workloads.
But there is no way that OpenAI should be more expensive than this. The main cost is the capex of the H100s, and if you are buying 100k at a time you should be getting a significant discount off list price.
1. Idle instances don't turn electricity to heat so that reduces their operating cost.
2. Idle instances can be borrowed for training which means flexible training amortizes peak inference capacity.
They can repurpose those nodes for training when they aren't being used for inference. Or if they're using public cloud nodes, just turn them off.
This article is like saying an apartment complex isn’t “losing money” because the monthly rents cover operating costs but ignoring the cost of the building. Most real estate developments go bust because the developers can’t pay the mortgage payment, not because they’re negative on operating costs.
If the cash flow was truly healthy these companies wouldn’t need to raise money. If you have healthy positive cash flow you have much better mechanisms available to fund capital investment other than selling shares at increasingly inflated valuations. Eg issue a bond against that healthy cash flow.
Fact remains when all costs are considered these companies are losing money and so long as the lifespan of a model is limited it’s going to stay ugly. Using that apartment building analogy it’s like having to knock down and rebuild the building every 6 months to stay relevant, but saying all is well because the rents cover the cost of garbage collection and the water bill. That’s simply not a viable business model.
Update Edit: A lot of commentary below re the R&D and training costs and whether it’s fair to exclude them from inference costs or “unit economics.” I’d simply say inference is just selling compute, and that should be high margin, which the article concludes it is. The issue behind the growing concerns about a giant AI bubble is whether that margin is sufficient to cover the costs of everything else. I’d also say that excluding the cost of the model from “unit economics” calculations doesn’t make business/math/economics sense, since it’s literally the thing being sold. It’s not some bit of fungible equipment or long-term capital expense when models become obsolete after a few months. Take away the model and you’re just selling compute, so it’s really not a great metric to use to say these companies are OK.
You would need to figure out what exactly they are losing money on. Making money on inference is like operating profit - revenue less marginal costs. So the article is trying to answer if this operating profit is positive or negative. Not whether they are profitable as a whole.
If things like cost of maintaining data centres or electricity or bandwidth push them into the red, then yes, they are losing money on inference.
If the things that make them lose money is new R&D then that's different. You could split them up into a profitable inference company and a loss making startup. Except the startup isn't purely financed by VC etc, but also by a profitable inference company.
One thing that makes me suspect inference costs are coming down is how chatty the models have become lately, often appending encouragement to a checklist like "You can check off each item as you complete them!" Maybe I'm wrong, but I feel if inference was killing them, the responses would become more terse rather than more verbose.
The leaked OpenAI financial projections for 2024 showed about equal amount of money spent on training and inference.
Amortizing the training per-query really doesn't meaningfully change the unit economics.
> Fact remains when all costs are considered these companies are losing money and so long as the lifespan of a model is limited it’s going to stay ugly. Using that apartment building analogy it’s like having to knock down and rebuild the building every 6 months to stay relevant. That’s simply not a viable business model.
To the extent they're losing money, it's because they're giving free service with no monetizaton to a billion users. But since the unit costs are so low, monetizing those free users with ads will be very lucrative the moment they decide to do so.
The models as is are still hugely useful, even if no further training was done.
Exactly. The parent comment has an incorrect understanding of what unit economics means.
The cost of training is not a factor in the marginal cost of each inference or each new customer.
It’s unfortunate this comment thread is the highest upvoted right now when it’s based on a basic misunderstanding of unit economics.
And whether companies can survive in that scenario depends almost entirely on their unit economics of inference, ignoring current R&D costs
This talent diffusion guarantees that OpenAI and Anthropic will have to keep sinking in ever more money to stay at the bleeding edge, or upstarts like DeepSeek and incumbents like Meta will simply outspend them/hire away all the Tier 1 talent to upstage them.
The only companies that'll reliably print money off AI are TSMC and NVIDIA because they'll get paid either way. They're selling shovels and even if the gold rush ends up being a bust, they'll still do very well.
IF.
If you do stagnate for years someone will eventually decide to invest and beat you. Intel has proven so.
I don’t understand why people like you have to call this stuff out? Like most of HN thinks the way I do and that’s why the post was upvoted. Why be a contrarian? There’s really no point.
How can you possibly say this if you know anything about the evolution of costs in the past year?
Inference costs are going down constantly, and as models get better they make fewer mistakes, which means fewer cycles = less inference to actually subsidize.
This is without even looking at potential fundamental improvements in LLMs and AI in general. And with all the trillions in funding going into this sector, you can't possibly think we're anywhere near the technological peak.
Speaking as a founder managing multiple companies: Claude Code's value is in the thousands per month /per person/ (with the proper training). This isn't a flash in the pan, this isn't even a "prediction" - the game HAS changed and anyone telling you it hasn't is trying to cover their head with highly volatile sand.
Unit economics is mostly a manufacturing concept and the only reason it looks OK here is because of not really factoring in the cost of building the thing into the cost of the thing.
Someone might say I don’t understand “unit economics” but I’d simply argue applying a unit economics argument saying it’s good without including the cost of model training is abusing the concept of unit economics in a way that’s not realistic from a business/economics sense.
The model is what’s being sold. You can’t just sell “inference” as a thing with no model. Thats just selling compute, which should be high margin. The article is simply affirming that by saying yes when you’re just selling compute in micro-chunks that’s a decent margin business which is a nice analysis but not surprising.
> That would be like saying the unit economics of selling software is good because the only cost is some bandwidth and credit card processing fees. You need to include the cost of making the software
Unit economics is about the incremental value and costs of each additional customer.
You do not amortize the cost of software into the unit economics calculations. You only include the incremental costs of additional customers.
> just like you need to include the cost of making the models.
The cost of making the models is important overall, but it’s not included in the unit economics or when calculating the cost of inference.
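The distinction being drawn here fits in a few lines of toy arithmetic (all figures invented for illustration):

```python
# Sketch of the distinction: unit economics covers only the incremental
# cost of serving one more customer; training is a fixed (sunk) cost.
# Every number here is invented.
revenue_per_user = 20.0        # $/month subscription (assumed)
inference_cost_per_user = 6.0  # $/month of GPU time (assumed)
training_cost = 100e6          # fixed, independent of user count (assumed)

def contribution_margin():
    """Per-user margin: does not change as the user count grows."""
    return revenue_per_user - inference_cost_per_user

def total_profit(users):
    """Overall profit: the fixed training cost must still be recovered."""
    return users * contribution_margin() - training_cost

print(contribution_margin())        # 14.0 regardless of scale
print(total_profit(1_000_000) > 0)  # False: positive unit economics,
                                    # firm still in the red overall
```

Both camps in this thread can be right at once: the per-unit margin is healthy while the company as a whole loses money until the fixed cost is amortized over enough users.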
Compare the cost of tweeting to the cost of submitting a question to ChatGPT. The fact that ChatGPT rate limits (and now sells additional credits to keep using it after you hit the limit) indicates there are serious unit economic considerations.
We can't think of OpenAI/Anthropic as software businesses. At least from a financial perspective, it's more similar to a company selling compute (e.g. AWS) than a company selling software (e.g. Twitter/X).
2. “Open source” is great but then it’s just a commodity. It would be very hard to build a sustainable business purely on the back of commoditized models. Adding a feature to an actual product that does something else though? Sure.
The title of the article directly says “on inference”. It’s not a mistake to exclude training costs. This is about incremental costs of inference.
You don’t include fixed costs in the unit economics. Unit economics is about incremental costs.
This question came up and Sam said they were profitable if you exclude training and the COO corrected him
So at least for OpenAI, the answer is “no”
They did say it was close
And that’s if you exclude training costs which is kind of absurd because it’s not like you can stop training
It’s therefore interesting that they claimed it was close: this supports the theory that inference from paid users is a (big) money maker, if it’s close to covering all the free usage and their payroll costs.
They quote him as saying inference is profitable and end it at that.
Are you saying that the COO corrected him at the dinner, or on the podcast? Which podcast was it?
“I think that tends to end poorly because as demand for your service grows, you lose more and more money. Sam Altman actually addressed this at dinner. He was asked basically, are you guys losing money every time someone uses ChatGPT?
And it was funny. At first, he answered, no, we would be profitable if not for training new models. Essentially, if you take away all the stuff, all the money we're spending on building new models and just look at the cost of serving the existing models, we are sort of profitable on that basis.
And then he looked at Brad Lightcap, who is the COO, and he sort of said, right? And Brad kind of like squirmed in his seat a little bit and was like, well, we're pretty close.
We're pretty close. We're pretty close.
So to me, that suggests that there is still some, maybe small negative unit economics on the usage of ChatGPT. Now, I don't know whether that's true for other AI companies, but I think at some point, you do have to fix that because as we've seen for companies like Uber, like MoviePass, like all these other sort of classic examples of companies that were artificially subsidizing the cost of the thing that they were providing to consumers, that is not a recipe for long-term success.”
From Hard Fork: Is This an A.I. Bubble? + Meta’s Missing Morals + TikTok Shock Slop, Aug 22, 2025
Uber doesn't really compare, as they had existing competition from taxi companies that they first had to/have to destroy. And cars or fuel didn't get 10x cheaper over the time of Uber's existence, but I'm sure that they still can optimize a lot for efficiency.
I'm more worried about OpenAI's capability to build a good moat. Right now it seems that each success is replicated by the competing companies quickly. Each month there is a new leader in the benchmarks. Maybe the moat will be the data in the end, i.e. there are barriers nowadays to crawling many websites that have lots of text. Meanwhile they might make agreements with the established AI players, maybe some of those agreements will be exclusive. Not just for training but also for updating wrt world news.
Hoping for something net profitable including fixed costs from day 1 is a nice fantasy, but that’s not how any business works or even how consumers think about debt. Restaurants get SBA financing. Homeowners are “net losing money” for 30 years if you include their debt, but they rightly understand that you need to pay a large fixed cost to get positive cash flow.
R&D is conceptually very similar. Customer acquisition also behaves that way
https://x.com/FinHubIQ/status/1960540489876410404
the short of it: if you do the accounting on a per-model basis, it looks much better
And it's a relevant question because people constantly say these companies are losing money on inference.
Spending hundreds of millions of dollars on training when you are two guys in a garage is quite significant, but the same amount is absolutely trivial if you are planet-scale.
The big question is: how will training cost develop? Best-case scenario is a one-and-done run. But we're now seeing an arms race between the various AI providers: worst-case scenario, can the market survive an exponential increase in training costs for sublinear improvements?
Why do you think they will mindlessly train extremely complicated models if the numbers don’t make sense?
Nobody is going to pay the same price for a significantly worse model. If your competitor brings out a better model at the same price point, you either a) drop your price to attract a new low-budget market, b) train a better model to retain the same high-budget market, or c) lose all your customers.
You have taken on a huge amount of VC money, and those investors aren't going to accept options A or C. What is left is option B: burn more money, build an even better model, and hope your finances last longer than the competition.
It's the classic VC-backed startup model: operate at a loss until you have killed the competition, then slowly increase prices as your customers are unable to switch to an alternative. It worked great for Uber & friends.
To me that more or less settles both "which one is best" and "is it subsidized".
Can't be sure, but anything else defies economic gravity.
Also that's not accounting for free riders.
I have probably consumed trillions of free tokens from openai infra since gpt 3 and never spent a penny.
And now I'm doing the equivalent on Gemini since flash is free of charge and a better model than most free of charge models.
You're arguing that maybe the big companies won't recoup their investment in the models, or profitably train new ones.
But that's a separate question. Whether a model - which now exists! - can profitably be run is very good to know. The fact that people happily pay more than the inference costs means what we have now is sustainable. Maybe Anthropic or OpenAI will go out of business or something, but the weights have been calculated already, so someone will be able to offer that service going forward.
The economics are awful and local model performance is pretty lackluster by comparison. Never mind much slower and narrower context length.
$6,000 is 2.5 years of a $200/mo subscription. And in 2.5 years that $6k setup will likely be equivalent to a $1k setup of the time.
The $20 subscription is far more capable than anything i could build locally for under $10k.
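For what it's worth, the break-even arithmetic on the figures from the comment above:

```python
# Quick check of the break-even math (hardware price and subscription
# cost are the figures from the comment, not a recommendation).
hardware_cost = 6000  # $ one-time local setup
subscription = 200    # $/month hosted plan

months = hardware_cost / subscription
print(months, months / 12)  # 30.0 months, i.e. 2.5 years
```

And that's before electricity, and before the local hardware depreciates against whatever ships in those 2.5 years.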
See the recent reactions to AWS pricing on Kiro where folks had a big WTF reaction on pricing after, it appears, AWS tried to charge realistic pricing based on what this stuff actually costs.
If you’re applying the same pricing structure to Kiro as to all AWS products then, yeah, it’s not particularly hobbyist accessible?
If this were true, the stock market would have no reason to exist.
Exactly the analogy I was going to make. :)
Is that actually true in 2025? Presumably you have to make coupon payments on a bond(?), but shares are free. Companies like Meta have shown you can issue shares that don't come with voting rights and people will buy them, and meme stocks like GME have demonstrated the effectiveness of churning out as many shares as the market will bear.
These companies are behaving the same way. Folks are willing to throw endless money into the present pit so on the one hand I can’t blame them for taking it.
Reality is, though, that when the hype wears off it’s only throwing more gasoline on the fire and building a bigger pool of investors that will become increasingly desperate to salvage returns. History says time and time again that story doesn’t end well, and that’s why the voices mumbling “bubble” under their breath are getting louder every day.
Think of the model as an investment.
Exactly, or a factory.
I know there is lots of bearish sentiments here. Lots of people correctly point out that this is not the same math as FAANG products - then they make the jump that it must be bad.
But - my guess is these companies end up with margins better than Tesla (modern manufacturer), but less than 80%-90% of "pure" software. Somewhere in the middle, which is still pretty good.
Also - once the Nvidia monopoly gets broken, the initial build out becomes a lot cheaper as well.
Expect the trend to pick up as the pool of engineers who can create usable LLMs from scratch increases through knowledge/talent diffusion.
If OpenAI didn't come along with ChatGPT, we would probably just now be getting Google Bard 1.0 with an ability level of GPT-3.5 and censorship so heavy it would make it useless for anything beyond "Tell me who the first president was".
So the question remains unanswered, at least for us. For those putting money in, you can be absolutely certain they have a model with sufficient data to answer the question. Since money did go in, even if it's venture, the answer is probably "yes in the immediate, but no over time."
1.44e6 tokens/sec * 37e9 bytes/token / 3.3e12 bytes/sec/GPU = ~16,000 GPUs
And that's assuming a more likely 1 byte per parameter.
So the article is only off by a factor of at least 1,000. I didn't check any of the rest of the math, but that probably has some impact on their conclusions...
Edit: Oh, assuming this is an estimate based on the model weights moving from HBM to SRAM, that's not how transformers are applied to input tokens. You only have to move the weights for every token during generation, not during "prefill". (And actually during generation you can use speculative decoding to do better than this roofline anyway.)
And more importantly batching: taking the example from the blog post, it would be 32 tokens per forward pass in the decoding phase.
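Putting the bandwidth math and the batching correction together in one sketch (1 byte/param and 3.3 TB/s of HBM bandwidth per GPU are the assumptions from the comment above; the 32-way batch is the blog post's example):

```python
# Naive roofline: if every token streamed all active weights from HBM.
tokens_per_sec = 1.44e6   # claimed aggregate throughput
active_params = 37e9      # active params (MoE), assumed 1 byte each
hbm_bw = 3.3e12           # bytes/sec per GPU (assumed)

gpus_needed = tokens_per_sec * active_params / hbm_bw
print(f"{gpus_needed:,.0f} GPUs")  # ~16,145

# But one weight load is amortized across the whole batch (and prefill
# processes all prompt tokens per load), so 32-way batching alone cuts
# the naive bound by 32x:
batch = 32
print(f"{gpus_needed / batch:,.0f} GPUs")  # ~505
```

So the naive weights-per-token figure is an upper bound that batching, prefill, and speculative decoding each chip away at, which is why the comments above disagree by orders of magnitude.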
This doesn't quite sound right...isn't a token just a few characters?
You are doing the calculation as if they were output tokens on a single batch; it would not make sense even in the decode phase.
If you actually want to know, I recommend Inference economics of language models from Epoch AI, which is probably the best public model as of 2025-06.
On the other side, there's the huge boost from speculative decoding, which would give a semi-prefill rate for decoding, but memory pressure is still a factor.
I would be happy to be corrected regarding both factors.
What reasoning affects is the ratio of input to output tokens, and since input tokens are cheaper, that may well affect the economics in the end.
Back in March, I did the same analysis with greater sensitivities, and arrived at similar gross margins: >70%.
https://johnnyclee.com/i/are-frontier-labs-making-80percent-...
There are also probably all kinds of enterprise deals that they are okay with high latency (> hours) that they do beyond the PAYG batch APIs
Given the analysis is based on R1, Deepseek's actual in-production numbers seem highly relevant: https://github.com/deepseek-ai/open-infra-index/blob/main/20...
(But yes, they claim 80% margins on the compute in that article.)
> When established players emphasize massive costs and technical complexity, it discourages competition and investment in alternatives
But it's not the established players emphasizing the costs! They're typically saying that inference is profitable. Instead the false claims about high costs and unprofitability are part of the anti-AI crowd's standard talking points.
I think the cache hit vs miss stuff makes sense at >100k tokens where you start getting compute bound.
> Each H800 node delivers an average throughput of ~73.7k tokens/s input (including cache hits) during prefilling or ~14.8k tokens/s output during decoding.
That's a 5x difference, not 1000x. It also lines up with their pricing, as one would expect.
(The decode throughputs they give are roughly equal to yours, but you're claiming a prefill performance 200x times higher than they can achieve.)
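Sanity-checking the ratio from the quoted DeepSeek numbers:

```python
# Per-H800-node throughputs quoted from DeepSeek's open-infra writeup.
prefill_tps = 73.7e3  # input tokens/s (including cache hits)
decode_tps = 14.8e3   # output tokens/s

ratio = prefill_tps / decode_tps
print(f"{ratio:.1f}x")  # ~5.0x prefill over decode, nowhere near 1000x
```

Which is also roughly the input/output price ratio one would expect a provider to charge.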
The only people who thought this were non-practitioners.
Not saying there's not interesting analysis here, but this is assuming that they don't have to pay for access to the massive amounts of context. Sources like stackoverflow and reddit that used to be free, are not going to be available to keep the model up to date.
If this analysis is meant to say "they're not going to turn the lights out because of the costs of running", that may be so, but if they cannot afford to keep training new models every so often they will become less relevant over time, and I don't know if they will get an ocean of VC money to do it all again (at a higher cost than last time, because the sources want their cut now).
It seems sort of like wondering if a fiber ISP is profitable per GB bandwidth. Of course it is; the expensive part is getting the fiber to all the homes. So the operations must be profitable or there is simply no business model possible.
1) routing: traffic can be routed to smaller, specialized, or quantized models
2) GPU throughput vs latency: both parameters can be tuned and adjusted based on demand. What seems like lots of deep "thinking" might just be trickling the inference over less GPU resources for longer.
3) caching
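A toy illustration of point 1 (the routing heuristic and model names are completely made up):

```python
# Hypothetical sketch of routing traffic to a cheaper model tier.
# The heuristic and tier names are invented for illustration only.
def route(prompt: str) -> str:
    """Pick a model tier for a request (illustrative heuristic only)."""
    if len(prompt) < 200 and "code" not in prompt.lower():
        return "small-cheap-model"   # hypothetical quantized tier
    return "large-frontier-model"    # hypothetical full-size tier

print(route("What's the capital of France?"))        # small-cheap-model
print(route("Refactor this code: " + "x " * 500))    # large-frontier-model
```

Real routers would use a classifier rather than string heuristics, but the economics are the same: most traffic is simple and can be served for a fraction of the flagship model's cost.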
This sounds incorrect, you only process all tokens once, and later incrementally. It's an auto-regressive model after all.
From that point on every subsequent tokens is processed sequentially in autoregressive way, but because we have the KV cache, this becomes O(N) (1 token query to all tokens) and not O(N^2)
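A minimal toy sketch of why the KV cache makes each decode step O(N): the single new query attends over N cached keys, instead of recomputing attention for the whole prefix (toy dimensions, single head, no batching):

```python
import numpy as np

d = 8  # head dimension (toy size)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x):
    """Process ONE new token: project it, extend the cache, and attend
    the single new query against all N cached keys -- O(N) per step."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    k_cache.append(k)
    v_cache.append(v)
    K, V = np.stack(k_cache), np.stack(v_cache)  # (N, d)
    scores = K @ q / np.sqrt(d)   # one query vs N keys, not N vs N
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

out = None
for _ in range(5):
    out = decode_step(rng.normal(size=d))

print(len(k_cache))  # 5 cached keys after 5 decode steps
```

Without the cache, step N would redo projections and attention for all N prefix tokens, which is where the O(N^2) total for naive generation comes from.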
I guess my use is absolutely nothing compare to someone with a couple of agents running continuously.
The largest context window a model can offer at a given quality level depends on the context size the model was pretrained with as well as specific fine tuning techniques.
It’s not simply a matter of considering increased costs.
They have a service which understands a user's question/needs 100x better than a traditional Google search does.
Once they tap into that for PPC/paid ads, their profit/query should jump into the green. In fact, there's a decent chance a lot of these models will go 100% free once that PPC pipeline is implemented and shown to be profitable.
If they start showing ads based on your prompts, and your history of "chats", it will erode the already shaky trust that users have in the bots. "Hallucinations" are one thing, but now you'll be asking yourself all the time: is that the best answer the llm can give me, or has it been trained to respond in ways favourable to its advertisers?
Google used to segregate ads very clearly in the beginning. Now they look almost the same as results. I've switched to DDG since then, but have the majority of users? Nope. Even if they're not using ad blockers, most people seem to not mind the ads.
With LLMs, the ads will be even harder to tell apart from non-ads.
Source?
It’s not like the product at-hand is relevant to data analysis or anything, amirite?
Gemini doesn’t always find very much better results, but it usually does. It beggars belief to claim that it doesn’t also understand the query much better than Rankbrain et al.
1. Companies that train models and license them
2. Companies that do inference on models
I think most folks understand that pure inference in a vacuum is likely cash flow positive, but that’s not why folks are asking increasingly tough questions on the financial health of these enterprises.
If they weren’t losing money, they wouldn’t be spending enough on R&D. This isn’t some gotcha. It’s what the investors want right now.
WeWork’s investors didn’t want them to focus on business fundamentals either and kept pumping money elsewhere. That didn’t turn out so well.
OpenAI projects 50% gross margins for 2025
The other companies don't include free users in their GM calculations which makes it hard to compare
If they insert stealth ads, then after the third sponsored bad restaurant suggestion people will stop using that feature, too.
Hell they could even just add affiliate tracking to links (and not change any of the ranking based on it) and probably make enough money to cover a lot of the inference for free users.
Or are these costs just insignificant compared to inference?
That's why Microsoft is not doing the deal with OpenAI, that's why Claude was fiddling with token limits just a couple of weeks ago.
It's a huge bubble, and the only winner at this moment is Nvidia.
So there are two answers: for the model providers, it's because they're spending it all on training the next model. For the API users, it's because they're spending it all on expensive API usage.
The cheap use case from this article is not a trillion-dollar industry, and it's absolutely not the use case hyped as the future by AI companies, the one that is coming for your job.