It doesn't stop there though. OpenAI is currently mired in a capital crunch. Their last round just about sucked all the dry powder out of the private markets. Folks are now starting to ask difficult questions about their burn rate and revenue. It is increasingly looking like they might not commit to the purchase order they made which kick-started this whole panic over RAM.
Soo ... how sure are we that the memory makers themselves are not going to be the ones holding the bag?
If they could make this stuff and sell it to regular people a decade ago for very palatable prices, why did they come up with the idea that this is the technology of the gods, unaffordable by mere mortals?
Heck, I have a phone with a 16-bit memory bus, for instance. The high(ish) clock rate only partially makes up the difference.
But with general prices on all components going up, it might not be such a big factor any more.
HBM might make sense for higher-end products, which could free up capacity for the lower end that will never use the tech.
Designing a part with a wide bus and putting the traces down on the board is what I would expect to be the easy part these days (surely).
But yield, yield comes for us all.
because the gods want it all and are willing to pay top dollar.
I wonder whether this is some kind of a racket.
No.
The GB202 die that's in the GDDR7-based RTX 5090 and RTX 6000 Pro literally needed to be this big to support the 512-bit memory bus. It's probably only getting worse with smaller node sizes. (see https://www.youtube.com/watch?v=rCwgAGG2sZQ&t=65s).
BTW: The 1 TB/s is matched by the RTX 4090 and surpassed by the RTX 5090 (1.79 TB/s).
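As a sanity check on those figures: peak bandwidth is just bus width times per-pin data rate. A minimal sketch (the per-pin rates below are assumptions for illustration, not quoted from spec sheets):

```python
# Peak memory bandwidth = (bus width in bits / 8) * per-pin data rate.
# The per-pin rates below are assumptions for illustration only.
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return theoretical peak bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

print(peak_bandwidth_gb_s(512, 28))  # 512-bit GDDR7 @ ~28 Gbps -> ~1792 GB/s (~1.79 TB/s)
print(peak_bandwidth_gb_s(384, 21))  # 384-bit GDDR6X @ ~21 Gbps -> ~1008 GB/s (~1 TB/s)
```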
I'd absolutely buy another HBM consumer GPU if it had at least 8 GB (and if I got the vibe/hope that AMD will actually support it for a couple of years...)
5090 has 1.8 TB/s?
Even a RTX 5080 has a lower memory throughput than a Radeon VII from 2019, 7 years ago, while being much more expensive.
The memory throughput of GPUs per dollar has regressed greatly during the last 5 years, despite the fact that the widths of the GPU memory interfaces have been reduced, in order to decrease the production costs.
RTX 5080 has a 256-bit memory interface, while the much cheaper Radeon VII had a 1024-bit memory interface. RTX 5080 has almost 4x faster memory than Radeon VII, but it has not used this to increase the memory throughput, only to reduce the production costs, while simultaneously increasing the product price.
And it's faster for gaming, I guess? Which is what matters for the typical user.
Anyway you can buy much faster GPUs now than in 2019. They are also much more expensive, yes.
I suppose that most games are limited by computation, so they are indeed much faster on modern GPUs.
However, there are applications that are limited by memory throughput, not by computation, including AI inference and many scientific/technical computing applications.
For such applications, old GPUs with higher memory throughput are still faster.
This is why I am still using an old Radeon VII and a couple of other ancient AMD GPUs with high memory throughput.
Last year I bought an Intel GPU, which is still slower than my old GPUs, but at least it has very good performance per dollar, competitive with that of the old GPUs, because it was very cheap, while the current AMD and especially NVIDIA GPUs have poor performance per dollar.
5090s are certainly expensive compared to most other GPUs, but not expensive enough to be unobtanium for nearly any professional who could utilize one as part of their job
That's correct if you're targeting gamers, but local AI inference changes this picture substantially.
R9700 has 32GB and is cheaper than most NVidia consumer GPUs, even though it's a "pro".
AMD Hawaii GPUs still had 1:2 FP64:FP32, while the consumer variant of Radeon VII dropped to 1:4. The following AMD consumer GPUs dropped the FP64 performance to levels that are not competitive with CPUs.
Nowadays the only consumer GPUs with decent FP64 performance are the Intel Battlemage GPUs, which have a 1:8 performance ratio, which provides very good performance per dollar.
That is, memory capacity is reserved for datacenters yet to be built, but this will do weird things if said datacenter construction is postponed or cancelled altogether.
It says that in 2025, the Netherlands was a net exporter of electricity (~14,000 GWh). My guess: Where they want to build data centers, the grid cannot handle it, but the overall system has more than enough power to build data centers. Do you think that sounds like a reasonable guess?
Are the Netherlands a large proportion of global datacenters?
The value of an IX isn't just in the IX itself, but also in the presence of hundreds of parties for direct peering, and excellent connectivity to the rest of the world.
It makes a lot of sense to build your DC near one - even if you have no intention of actually participating in the IX itself.
They don't need an entire IX's worth of connectivity. You're mostly sending text back and forth, and any media is far lower volume than even a normal, far less dense DC would generate; all the major traffic stays inside the AI DC.
All it needs is fiber to nearest IX
In most other places the percentage is significantly less than that and then you can easily add more of the cheap-but-intermittent stuff because a cloudy day only requires you to make up a 10% shortfall instead of a 50% one, which existing hydro or natural gas plants can handle without new storage when there are more of them to begin with.
> The Dutch power grid is already almost 50% renewables
I was a bit stunned when I read this. Your estimate is very close for 2025 here: https://en.wikipedia.org/wiki/Electricity_sector_in_the_Neth... I calculate about 43.5% was solar or wind. What is way crazier is the "bend in the curve" of production sources in the last 10 years. Look here at how fast solar and wind are growing! https://en.wikipedia.org/wiki/File:Netherlands_electricity_g...
What's more common is that they don't have the transmission capacity itself, but that one's pretty easy in this case too, because what that means is that you have an existing transmission line which is already near capacity with generation on one end and customers on the other. So then you just build the data center on the end of the transmission line where the generation is rather than the end where the existing customers are, at which point you can add new generation anywhere you want -- and if you put it near the existing customers you've just freed up transmission capacity because you now have new customers closer to the existing generation and new generation closer to the existing customers.
That's where AWS us-east-1 is, i.e. the oldest AWS region where they got started to begin with. Google and Microsoft also have a large presence there. It's not just the US government, it's everybody, and it's not new.
> How did they do it?
Here's the US nuclear plant map, guess where a bunch of them are:
https://www.eia.gov/todayinenergy/detail.php?id=65104
The area around Virginia is also a major coal producer and when this was getting started it was a source of cheap electricity, but coal is quickly being replaced with natural gas via pipelines from the Gulf coast. Their current power mix is ~30% nuclear, ~12% renewables (solar) and almost all the rest natural gas.
That's because you don't live in Maryland.
Our energy bills are through the roof and our transmission company is talking about rolling blackouts in 2027.
https://www.thebanner.com/community/climate-environment/cont...
EDIT
The opening paragraph:
> State regulators’ review of the controversial power line proposed to stretch across three rural Maryland counties will extend to at least February 2027, officials announced Thursday, a timeline that prevents developers from meeting the grid operator’s deadline to ensure reliable electricity.
I bet this is pure NIMBYism. Just this phrase alone is a dead giveaway: "controversial power line". LOL: What is controversial about a power line? Hint: They aren't, but NIMBYism exists. If demand increases you have to build more generation and power lines etc. This is not a problem except to NIMBYs; it's just the logical consequence. If the local population increases and you don't have enough grocery stores, you don't say "grocery stores are stressed" and regard it as an insurmountable problem; people just open more of them.
In every country? Citation needed.
Had we done more 10 years ago we would have been better off. The second best time to start is now.
(We used to build it at a fraction of the cost and in less than half the time compared to our modern fuckups, and fuel can come from just about anywhere if need be. It might be a lot more expensive than the stuff from Kazakhstan and still be a fraction of the cost.)
I think ideally we would've done both to press the cost of nuclear down, given that the renewables rollout turned out to be a lot more expensive than proponents claimed it would be, whilst still tying us to gas to cover winter.
Then why all the anti-coal mining diktats coming down from Brussels?
Renewables deployment is happening fast. Grid upgrades are not. Batteries .. it depends.
Even nuclear darling France has set solar records: https://www.pv-magazine.com/2026/04/15/france-germany-set-da...
Brussels is trying to reduce "tiny" to zero, because of this: https://en.wikipedia.org/wiki/Tragedy_of_the_commons
China, like Brussels, is trying to reduce coal for similar reasons. They don't like the air pollution health hazard (fully believable), and they say they don't like global warming (somewhat believable).
The problem in the EU is not renewables, it's the same problem that Democratic states in the US face. Regulations and permitting hurdles that block private renewable energy developers.
(Well that and collusion)
If they had actually been communicating or colluding with each other, they would have put the screws to him, making it harder for OpenAI to assert control over the vast majority of the DRAM market.
Failing that, you'd like to think a regulatory agency somewhere would step in to keep a single player from hosing everybody else, but...
Up until AI there weren't really players able to gobble up 40% of the market, so nobody was looking.
I don’t buy it that two of the largest manufacturers of DRAM in the world, from the same country, didn’t know this. Even if you ignore each company’s intelligence teams, that’s also the job of the country’s internal intelligence services: to make sure they know what all companies are doing and then make it so they have the best leverage to gain as much as possible. Both companies would have known “somehow” and played hardball.
By spying?
The companies also do a lot of spying themselves, every bit of info could give them an edge.
You get market signals that the demand is there, you acquire the necessary capital, you spend 5 years to build capacity, but guess what, 5 other market players did the same thing. So now you are doomed, because the market is flooded and you have low cash flow since you need to drop prices to compete for pennies.
Now you cannot find capital, you don't invest, but guess what, neither your competitors did. So now the demand is higher than the supply. Your price per unit sold skyrocketed, but you don't have enough capacity!
Rinse and repeat.
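A toy, cobweb-style sketch of that cycle (all numbers are made up; it's only meant to show how an investment lag produces overshoot in both directions):

```python
# Toy capacity-cycle sketch: producers invest based on today's price, but the
# capacity only arrives LAG years later, so supply overshoots and undershoots.
# All constants are made up purely for illustration.
LAG = 3            # years between investment decision and new capacity
DEMAND = 100.0     # units demanded at the reference price
price, capacity = 1.0, 100.0
pipeline = [0.0] * LAG

for year in range(15):
    pipeline.append(max(0.0, 80.0 * (price - 1.0)))  # high prices trigger investment
    capacity += pipeline.pop(0)                       # old decisions come online now
    capacity *= 0.95                                  # old fabs retire
    price = max(0.2, DEMAND / capacity)               # crude inverse supply/demand
    print(f"year {year:2d}: capacity={capacity:6.1f}  price={price:4.2f}")
```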
Capitalists claim that this is optimal.
If anything, it shows it's possible for you to arbitrage this and in doing so help "smooth out the cycle."
At least with capitalism you have many different people with different perspectives on the risk making independent bets. That mitigates the more extreme negative outcomes.
We don't even expect companies to plan long-term anymore, it's just moving wealth as fast as possible.
That isn't really a change, very few people could ever have been said to be ideological capitalists. (capitalist is not a word with a hard definition, but I'm considering it a different thing than the more modern pure libertarian zero-regulation ideology)
This is distinct from someone who is a proponent of capitalism as a system, which appears to be the way you are using capitalist. For which I don't blame you.
Because that does not happen exactly as you say for all players. The demand signals will be processed and long-term risk is balanced against short-term gain in a distributed fashion, so not everyone will do the same.
It's more optimal than planned economies until we have AI planned economies with realtime feedback, I guess.
Consumers get cheap goods during oversupply, the most inefficient companies get eliminated during the bust, and consolidation leads to economies of scale.
There is an alternative where legislation dampens this behavior but the short term profits will be lower. Hence the hawks don’t like it.
Potentially. Well meaning and thought out legislation still distorts the markets, possibly making things objectively worse.
It is unlike a socialist system in that there are signals to read in the first place. What, do socialists claim that failure is optimal when you can't even tell if you've failed?
That's not what capitalists claim. Capitalists claim communism is responsible for tens, if not hundreds, of millions of death due to famine. And overall miserable ways of life, as if humans were termites deprived of any individualism and any freedom.
Capitalists claim that France importing 500,000 people from the third world each year, out of which only 10% are ever going to work (and these are official numbers), and yet offering them all a safety net, is unsustainable. And that socialism is only going to lead to one thing: running out of taxpayers' money.
Capitalists don't claim they have the best system: what they claim is that they haven't seen a less worse one.
> Capitalists claim that this is optimal.
Compared to starving under communism coz someone at the top got the numbers wrong, yes. And it only really happens when there are massive, unpredictable market movements and governments not doing their job. Govt should look at the whole thing and just say "no"; blame them.
No market system self-regulates well enough, and it's the government's job to file down edge cases like this. But the revolution happened in a country which has two utterly incompetent parties, both in the pockets of billionaires, fighting for power, and the clowns from the one that won the last battle use AI to smokescreen the economic growth their actions cratered.
The memory makers specifically did not scale up capacity to avoid being left holding the bag.
Do recent actions of Open AI give you the impression of a company that believes it is about to attain AGI imminently?
Hell, all it matters to investors is not being left holding the bag in the end so they don't even need to believe it
We've been projecting both FTL and AGI as future possibilities for almost 100 years now. Do LLMs get us a lot closer to AGI? I think they get us a little closer and Moore's "law" making compute faster probably is a much bigger factor, but I think we're still a very very long ways away.
I think Douglas Hofstadter satisfactorily answered this question.
> We can't even answer if we have free will or not.
Sure we can, it's just that most people don't like the answer.
He didn't prove anything. It's a theory, just like many others: https://plato.stanford.edu/entries/consciousness-higher/
> Sure we can, it's just that most people don't like the answer.
Again, not proven
I am not sure if OpenAI has that. Their edge regarding models is small, their strategy currently seems to be "buy ALL the hardware so nobody else can". Users can quite easily switch to other models.
The real issue is everyone wanting to upgrade to hbm, ddr5, and nvme5 at the same time.
The Fiji XT architecture had 512 GB/s on a 4096-bit HBM bus in 2015.
The Vega architecture had 400 GB/s or so in 2017, which was a bit of a downgrade.
At least as I understand it.
Very few applications other than GPUs need HBM.
It's worse. HBM has lower yields, so they are essentially making fewer GB per wafer too.
The specific mix of factors could change at any time, but the supply chain is relatively inelastic, it will take some time to show up on price labels.
This view isn't updated correctly post-Claude Code and Codex. There will clearly be sufficient demand.
OpenAI (or whoever) crashes and can't pay for the order leaving the memory makers in a tough spot.
Oh noes! Think of the poor memory makers!
The amount of money flowing both from the AI bubble and from quite literally scalping both the server and consumer market... They gambled on the opportunity and if they fail - it's their problem.
Capitalists did their gamble thing. If they fail in that gamble, what stops them from selling the regular RAM they made for the AI bubbleists to regular consumers? Besides HBM, it's just regular chips, which are exactly the same for the consumer/server market; why would it be any different?
Edit: also, that demand pressure is going to be applied constantly; there isn’t going to be a shock, it’s just going to keep prices high longer.
Wasn't the problem here that OpenAI was negotiating with Samsung and SK Hynix at the same time without the other one knowing about it? People only realized the implications when they announced both deals at once.
I hope they do, they did not have to agree to sell so much RAM to one customer. They’ve been caught colluding and price fixing more than once, I hope they take it in the shorts and new competitors arise or they go bankrupt and new management takes over the existing plants.
Don’t put all your eggs in the one basket is how the old saying goes.
We aren't. The remaining memory manufacturers fear getting caught in a "pork cycle" yet again - that is why there's only the three large ones left anyway.
China has memory makers who are creeping up through the stages of production maturity, and once they get there, there's no going back.
If the existing makers can't meet supply and Chinese exports get their foot in the door, they may find they never get ahead again due to volume - the domestic Chinese market is huge so they have scale, and the gaming market isn't going to care because gamers can't get anything at the moment, which is all you'll need for enterprise to ask "are we really afraid of this memory in our business?"
The answer to that is government regulation. Ban anything Chinese or slap it with tariffs. That is what tariffs are intended for - not for the BS the current administration has done.
Are they really such a big RAM buyer?
> OpenAI’s rapid growth, fueled by the success of ChatGPT and other AI products, led to a landmark agreement in October to purchase 900,000 DRAM wafers per month from Samsung and SK Hynix—amounting to roughly 40% of global supply. This surge in demand, coupled with limited manufacturing capacity, sent prices for memory kits skyrocketing. [0]
[0]: https://peq42.com/blog/openai-canceling-many-large-purchase-...
If I booked half a hotel's rooms then suddenly said "yeah never mind. Half my friends cancelled and we're not staying", basically any hotel would be coming at me for my money because there's no way they can fill their rooms now and they're losing revenue. But OpenAI can really get the whole world to pivot towards it then say "cool but we don't need your product anymore" and RAM makers are just going to let it go.
Whoever decided that was a good idea needs to be fired and publicly shamed.
If anything, OpenAI might be in on it.
The customer ran out of money. In terms of where you are in line of debtors when you haven't even delivered the product to a customer, it's so far back as to be assured you won't get your money.
If the memory makers got a deposit from OpenAI as part of this deal, that is likely to be the only money they will get for any undelivered memory, particularly if OpenAI runs out of capital.
Which also explains why production is falling behind demand, companies aren't going to sink billions into creating product for a market that could dry up overnight.
Oh no!
If they add enough capacity to meet current demand quickly, then if demand crashes they're still left with billions of dollars in loans used to build capacity for demand that no longer exists, and then they go bankrupt.
The biggest problem is predicting future demand, because it often declines quickly rather than gradually.
If you suppose you have cracked the smooth-ramping problem, perhaps you should throw your hat in the ring and soak up all the pent-up demand that SK Hynix, Samsung and Micron are neglecting.
If he can do all that that fast, the RAM makers should be able to at least 1000X their fab capacity on earth in one year. One year for scaling up existing tech is an eternity compared to Elon's timeframe for moon-fabs given the relative complexity of the challenge.
They act as a de-facto monopoly and milk us. Why is this allowed?
Nobody is "allowing" this. It's a natural property of being both advanced technology and a commodity at the same time.
Recently they had a second price fixing lawsuit thrown out (in the US).
Now with the state of things I'm sure another lawsuit will arrive and be thrown out, because the government will do anything to keep the AI bubble rolling, and a price fixing suit will be a threat to national security, somehow. Obviously that's speculative opinion, but to be clear, people are allowing it. There are, and even more so were, things that could be done.
It started with Reagan, and even parties on the “left” in the West believe in it, with very few exceptions.
It was Clinton who delivered the Democrats to Wall Street, and vice versa.
Maybe it’s time for a refresher on what neoliberal means? It’s not simply “new liberalism”. Reaganomics was the start of neoliberalism in the US, though of course it shifted and developed its character further over time into the monster that drives 99% of our problems today.
The thing that enables this is pretty obvious. The population is divided into two camps, the first of which holds the heuristic that regulations are "communism and totalitarianism" and this camp is used to prevent e.g. antitrust rules/enforcement. The second camp holds the heuristic that companies need to be aggressively "regulated" and this camp is used to create/sustain rules making it harder to enter the market.
The problem is that ordinary people don't have the resources to dive into the details of any given proposal but the companies do. So what we need is a simple heuristic for ordinary people to distinguish them: Make the majority of "regulations" apply only to companies with more than 20% market share. No one is allowed to dump industrial waste in the river but only dominant companies have bureaucratic reporting requirements etc. Allow private lawsuits against dominant companies for certain offenses but only government-initiated prosecutions against smaller ones, the latter preventing incumbents from miring new challengers in litigation and requiring proof beyond a reasonable doubt.
This even makes logical sense, because most of the rules are attempts to mitigate an uncompetitive market, so applying them to new entrants or markets with >5 competitors is more likely to be deleterious, i.e. drive further consolidation. Whereas if the market is already consolidated then the thicket of rules constrains the incumbents from abusing their dominance in the uncompetitive market while encouraging new entrants who are below the threshold.
As a counterpoint: Look at very high value goods, like jet engines and MRI machines. I went for an MRI the other day and wondered to myself (then asked an LLM) what the international MRI market looks like. There is a vanishingly small number of manufacturers, usually dominated by a few international players. How are you going to apply this tax to non-domiciled (international) companies? Also, companies like General Electric, Mitsubishi Heavy, and Siemens are enormous and incredibly diverse. This idea falls apart quickly.
How is this more efficient? You'd still be applying all of the inefficient regulatory rules intended to mitigate a lack of competition to the smaller companies trying to sustain a competitive market, and those rules are much more deleterious for smaller entities than higher tax rates.
If you have $100M in fixed regulatory overhead for a larger company with $10B in profit, it's only equivalent to a 1% tax. The same $100M for a smaller company with $50M in profit is a 200% tax. There is no tax rate you can impose on the larger company to make up for it because the overhead destroys the smaller company regardless of what you do to the larger one.
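The same point as a quick back-of-the-envelope calculation (using the illustrative figures above):

```python
# Fixed regulatory overhead expressed as an effective tax rate on profit,
# using the illustrative figures from the comment above.
def effective_rate_pct(fixed_overhead: float, profit: float) -> float:
    return fixed_overhead / profit * 100

print(effective_rate_pct(100e6, 10e9))  # large firm:   1.0 (% of profit)
print(effective_rate_pct(100e6, 50e6))  # small firm: 200.0 (% of profit)
```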
There’s virtually infinite capital: if needed, more can be reallocated from the federal government (funded with debt), from public companies (funded with people’s retirement funds), from people’s pockets via wealth redistribution upwards, from offshore investment.
They will be allowed to strangle any part of the supply chain they want.
Another point is I often see the money argument - like country X has more money, so they can afford to do more and better R&D, make more stuff.
This stuff comes out of factories, that need to be built, the machinery procured, engineers trained and hired.
[1] https://www.tomshardware.com/tech-industry/semiconductors/ym...
> more can be reallocated from the federal government (funded with debt)
While this is the most reliable funding, it's still not very accessible. OpenAI is a money pit, and their demands are growing quickly. The US government has started a bunch of very expensive spending programs. If OpenAI were to require yearly bundles of its recent "$120B" deal, that's 6% of the US' discretionary budget, or 12.5% of the non-military discretionary budget. (And the military is going to ask for a lot more money this year.) Even the idea of just issuing more debt is dubious, because they're going to want to do that to pay for the wars that are rapidly spiralling out of control.
None of this is saying that the US government can't or wouldn't pay for it, but it's not trivial, and it's unclear how much Altman can threaten the US government with "give me a trillion dollars or the economy explodes" without consequences.
Further deficit spending isn't without its risks for the US government either. Interest rates are already creeping up, and a careless explosion of the deficit may well trigger a debt crisis.
> from public companies (funded with people’s retirement funds)
This would come at great cost. OpenAI would need to open up about its financial performance to go public itself. With its CFO being put on what is effectively administrative leave for pushing against going public, we can assume the financials are so catastrophic that an IPO might bomb and take the company down with it. Nobody's going to invest privately in a company that has no public takers.
Getting money through other companies is also running into limits. Big Tech has deep pockets but they've already started slowing down, switching to debt to finance AI investment, and similarly are increasingly pressured by their own shareholders to show results.
> from people’s pockets via wealth redistribution upwards
The practical mechanism of this is "AI companies raise their prices". That might also just crash the bubble if demand evaporates. For all the hype, the productivity benefit hasn't really shown up in economy-wide aggregates. The moment AI becomes "expensive", all the casual users will drop it. And the non-casual users are likely to follow. The idea of "AI tokens" as a job perk is cute, but exceedingly few are going to accept lower salary in order to use AI at their job.
There's simply not much money to take out of people's pockets these days, with how high cost of living has gotten.
> from offshore investment.
This is a pretty good source of money. The wealthy Arabian oil states have very deep slush funds, extensively investing in AI to get ties to US businesses and in the hope of diversifying their resource economies.
...
...
"Was". Was a good source of money.
Just look at Cuba, which could be a very rich country and one of the prime tourist destinations of the world.
Given that TurboQuant results in a 6x reduction in memory usage for KV caches and up to 8x boost in speed, this optimization is already showing up in llama.cpp, enabling significantly bigger contexts without having to run a smaller model to fit it all in memory.
Some people thought it might significantly improve the RAM situation, though I remain a bit skeptical - the demand is probably still larger than the reduction TurboQuant brings.
> Given that TurboQuant results in a 6x reduction in memory usage for KV caches
All depends on the baseline. The "6x" comes from comparison against a BF16 KV cache, not against a state-of-the-art 8- or 4-bit KV cache scheme.
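To make the baseline point concrete, here's a rough KV-cache sizing sketch (the model shape below is a placeholder, not any particular model):

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes/elem.
# The model shape is a placeholder chosen for illustration, not a specific model.
def kv_cache_gib(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 2**30

shape = dict(layers=48, kv_heads=8, head_dim=128, seq_len=128_000)
for name, bytes_per_elem in [("bf16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
    print(name, round(kv_cache_gib(**shape, bytes_per_elem=bytes_per_elem), 1), "GiB")
# bf16 -> int4 is already a 4x cut; a "6x vs bf16" claim implies under ~3 bits
# per element on average, so the baseline you compare against matters a lot.
```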
Current "TurboQuant" implementations are about 3.8X-4.9X on compression (w/ the higher end taking some significant hits of GSM8K performance) and with about 80-100% baseline speed (no improvement, regression): https://github.com/vllm-project/vllm/pull/38479
For those not paying attention, it's probably worth sending this and ongoing discussion for vLLM https://github.com/vllm-project/vllm/issues/38171 and llama.cpp through your summarizer of choice - TurboQuant is fine, but not a magic bullet. Personally, I've been experimenting with DMS and I think it has a lot more promise and can be stacked with various quantization schemes.
The biggest savings in kvcache though is in improved model architecture. Gemma 4's SWA/global hybrid saves up to 10X kvcache, MLA/DSA (the latter that helps solve global attention compute) does as well, and using linear, SSM layers saves even more.
None of these reduce memory demand (Jevons paradox, etc.), though. Looking at my coding tools, I'm using about 10-15B cached tokens/mo currently (was 5-8B a couple of months ago), and while I think I'm probably above average on the curve, I don't consider myself doing anything especially crazy. This year, between mainstream developers and more and more agents, I don't think there's really any limit to the number of tokens that people will want to consume.
For example Gemma 4 32B, which you can run on an off-the-shelf laptop, is around the same or even higher intelligence level as the SOTA models from 2 years ago (e.g. gpt-4o). Probably by the time memory prices come down we will have something as smart as Opus 4.7 that can be run locally.
Bigger models of course have more embedded knowledge, but just knowing that they should make a tool call to do a web search can bypass a lot of that.
That is the sad reality of the future of memory.
Given the current tech, I also doubt there will be practical uses, and I hope we’ll see the opposite of what I wrote. But given the current industry, I fully trust them to somehow fill their hardware.
Market history shows us that when the cost of something goes down, we do more with the same amount, not the same thing with less. But I deeply hope to be wrong here and that the memory market will relax.
mind that you're quoting marketing material that's largely based on unfair baseline testing (like comparing 4 bit vs 32 bit to get "8x speed")
I hate to mention Jevons paradox as it has become cliche by now, but this is a textbook such scenario
RAM is built on a foundation of sand.
Claude Max subscriptions have gone up, but do you think every Netflix user will pay for one?..
https://www.tomshardware.com/tech-industry/artificial-intell...
The hope is that AI is "the next semiconductor" and "the next internet".
Not exactly.
LLMs are already quite useful today if you use them as a tool, so they are there to stay. The remaining problem is scalability, a.k.a. how to make LLMs cheap to use.
But scalability is not really a requirement when you look at the bigger picture. If smaller software companies/projects can't afford to use AI, the bigger ones just might. Eventually they will discover viable use cases for such tech, even if it only serves big firms, e.g. defense, resource extraction, war, finance, etc.
At the other end, if scalability is achieved, the use of LLM products will be cheaper too, so smaller projects can also use them. But of course, if LLM usage is too cheap, then many would-be consumers will just create software projects by themselves at home.
I would like a source for that statement. Additionally, I want to know by who? Because it certainly isn't end users. Inflating token usage doesn't make it any more economically viable if your user base, b2b or not, hasn't increased with it. On the contrary, that is a worse scenario for providers.
The recent enterprise revenue numbers of Anthropic
1. As a consultant, pretty much every company I have worked with in the last 2 years is doing some kind of in-house "AI Revolution" - I'm talking making "AI Taskforce" teams, having weekly internal "AI meetings" and pushing AI everywhere and to everyone. Small companies, SMEs and huge companies. From my observation it is mainly due to the C-level being obsessed with the idea that AI will replace/uplift people and revenue will grow by either replacing people or launching features 10x quicker.
2. Did you see software job boards recently? 9/10 (real) job listings have to do with AI. Either it is a full-on AI company (99% thin wrapper over Anthropic/OpenAI APIs) or some other SME that needs some AI implementations done. It is truly a breath of fresh air to work for companies that have nothing to do with AI.
The biggest laugh/cry for me are those thin wrappers that go down overnight - think all the "create your website" companies that are now completely useless since Anthropic cut out the middleman and created their own version of exactly that.
I know plenty of engineers being forced to use these tools whether they want to or not. A lot of which are okay with using AI liberally, but don't particularly like generative AI and see it as pretty irresponsible (which feels more true by the week and it is clear from first hand experience). I don't know, there is a huge gradient of users, but I would argue that in previous revolutionary technologies, we didn't have to force people to use a good tool. I didn't have to be forced to use Google search or Google Maps, tech that is now ubiquitous with western society. It seems really suspect that suits have to enforce the use of something that is supposed to change the way we work and be a force multiplier.
C-level strongly believes that AI will fix all these issues. They believe that AI will fix their broken processes.
I see a strong resemblance with "Agile Development" ~15 years ago. Extremely hyped, no one asked if their org was even a fit for it or needed it, and most importantly - the only way to fix agile is to do more agile. Same with AI right now.
Supposedly AI drives down the cost of producing software, not the "price".
> How are software companies going to make enough revenue to pay for AI, when the amount of money being spent on AI is already multiples of the current total global expenditure on software?
Currently, the cost of AI is between $20/month and around $200/month per developer.
I think the huge billions you're seeing in the news are the investment cost on AI companies, who are burning through cash to invest in compute infrastructure to allow both training and serving users.
> This demand for RAM is built on a foundation of sand, there will be a glut of capacity when it all shakes out.
Who knows? What I know is that I need >64GB of RAM to run local models, and that means most people will need to upgrade from their 8GB/16GB setup to do the same. Graphics cards follow mostly the same pattern.
You can run huge local models slowly with the weights stored on SSDs.
Nowadays there are many computers that can have e.g. 2 PCIe 5.0 SSDs, which allow a reading throughput of 20 to 30 gigabyte per second, depending on the SSDs (or 1 PCIe 5.0 + 1 PCIe 4.0, for a throughput in the range 15-20 GB/s).
There are still a lot of improvements that can be done to inference back-ends like llama.cpp to reach the inference speed limit determined by the SSD throughput.
It seems that it is possible to reach inference speed in the range from a few seconds per token to a few tokens per second.
That may be too slow for a chat, but it should be good enough for an AI coding assistant, especially if many tasks are batched, so that they can progress simultaneously during a single read pass over the SSD data.
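A rough way to bound that SSD-limited speed (all numbers are illustrative assumptions; the model sizes are hypothetical, and this ignores any caching of hot weights in RAM):

```python
# Upper bound on SSD-streamed decoding: each token requires reading roughly the
# active parameters once, so tok/s <= SSD read GB/s / bytes read per token.
# All figures below are illustrative assumptions, not measurements.
def max_tokens_per_sec(ssd_gb_per_s: float, active_params_billion: float,
                       bytes_per_param: float) -> float:
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return ssd_gb_per_s * 1e9 / bytes_per_token

print(max_tokens_per_sec(25, 37, 0.5))   # sparse MoE, ~37B active, 4-bit: ~1.4 tok/s
print(max_tokens_per_sec(25, 400, 0.5))  # dense ~400B params, 4-bit: ~0.13 tok/s
```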
Batching inferences doesn't necessarily help that much since as models get sparser the individual inferences are going to share fewer experts. It does always help wrt. shared routing layers, of course.
Depends how big the models are, how fast you want them to run and how much context you need for your usage. If you're okay with running only smaller models (which are still very capable in general, their main limitation is world knowledge) making very simple inferences at low overall throughput, you can just repurpose the RAM, CPUs/iGPUs and storage in the average setup.
Then again, after many, many years of claims that the following year would be the year of the Linux Desktop, there seems to be more and more of a push into that direction. Or at least into a significant increase in market share. We can thank a current head of state for that.
At a cost of simplicity and beauty. And two lost decades of mediocre performance. Sigh
And hopefully kill Electron.
I have never seen the point of spinning up a 300+ MB app just to display something that ought to need only 500 KB to paint onto the screen.
We're not doing Electron because some popular software also using it. We're doing Electron because the ability to create truly cross-platform interfaces with the web stack is more important to us than 300 MB of user memory.
It's closer to 1GB but trust me, everyone is well aware of your priorities.
May I never have to use or work on your project's software.
Native apps are so poorly optimized that they don't offer any advantage over Electron apps.
I don’t see how design workflows matter in the conversation about cross-platform vs native and RAM efficiency since designers can always write their mockups in HTML/CSS/JS in isolation whenever they like and with any tool of their choice. You could even use purely GUI-based approaches like Figma or Sketch or any photo/vector editor, just tapping buttons and not writing a single line of web frontend code.
Yikes. I spent 15 years developing native on both mobile and desktop. If you think that native has the same design flexibility as HTML/CSS, you're objectively wrong.
By design, each operating system limits you to its particular design language, and the styling of components is hidden behind the API, making forward-compatible customisation impossible. There's no escaping that. And if you acknowledge that fact, you can't then claim native has the same design flexibility as HTML/CSS. If you don't acknowledge that fact, you're unhinged from reality.
There are pros and cons to the two approaches, of course. But that's not what's being debated here.
They do. But not in the way that you think.
I recently switched from Spotify (well known Electron-based app) to Apple Music (well known native app). The move was mostly an ethical one, but I must say, the UI functionality and app features are basically poverty in comparison. One tiny example, navigating from playlist entry to artist requires multiple interactions. This is just one of many frustrations I've had with the app. But hey, it has beautiful liquid glass effects!
In short: iteration time matters. The times from design to implementation, to internal review, to real user feedback, and back to design from each phase should be as fast as possible. In native you don't get the same velocity as you do with the web stack. Add to that, you have to design and implement in quadruplicate: iOS design for iOS, Android for Android, macOS for Mac, Windows design for Windows. All that is why people use Electron.
Anyway, in both cases you don't really have to write it twice.
Native to the OS: write only the UI twice, but implement the Core in Rust.
Native to the machine: write it only once, e.g. in iced, and compile it for every platform.
It's bad enough having to run one bloated browser, now we have to run multiples?
This is not the right path.
Now that everyone who can't be bothered vibe codes, and Electron apps are the over-evangelized norm… people will probably not even worry about writing JS, and Electron will be here to stay. The only way out is to evangelize something else.
Like how half the websites have giant in-your-face cookie banners and half have minimalist banners. The experience will still suck for the end user because the dev doesn't care and neither do the business leaders.
If a JS dev really wanted to, it wouldn't be a huge uphill climb to code a C app, because the syntax and concepts are similar enough.
About the only thing they share is curly braces.
This comment makes no sense.
There ought to be a short one-liner that anyone can run to get easily installable "binaries" for their PyQt app for all major platforms. But there isn't, you have to dig up some blog post with 3 config files and a 10 argument incantation and follow it (and every blog post has a different one) when you just wanted to spend 10 minutes writing some code to solve your problem (which is how every good program gets started). So we're stuck with Electron.
and if not?
If the alternative is memory-safe and easy to build, then maybe people will switch. But until it is it's irresponsible to even try to get them to do so.
It likely would use less, and doesn't use a browser for rendering.
> And I'm pretty sure Avalonia is even worse
Definitely not
> The people who hate Electron hate JavaFX just as much if not more
In my opinion, I only see this from people that seem to form all of their opinions on tech forums and think Java=Bad. These are the people that think .NET is still windows only and post FUD because they don't know how to just ask for help.
From what I understand, increasing cache locality is orthogonal to how much RAM an app is using. It just lets the CPU get cache hits more often, so it only relates to throughput.
That might technically offload work to the CPU, but that's work the CPU is actually good at. We want to offload that.
In the case of Electron apps, they use a lot of RAM and that's not to spare the CPU
Cache misses mean CPU stalls, which mean wasted CPU (i.e. the CPU accomplishes less than it could have in some amount of time).
> In the case of Electron apps, they use a lot of RAM and that's not to spare the CPU
The question isn't why apps use a lot of RAM, but what the effects of reducing it are. Reducing memory consumption by a little can be cheap, but if you want to do it by a lot, development and maintenance costs rise and/or CPU costs rise, and both are more expensive than RAM, even at inflated prices.
To get a sense for why you use more CPU when you want to reduce your RAM consumption by a lot, using much less RAM while allowing the program to use the same data means that you're reusing the same memory more frequently, and that takes computational work.
But I agree that on consumer devices you tend to see software that uses a significant portion of RAM and a tiny portion of CPU and that's not a good balance, just as the opposite isn't. The reason is that CPU and RAM are related, and your machine is "spent" when one of them runs out. If a program consumes a lot of CPU, few other programs can run on the machine no matter how much free RAM it has, and if a program consumes a lot of RAM, few other programs can run no matter how much free CPU you have. So programs need to aim for some reasonable balance of the RAM and CPU they're using. Some are inefficient by using too little RAM (compared to the CPU they're using), and some are inefficient by using too little CPU (compared to the RAM they're using).
Yeah, I was saying CPU cache hits would result in better performance. The creator of Zig has argued that the easiest way to improve cache locality is by having smaller working sets of memory to begin with. No, it's not a given this will always work in every case. You can reduce working memory and not have better cache locality. But in a general sense, I understand why he argues for it.
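A tiny illustration of that "smaller working set" idea (Python, so absolute sizes are inflated by object overhead, but the ratio is the point):

```python
# The same million floats stored two ways: boxed Python objects vs. a packed array.
# Python's per-object overhead exaggerates the absolute sizes, but the point is the
# ratio: a tighter working set means less RAM and fewer cache misses.
import sys
from array import array

n = 1_000_000
boxed = [float(i) for i in range(n)]     # list of separate float objects
packed = array("d", range(n))            # one contiguous buffer of doubles

boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
packed_bytes = sys.getsizeof(packed)
print(f"list of floats: {boxed_bytes / 2**20:5.1f} MiB")
print(f"packed array:   {packed_bytes / 2**20:5.1f} MiB")
```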
> So programs need to aim for some reasonable balance of the RAM and CPU they're using
I agree with this, but
> but if you want to do it by a lot, development and maintenance costs rise and/or CPU costs rise, and both are more expensive than RAM, even at inflated prices
I would like you to clarify further, because saying CPU costs are more expensive than RAM costs is a bit misleading. A CPU might literally cost more than RAM, but a CPU is remarkably faster, and for work done, much cheaper and more efficient, especially with cache hits.
You had originally said
> It could be effective in some specific situations, but I would definitely not say that those situations are more common than the other ones
This is what I'm confused on. Why do you think most cases wouldn't benefit from this? Almost every app I've used is way on one end of the spectrum with regards to memory consumption vs CPU cycles. Don't you think there are actually a lot of cases where we could reduce memory usage AND increase cache locality, fitting more data into cache lines, avoiding GC pressure, avoiding paging and allocations, and the software would 100% be faster?
Andrew is not wrong, but he's talking about optimisations with relatively little impact compared to others and is addressing people who already write software that's otherwise optimised. More concretely, keeping data packed tighter and reducing RAM footprint are not the same. The former does help CPU utilisation but doesn't make as big of an impact on the latter as things that are detrimental to the CPU (such as switching from moving collectors to malloc/free).
> Why do you think most cases wouldn't benefit from this?
The context to which "this" is referring to was "Reducing your RAM consumption is not the best approach to reducing your RAM throughput is my point." For data-packing, Andy Kelley style, to reduce the RAM bandwidth, the access patterns must be very regular, such as processing some large data structure in bulk (where prefetching helps). This is something you could see in batch applications (such as compilers), but not in most programs, which are interactive. If your data access patterns are random, packing it more tightly will not significantly reduce your RAM bandwidth.
There's a ton of software out there where optimisation of both memory and CPU has been pushed to the side, because development hours are more costly than a bit of extra resource usage.
Pressure to optimize can more often imply just setting aside work to make the program be nearer to being limited by algorithmic bounds rather than doing what was quickest to implement and not caring about any of it. Having the same amount of time, replacing bloated abstractions with something more lightweight overall usually nets more memory gains than trying to tune something heavy to use less RAM at the expense of more CPU.
Of course memory safety has a quality all its own.
Whatever little CPU they waste is often worth more than the RAM they save.
> For cases where they are we've got stuff like arena allocators.
... that work by using more RAM to save on CPU.
Far less for moving collectors. That's why they're used: to reduce the overhead of malloc/free based memory management. The whole point of moving collectors is that they can make the CPU cost of memory management arbitrarily low, even lower than stack allocation. In practice it's more complicated, but the principle stands.
The reason some programs "avoid the heap like the plague" is because their memory management is CPU-inefficient (as in the case of malloc/free allocators).
> Meanwhile I'm not sure where you got this idea about the value of CPU cycles relative to RAM
There is a fundamental relationship between CPU and RAM. As we learn in basic complexity theory, the power of what can be computed depends on how much memory an algorithm can use. On the flip side, using memory and managing memory requires CPU.
To get the most basic intuition, let's look at an extreme example. Consider a machine with 1 GB of free RAM and two programs that compute the same thing and consume 100% CPU for their duration. One uses 80MB of RAM and runs for 100s; the other uses 800MB of RAM and runs for 99s (perhaps thanks to a moving collector). Which is more efficient? It may seem that we need to compare the value of 1% CPU reduction vs a 10x increase in RAM consumption, but that's not necessary. The second program is more efficient. Why? Because when a program consumes 100% of the CPU, no other program can make use of any RAM, and so both programs effectively capture all 1GB, only the second program captures it for one second less.
This scales even to cases when the CPU consumption is less than 100% CPU, as the important thing to realise is that the two resources are coupled. The thing that needs to be optimised isn't CPU and RAM separately, but the RAM/CPU ratio. A program can be less efficient by using too little RAM if using more RAM can reduce its CPU consumption to get the right ratio (e.g. by using a moving collector) and vice versa.
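Spelling that example out (the numbers are the ones from the paragraph above):

```python
# Re-running the example above: while a program pins the CPU at 100%, nothing else
# can use the machine's idle RAM either, so compare whole-machine GB-seconds held.
MACHINE_RAM_GB = 1.0
programs = {"A (80 MB, 100 s)": 100, "B (800 MB, 99 s)": 99}

for name, seconds in programs.items():
    captured = MACHINE_RAM_GB * seconds  # full machine is captured while CPU-bound
    print(f"program {name}: holds {captured:.0f} GB-seconds of the machine")
```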
Anyway I'm not at all inclined to blindly believe your claim that malloc/free is particularly expensive relative to various GC algorithms. At present I believe the opposite (that malloc/free is quite cheap) but I'm open to the possibility that I'm misinformed about that. You're going to need to link to reputable benchmarks if you expect me to accept the efficiency claim, but even then that wouldn't convince me that any extra CPU cycles were actually an issue for the reasons articulated in the preceding paragraph.
In the young generation, few objects survive and so few are moved (the very few that survive longer are moved into the old gen); in the old generation, most objects survive, but the allocation rate is so low that moving them is rare (although the memory management technique in the old gen doesn't matter as much precisely because the allocation rate is so low, so whether you want a moving algorithm or not in the old gen is less about speed and more about other concerns).
On top of that, the general principle of moving collectors (and why in theory they're cheaper than stack allocation) is that the cost of the overall work of moving memory is roughly constant for a specific workload, but its frequency can be made as low as you want by using more RAM.
The reason moving collectors are used in the first place is to reduce the high overhead of malloc/free allocators.
Anyway, the general point I was making above is that a machine is exhausted not when both CPU and RAM are exhausted, but when one of them is. Efficient hardware utilisation is when the program strikes some good balance between them. There's not much point to reducing RAM footprint when CPU utilisation is high or reducing CPU consumption when RAM consumption is high. Using much of one and little of the other is wasteful when you can reduce the higher one by increasing the other. Moving collectors give you a convenient knob to do that: if a program consumes a lot of CPU and little RAM, you can increase the heap and turn some RAM into CPU and vice versa.
[0] https://techwireasia.com/2026/04/chinese-memory-chips-ymtc-c...
>CXMT still trails Samsung, SK Hynix, and Micron by approximately three years in advanced DRAM node development, and yield rates on new production lines remain the variable that determines whether capacity targets translate into reliable supply. Liu notes that lines launched in the second half of 2026 are unlikely to change the global supply-demand balance until 2027.
The Verge article talks about demand exceeding supply in 2028. Your article suggests it'll take until 2029 before Chinese production catches up to current technology.
It'll help drive prices down in five years, but Chinese memory production won't be ready and efficient enough to prevent the shortages from continuing to grow.
Then, mostly by chance, I saw that my local Microcenter had some pre-builts for sale, and I ended up picking one up for <$5k that had "best in slot" components across the board, including a 5090 and even a high-end power supply.
The last time I built a gaming PC was upwards of a decade ago, and at that time the prevailing wisdom was to never buy a pre-built unless you had a massive amount of disposable income and couldn't spare even just one weekend to dedicate to a hobby project that could benefit you for years. Now, it was absolutely a no-brainer.
I'm struggling to put this in context. For comparison, what was your budget for refreshing the PC you had? Were the planned upgrades going to exceed $5k at current prices? Or is the situation that a pre-built machine with far better components was now only marginally more?
Or is it that pre-built gaming PCs have stopped being a joke? I had the experience building a bicycle: I was certain I was taking the frugal path sourcing each component individually and putting it together myself. At the end I was horrified to realize I spent far more than a new bike with superior components. It was pointed out that bicycle makers are buying by the pallet and will beat diy every time — so long as they're building something I want to buy.
That's still the case, and always will be — with a pre-built you're at the very least paying for someone to assemble it for you, so it's always going to be more expensive as a baseline.
Beyond that, the chance they've chosen good components and haven't tried to screw you over on less flashy ones like the motherboard and power supply is low.
That's not to say it's literally impossible to ever find a good deal. You very well might have. Doesn't change anything though.
Except isn't it possible that pre-built companies actually get better deals on hardware bought in bulk, and therefore could offset the labor costs with cheaper materials?
Hardware pricing and availability pre-COVID was pretty predictable and stable, which meant the consumer could extract a meaningful cost advantage if they were willing to do the relatively modest amount of work of sourcing components individually and personally assembling the build. Right now, though, some places like Microcenter appear to have a cost advantage that fundamentally relies on market and pricing instability and can only be achieved through deeper integration with the supply chain and bulk purchasing in advance -- something a retailer like Microcenter can do, but I personally cannot.
Basically, the optimizing that can happen is that I ditch heavy tools in favour of lighter ones, and hopefully enough other people do the same to help lighter tools with finances/dev resources.
If I look at the Activity Manager in macOS, of apps that are less trashy but currently taking up a lot of memory, they mostly aren't apps that I'm willing/able to move away from to save on resource use: Firefox, Safari, 1Password. (For browsers, you can blame poorly optimized websites for a lot of it, but I just don't see anyone rushing to create lightweight clones of websites in order to save users' RAM.)
But software optimisation helps all hardware, and that doesn't drive sales.
Linux, however, they don't have to worry about. Maybe it is finally the era of Haiku OS as the ghost of BeOS rises!
Assuming China takes TSMC in one piece (unlikely without internal sabotage in the best case scenario), it would still probably take years before it produces another high end GPU or CPU.
We would probably be stuck with the existing inventory of equipment for a long time…
The risk with China taking over Taiwan is that they mostly expedite their own production research by a couple of years.
Have you seen how many states and countries look enviously at Silicon Valley’s tech companies, China’s manufacturing dominance, or London’s financial sector and try to replicate them?
Turns out it’s way harder than you’d expect.
Hell, Intel can’t match TSMC despite decades of expertise, much greater fame, and regulators happy to change the law and hand out tens of billions in subsidies.
Anyone trying to spin up a competitor to TSMC would have to first overcome a significant financial hurdle: the capital investment to build all the industrial equipment needed for fabrication.
Then they'd have to convince institutions to choose them over TSMC when they're unproven, and likely objectively worse than TSMC, given that they would not have its decades of experience and process optimization.
This would be mitigated somewhat if our institutions had common-sense rules in place requiring multiple vendors for every part of their supply chain—note, not just "multiple bids, leading to picking a single vendor" but "multiple vendors actively supplying them at all times". But our system prioritizes efficiency over resiliency.
A wealthy nation-state with a sufficiently motivated voter base could certainly build up a meaningful competitor to TSMC over the course of, say, a decade or two (or three...). But it would require sustained investment at all levels—and not just investment in the simple financial sense; it requires people investing their time in education and research. Dedicating their lives to making the best chips in the world. And the only reason that would work is that it defies our system, and chooses to invest in plants that won't be finished for years, and then pay for chips that they know are inferior in quality, because they're our chips, and paying for them when they're lower quality is the only way to get them to be the best chips in the world.
They have the other system.
> A wealthy nation-state with a sufficiently motivated voter base could certainly build up a meaningful competitor to TSMC over the course of, say, a decade or two (or three...).
Demand increased, everyone built new fabs, then prices dropped and they couldn't pay off their investments. Many went out of business. It happened in the 80s, it happened in the 90s, it happened in the 2000s.
Now there's only three manufacturers left, and they know very well that demand for their product tends to be cyclical.
I've been in the industry for 30 years and I've worked at companies with fabs where demand was high and customers would only get 30% of what they ordered. Then just 2 years later our fab was only running at 50% capacity and losing money. It takes about $20 billion and 3-4 years to build a modern new fab. If you think that AI is a bubble, then do you want to be left with a shiny new factory and no products to sell because demand has collapsed?
The lawsuits in the past prove that statement to be not just basically but actually true.
From now on, RAM will always be super costly for consumers, because they can't make massive deals like Apple/OpenAI/etc. We are the bagholders.
Now it's high again, but give it a couple years and it'll once again crash.
Have they really ever been cheap? Also Tesla 3 is cheaper now, Yaris is still cheap as well.
Even if gaming is and will remain very popular for years, upgrading a gaming rig is still a discretionary activity with more price elasticity of demand than corporate uses for RAM at the dawn of the AI age. Gamers live on the margin of this market, where low prices will stimulate upgrades and high prices will lead to holding out. The complaints about price are real, but that segment of the market is some combination of less large and less important.
Letting the market set prices ensures that the chips go to the critical markets and uses. Less critical uses will not allocate funds for purchases.
Can you please elaborate what you mean by "critical market"?
Edit: formatting
Everybody’s getting pinched, not just the gamers.
At the moment, nothing is certain. Could this last? Sure. Could it not last? Yup.
The current relative spike in prices misses the medium-term trend: the vast post-COVID decrease in memory prices is what led to the recent surge. The cartel got another opportunity to make bank, and they will use that lever to the max.
Funnily enough, I've been personally stuck with 16 gigs since 2015, across three memory generations! But I am used to the past, when you would spend 80-100 on an 8 GB stick (JEDEC timings, nothing fancy, but from a major brand), without accounting for inflation.
All computers in my household are 8+ years old.
so it's 5x as expensive as Opus then.
Think I will scrap my PC and sell its parts.
I wonder if there are any niche companies building decent rigs with DDR3 and 5/6th generation Intel CPUs out there, it is cheap and might be a business opportunity?
There's a future where RAM makers tool up for this massively increased demand, then the AI companies go broke as the bubble bursts, so RAM is cheap as. So laptop manufacturers get on that and start making laptops with 1TB+ memory so we can run decent LLMs on the local machine. Everyone happy :)
I don't want to pay more because of AI companies driving the price up. That is milking.
Another thing I've been thinking about is what happens when the next generation of NVidia chips comes out? I suspect NVidia is going to delay this to milk the current demand but at some point you'll be able to buy something that's better than the H100 or B200 or whatever the current state-of-the-art for half the price. And what's that going to do to the trillions in AI DC investment?
I'm interested when the next bump in DRAM chip density is coming. That's going to change things although it seems like much of production has moved from consumer DRAM chips to HBM chips. So maybe that won't help at all.
I do think that companies will start seeing little to no return from the billions spent on AI, and that's going to be a problem. I also think that the hundreds of billions of capital expenditure of OpenAI is going to come crashing down, as there just isn't any even theoretical future revenue that can pay for all that.
They'll just spend whatever they were planning to spend and get more performance.
We have RAM shortage now, we will have very cheap RAM tomorrow. It’s not like production is bottlenecked by raw materials. Chip companies just need to assess if the demand by AI companies will last so it’s better to scale up, or perhaps they should wait it out instead of oversupplying and cutting into their profits.
There are two RAM suppliers...
I cannot stand how you and people like you try to justify everything by supply and demand. You also act like it's some natural law of nature. It's not a law of nature - if you took an economics class you would realize it's about maximizing PROFIT. It's not for the good of the people.
All of these things are a CHOICE that people are making to now completely screw the average person for, again, the needs of big corporations and the top 0.01%.