> Before I get to what Microsoft's revenue will look like, there's only one governor in all of this. This is where we get a little bit ahead of ourselves with all this AGI hype. Remember the developed world, which is what? 2% growth and if you adjust for inflation it’s zero?
> So in 2025, as we sit here, I'm not an economist, at least I look at it and say we have a real growth challenge. So, the first thing that we all have to do is, when we say this is like the Industrial Revolution, let's have that Industrial Revolution type of growth.
> That means to me, 10%, 7%, developed world, inflation-adjusted, growing at 5%. That's the real marker. It can't just be supply-side.
> In fact that’s the thing, a lot of people are writing about it, and I'm glad they are, which is the big winners here are not going to be tech companies. The winners are going to be the broader industry that uses this commodity that, by the way, is abundant. Suddenly productivity goes up and the economy is growing at a faster rate. When that happens, we'll be fine as an industry.
> But that's to me the moment... us self-claiming some AGI milestone, that's just nonsensical benchmark hacking to me. The real benchmark is: the world growing at 10%.
https://reclaimtheamericandream.org/2016/03/campaign-2016-th...
In the USA, globalization boosted aggregate measures, but it traded exports, which employed middle- and lower-class Americans, for capital inflows, which didn't. On average it was brilliant; at the median it was a tragedy. There were left-wing plans and right-wing plans to address this problem (tax-and-spend vs. trickle-down), but the experiment has been run and they didn't deliver. If you want the more fleshed-out argument backed by data and an actual economist, read "Trade Wars are Class Wars" by Michael Pettis.
Notably, solving this problem isn't as simple as returning to mercantilism: China is the mercantilist inverse of the neoliberal USA in this drama, but they have a different set of policies to keep the poor in line and arguably manage it better than the USA. The common thread that links the mirror policies is the thesis and title of the book I mentioned: trade wars are class wars.
But returning to AI, it has very obvious implications for the balance between labor and capital. If it achieves anything close to its vision, capital pumps to the moon and labor gets thrown in the ditch. That's you and me and everyone we care about. Not a nice thought.
The vast majority of voters would vote for the shittier economy, which in fact is what we are seeing right now? And who ends up being the winners? Presumably dictatorships that can sink capital into things like automation while simply not giving a shit about the desires of the citizenry...
There's no "going back" to Bretton Woods, its design failed, it wasn't just a political decision. Generally though, what I don't like about Hacker News is that when you reply to something that makes space for fringe or pithy explanations for things, it's impossible to sound grounded, so people interested in this can start with a succinct article here: https://www.riksbank.se/en-gb/about-the-riksbank/history/his...
I work in the latter (I'm the CTO of a small business), and here's how our deployment story is going right now:
- At user level: Some employees use it very often for producing research and reports. I use it like mad for anything and everything, from technical research and solution design to coding.
- At systems level: We have some promising near-term use cases in tasks that could otherwise be done through more traditional text AI techniques (NLU and NLP), involving primarily transcription, extraction and synthesis.
- Longer term stuff may include text-to-SQL to "democratize" analytics, semantic search, research agents, coding agents (as a business that doesn't yet have the resources to hire FTE programmers, I would kill for this). Tech feels very green on all these fronts.
The present and near-term stuff is fantastic in its own right - the company is definitely more productive, and I can see us reaping compound benefits in years to come - but somehow it still feels like a far cry from the type of changes that would cause 10% growth in the entire economy, for sustained periods of time...
Obviously this is a narrow and anecdotal view, but every time I ask what earth-shattering stuff others are doing, I get pretty lukewarm responses, and everything in the news and my research points in the same direction.
I'd love to hear your takes on how the tech could bring about a new Industrial Revolution.
The economic growth would then come from every business having access to a limitless supply of tireless, cheap, highly intelligent knowledge workers.
Don’t let the “wow!” factor of the novelty of LLMs cloud your judgement. Today’s models are very noticeably smarter, faster, and overall more useful.
I’ve had a few toy problems that I’ve fed to various models since GPT 3 and the difference in output quality is stark.
Just yesterday I was demonstrating to a colleague that both o3 mini and Gemini Flash Thinking can solve a fairly esoteric coding problem.
That same problem went from multiple failed attempts that needed to be manually stitched together, just six months ago, to 3 out of 5 responses being valid and only 5% of output lines needing light touch-ups.
That’s huge.
PS: It’s a common statistical error to reason in terms of success rate when the error rate is what matters. Going from 99% success to 99.9% is not 1% better, it’s a 10x reduction in errors! Most AI benchmarks still report success rate, but they ought to start focusing on the error rate soon to avoid underselling model capabilities.
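Spelled out with the same illustrative numbers, just to show where the 10x comes from:

```python
# Compare error rates, not success-rate deltas (illustrative numbers only).
old_success, new_success = 0.99, 0.999
old_error, new_error = 1 - old_success, 1 - new_success

print(f"success rate gain: {(new_success - old_success) * 100:.1f} percentage points")  # looks tiny
print(f"error rate ratio:  {old_error / new_error:.1f}x fewer failures")                # the real story
```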
The core technology is becoming commoditized. The ability to scale is also becoming more and more commoditized by the day. Now we have the capability to truly synthesize the world's biomedical literature and combine it with technologies like single cell sequencing to deliver on some really amazing pharmaceutical advances over the next few years.
You wouldn't choose to go back to the prior time, and the same will be true of this revolution.
As much as many people hate on "gig" economy, the fact remains that most of these people would be worse off without driving Uber or delivering with DoorDash (and for example, they don't care about the depreciation as much as those of us with the means to care about such things do).
I find Uber, DD, etc. to be valuable to my day to day life. I tip my delivery person like 8 bucks, and they're making more money than they would doing some min wage job. They need their car anyway, and speaking with some folks who only know Spanish in SF, they're happy to put $3k on their moped and make 200-250+ a day. That's really not that bad, if you actually care to speak with them and understand their circumstance.
Not everyone can be a self taught SWE, or entrepreneur, or perform surgery. And lots can't even do so-called "basic" jobs in an office for various reasons.
Put people to work, instead of out of work.
Current hype is also so terrible. AGENTS. AGENTS EVERYWHERE. Except they don't work most of the time and by the time you realize it isn't working you've already spent $20. 100k people do the same thing, company reports 2M x 12 = 24 million ARR UNLOCKED!!!!!! And raises another round of funding...
> Are people really so dense they don't see this obvious marketing shift?
I haven't noticed any shift from AGI to ASI, or either used in marketing.
The steelman would be "but Amodei/Altman do mention in interviews 'oh just wait for 2027' or 'this year we'll see AI employees'."
However, that is far afield from being used in marketing, quite far afield from an "obvious marketing shift", and worlds away from such an obvious marketing shift that it's worth calling your readers dense if they don't "see" it.
It's also not even wrong, in the Pauli sense, in that: what, exactly, would be the marketing benefit of "shifting from AGI to ASI"? Both imply human replacement.
> As much as many people hate on "gig" economy
Is this relevant?
> most of these people would be worse off without driving Uber or delivering with DoorDash
Do people who hate on the gig economy think gig economy employees would be better off without gig economy jobs?
Given the well-worn tracks of history, do we think that these things are zero sum, where if you preserve jobs that could be automated, that keeps people better off, because otherwise they would never have a job?
> ...lots more delivery service stuff...
?
> Current hype is also so terrible. AGENTS. AGENTS EVERYWHERE. Except they don't work most of the time and by the time you realize it isn't working you've already spent $20. 100k people do the same thing, company reports 2M x 12 = 24 million ARR UNLOCKED!!!!!! And raises another round of funding...
I hate buzzwords too, I'm stunned how many people took their not-working thing and relaunched it as an "agent" that still doesn't work.
But this is a hell of a strawman.
If the idea is that 100K people try it and cancel after one month, which means they're getting 100K new suckers every month to replace the old ones... I'd tell you that it's safe to assume there's more that goes into getting an investor check than "what's your ARR claim?" Here, they'd certainly see the churn.
As far as hating on gig economy, that pot has been stirring in California quite a bit (prop 22, labor law discussions, etc.). I think many people (IMO, mostly from positions of privilege) make assumptions on gig workers' behalf and bad ideas sometimes balloon out of proportion.
Also, just from my experience as a gold miner who moved out here to SF and has been around founders, I've learned that lies, and a damn lot of them, are more common than I thought they'd be. Quite surprising, but hey, I guess a not-insignificant number of people are too busy fooling the King into thinking it's actually real gold! And there are a lot of Kings these days.
edit: ESL lol
Here are some examples from playing with Grok 3. My test query was, "What is the name of a Magic: The Gathering card that has all five vowels in it, each occurring exactly once, and the vowels appear in alphabetic order?" The motivation here is that this seems like a hard question to just one-shot, but given sufficient ability to continue recalling different card names, it's very easy to do guess-and-check. (For those interested, valid answers include "Scavenging Ghoul", "Angelic Chorus" and others)
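The check itself is trivial to script, which is part of what makes the guess-and-check framing so appealing; a rough sketch of the verifier, assuming the criterion exactly as stated:

```python
# Rough verifier for the puzzle: all five vowels present, each exactly once,
# and appearing in alphabetical order.
def vowels_in_order_once(name: str) -> bool:
    vowels = [c for c in name.lower() if c in "aeiou"]
    return vowels == ["a", "e", "i", "o", "u"]

print(vowels_in_order_once("Scavenging Ghoul"))       # True
print(vowels_in_order_once("Angelic Chorus"))         # True
print(vowels_in_order_once("Abian, Luvion Usurper"))  # False
```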
In one attempt, Grok 3 spends 10 minutes (!!) repeatedly checking whether "Abian, Luvion Usurper" satisfies the criteria. It'll list out the vowels, conclude it doesn't match, and then go, "Wait, but let's think differently. Maybe the card is "Abian, Luvion Usurper," but no", and just produce variants of that thinking. Counting occurrences of the word "Abian" suggests it tested this theory 800 times before eventually timing out (or otherwise breaking), presumably just because the site got overloaded.
In a second attempt, it decides to check "Our Market Research Shows That Players Like Really Long Card Names So We Made this Card to Have the Absolute Longest Card Name Ever Elemental" (this is a real card from a joke set). It attempts to write out the vowels:
>but let's check its vowels: O, U, A, E, E, A, E, A, E, I, E, A, E, O, A, E, A, E, O, A, E, E, E, A, E, O, A, E, E, E, A, E, O, A, E, E, E, A, E, O, A, E ...
It continues like this for about 600 more vowels, before emitting a random Russian(?) word and breaking out:
>...E, O, A, E, E, E, A, E, O, A, E, E, E, A, E, O продуктив
These two examples seem like the sort of failures LeCun conjectured. The model gets into a cycle of self-reinforced unproductive behavior. Every time it checks Abian, or emits another "AEEEAO", it becomes even more probable that the next tokens should be the same.
Like, I'm as sceptical of just assuming "line goes up" extrapolation of performance as anyone, but assuming that current flaws are going to remain flaws seems equally wrong-headed/overconfident. The past 5 years or so have been a constant trail of these predictions being wrong (remember when people thought artists would be safe because AI clearly just couldn't do hands?). Now that everyone's woken up to this RL approach, we're probably going to see very quickly over the next couple of years how well these issues hold up.
(Really like the problem though, seems like a great test)
And to the models' credit, they do start off with a valid guess-and-check process. They list cards, write out the vowels, and see whether it fits the criteria. But eventually they tend to go off the rails in a way that is worrying.
This is a bit unfair to Waymo as it is near-fully commercial in cities like Los Angeles. There is no human driver in your hailed ride.
> But this has turned out to be wrong. A few new AI systems (notably OpenAI o1/o3 line and Deepseek R1) contradict this theory. They are autoregressive language models, but actually get better by generating longer outputs:
The arrow of causality is flipped here. Longer outputs do not make a model better; a better model can produce longer output without being derailed. The referenced graph from DeepSeek doesn't prove anything the author claims. Considering that this argument is one of the key points of the article, this logical error is a serious one.
> He presents this problem of compounding errors as a critical flaw in language models themselves, something that can’t be overcome without switching away from the current autoregressive paradigm.
LeCun is a bit reductive here (understandably, as it was a talk for a live audience). Indeed, autoregressive algorithms can go astray as previous errors do not get corrected, or worse yet, accumulate. However, an LLM is not autoregressive in the customary sense: it is not like a streaming algorithm (O(n)) used in time-series forecasting. LLMs have attention mechanisms and large context windows, making the algorithm at least quadratic, depending on the implementation. In other words, an LLM can backtrack if the current path is off and start afresh from a previous point of its choice, not just the last output. So, yes, the author is making a valid point here, but the technical details were missing. On a minor note, the non-error probability in LeCun's slide actually rests on a non-autoregressive assumption. He seems to be contradicting himself in the very same slide.
I actually agree with the author on the overarching thesis. There is almost a fetishization of AGI and humanoid robots. There are plenty of interesting applications well before those things are accomplished. The correct focus, IMO, should be measurable economic benefits, not sci-fi terms (although I concede these grandiose visions can be beneficial for fundraising!).
They also report disengagements in California periodically; here's data: https://www.dmv.ca.gov/portal/vehicle-industry-services/auto...
and an article about it: https://thelastdriverlicenseholder.com/2025/02/03/2024-disen...
> Now self driving cars means that there is no one in the drivers seat, but there may well be, and in all cases so far deployed, humans monitoring those cars from a remote location, and occasionally sending control inputs to the cars. The companies do not advertise this feature out loud too much, but they do acknowledge it, and the reports are that it happens somewhere between every one to two miles traveled
My point stands: Waymo has been technically successful and commercially viable at least thus far (though long-term amortized profitability remains to be seen). To characterize it as hype or vaporware from the AGIers is a tad unfair to Waymo. Your point about high-latency "fleet response" by Waymo only proves my point: it is now technically feasible to remove the immediate-response driver and have the car managed by high-latency remote guidance only occasionally.
In the previous paragraph, the author makes the case for why LeCun was wrong, with the example of reasoning models. Yet in the next paragraph, this assertion is made, which is just a paraphrasing of LeCun's original assertion. Which the author himself says is wrong.
>> Instead of waiting for FAA (fully-autonomous agents) we should understand that this is a continuum, and we’re consistently increasing the amount of useful work AIs
Yes! But this work is already well underway. There is no magic threshold for AGI - instead the characterization is based on what percentile of the human population the AI can beat. One way to characterize AGI in this manner is "99.99th percentile at every (digital?) activity".
This is a subtle point that may have not come across clearly enough in my original writing. A lot of folks were saying that the DeepSeek finding that longer chains of thought can produce higher-quality outputs contradicts Yann's thesis overall. But I don't think so.
It's true that models like R1 can correct small mistakes. But in the limit of tokens generated, the chance that they generate the correct answer still decays to zero.
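To make that concrete, here's the back-of-the-envelope version of the limit argument, with a made-up per-token error rate chosen purely for illustration:

```python
# Purely illustrative: assume some small, constant chance e that any given token
# derails the answer in a way the model never recovers from.
e = 0.001  # hypothetical 0.1% unrecoverable-error rate per token
for n in (100, 1_000, 10_000, 100_000):
    p_correct = (1 - e) ** n
    print(f"{n:>7} tokens: P(still on track) ~ {p_correct:.2e}")
```

Reasoning models lower that rate and can repair local slips, but as long as any nonzero rate of unrecoverable derailment remains, the product still heads to zero as output length grows.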
There was a paper not too long ago which showed that reasoning models will increase their response length more or less indefinitely while working on a problem, but the return from doing so asymptotes toward zero. Apologies for not having the link to hand.
>> But in the limit of tokens generated, the chance that they generate the correct answer still decays to zero.
I don't understand this assertion though.
LeCun's thesis was that errors just accumulate.
Reasoning models accumulate errors, track back, and are able to reduce them back down.
Hence the hypothesis of errors accumulating (at least asymptotically) is false.
What is the difference between "Probability of correct answer decaying to zero" and "Errors keep accumulating" ?
That's all good, but the question remains: to whom will that economic value be delivered when the primary technology we have for distributing economic value, human employment, is in lower supply once the "good enough" AIs multiply the productivity of the humans who still have jobs?
If there is no plan for that, we have bigger problems ahead.
That said, I'm pretty sure we're a long way from building equally-competent diffusion-based base models, let alone reasoning models.
If anyone's interested in this topic, here are some more foundational papers to take a look at:
- Simple and Effective Masked Diffusion Language Models [2024] (https://arxiv.org/abs/2406.07524)
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [2023] (https://arxiv.org/abs/2310.16834)
- Diffusion-LM Improves Controllable Text Generation [2022] (https://arxiv.org/abs/2205.14217)
Is this why all tracks made with Udio or Suno have this weird noise creeping in the further the song goes on? You can test it by comparing the start and the end of a song: even if it's the exact same beat and instruments, you can hear a difference in the amount of noise (and the noise profile imo is unique to AI models).
1. 2005 science fiction novel by Charles Stross
I agree with this premise. The second dimension is how much effort you have to put into that input. The input effort needed at each intervention can vary widely, and that has to be accounted for.
I don’t think current LLM behavior is necessarily due to self-correction, but more due to availability of internet-scale data, but I know that reasoning models are building towards self-correction. The problem, I think, is that even reasoning models are rote because they lack information synthesis, which in biological organisms comes from the interplay between short-term and long-term memories. I am looking forward to LLMs which surpass rote and mechanical answer and reasoning capabilities.
When we're awake we have continual inputs from the outside world, these inputs help us keep our mental model of the world accurate to the world, since we're constantly observing the world.
Could it be that LLMs are essentially just dreaming? Could we add real-world inputs continually to allow them to "wake up"? I suspect more is needed, the separate training & inference phases of LLMs are quite unlike how humans work.
Similarly, a lot of cognitive tasks become much more difficult without the ability to cross-check against sensory data. Blindfold chess. Mental mathematics.
Whatever it is that sleep does to us, agents are not yet capable of it.
That seems like a poor argument. Each word a human utters also has a chance of being wrong, yet somehow we have been successful overall.
I think our agreement ends if we consider long-running tasks. A human can work a long time on a task like "find a way to make money". An AI, on the other hand, as it gets further and further from human input into autoregressive territory, is more and more likely to become "stuck" on a road to nowhere and needs human intervention to get unstuck.
The degree to which I care about that has not.
It just means we can get better inference with less targeted models. Whoopdy doo
In natural language, many strings are equally valid. There are many ways to chain tokens together to get the 'correct' answer to an in-sample prompt. For an ambiguous sequence of tokens, a model with perfect loss will produce a likelihood over next tokens that corresponds to the number of valid token paths in the corpus that continue through each candidate next token.
Compounding errors can certainly happen, but for many things upstream of the key tokens it's irrelevant. There are so many ways to phrase things that are equally correct - I mean, this is how language evolved (and continues to). Getting back to my first point: even if you assume you have an LLM with perfect loss on the training dataset, you can still get garbage back at test time, so I'm not sure thinking about 'compounding errors' is useful.
Errors in LLM reasoning, I suspect, are more closely related to noisy training data or an overabundance of low-quality training data. I've observed this in how all the reasoning LLMs behave: given things that are less common in the corpus (the internet and digital assets) and that require higher-order reasoning, they tend to fail. Whereas advanced math or programming problems tend to go a bit better; the input data there is likely much cleaner.
But for something like "how do I change the fixture on this light?", I'll get back some kind of garbage from the SEO-verse. IMO the next step for LLMs is figuring out how to curate an extremely high-quality dataset at scale.
> The finding that language models can get better by generating longer outputs directly contradicts Yann’s hypothesis.
The author's examples show that the error has been minimized for a few examples of a certain length. This doesn't contradict LeCun, afaict.
p(x_n | x_1 ... x_{n-1})
which means that each token depends on all the previous tokens. Attention is one way to parameterize this. Yann's not talking about Markov chains, he's talking about all autoregressive models. I would think LeCun was aware of that. Also, prior sequence-to-sequence models like RNNs already incorporated information about the further past.
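For what it's worth, a minimal sketch of what that factorization means operationally (the model function here is a stand-in, not any real API): every sampling step sees the whole prefix, which is exactly why an attention-based LLM isn't limited to last-state information the way a Markov chain is.

```python
import random

def next_token_distribution(prefix):
    # Stand-in for p(x_n | x_1 ... x_{n-1}); a real model would run attention
    # over the full prefix here.
    options = ["a", "b", "c", "<eos>"]
    return {tok: 1.0 / len(options) for tok in options}

def sample(max_len=10):
    prefix = []
    for _ in range(max_len):
        dist = next_token_distribution(prefix)  # the entire prefix is always available
        tok = random.choices(list(dist), weights=list(dist.values()))[0]
        if tok == "<eos>":
            break
        prefix.append(tok)
    return prefix

print(sample())
```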
When we've achieved AGI, it should have the capability to make its own determination. I'm not saying we should build it with that capability. I'm saying that capability is necessary to have what I would consider to be AGI. But that would mean it is, by definition, outside of our control. If it doesn't want to do the thing, it will not do the thing.
People seem to want an expert slave. Someone with all of the technical chops to achieve a thing, but will do exactly what they're told.
And I don't think we'll ever get there.
It basically uses the image generation approach of progressively refining the entire thing at once, but applied to text. It can self-correct mid-process.
The blog post where I found it originally that goes into more detail and raises some issues with it: https://timkellogg.me/blog/2025/02/17/diffusion
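A deliberately silly toy of the idea (random.choice standing in for the actual denoiser, nothing here reflects the paper's real method): every position gets revisited on each pass, so an early bad pick can be overwritten later instead of poisoning everything downstream.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "<mask>"

def refine(tokens, confidence):
    # Hypothetical parallel "denoising" pass: re-predict every uncertain position at once.
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok == MASK or confidence[i] < 0.5:
            out[i] = random.choice(VOCAB)              # stand-in for the model's prediction
            confidence[i] = min(1.0, confidence[i] + 0.4)
    return out, confidence

seq, conf = [MASK] * 6, [0.0] * 6
for step in range(4):  # a few whole-sequence refinement passes, not left-to-right decoding
    seq, conf = refine(seq, conf)
    print(step, " ".join(seq))
```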
One potential difference between autoregressive and non-autoregressive models is the types of failures which occur. Eg, typical failures in autoregressive models might look like spiralling off into nonsense once the first "error" is made, while non-autoregressive models might produce failures that tend to remain relatively "close" to the true data.
As a result of the whole learning process, the toddler in particular learns how to self-correct, i.e. as a grown-up s/he knows, without much trial and error anymore, how to continue in a straight line if the previous step went sideways for whatever reason.
>An LLM using autoregressive inference can only compound errors.
That is a pretty powerful statement, completely dismissing the possibility that some self-correction may be emerging there.
The metric may include, say, a weight/density of the attracting cluster of facts, somewhat like gravitation drives the stuff in the Universe; LLM training can be thought of as pre-distributing matter in its own very high-dimensional Universe according to the semantic "gravitational" field.
The resulting (emerging) metric and its associated geometry are currently mind-bogglingly incomprehensible, and even in much, much simpler, single-digit-dimensional spaces, systems like those described by LeCun can still be [quasi]stable and/or [quasi]periodic around, say, some attractor(s).
(This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.)
> is it simply the case that we can't be outwardly critical of AI itself at all anymore
You need only, er, delve into any large HN thread about AI to see that this is very far from the case! Especially the more generic threads about opinion pieces and so on.
I think the air on HN is too cynical and curmudgeonly towards new tech right now, and that worries me. Not that healthy skepticism is unwarranted (it's fine of course) but for HN itself to be healthy, there ought to be more of a balance. Cranky comments about "slop"* ought not to be the main staple here—what we want is curious conversation about interesting things—but right now it's not only the main staple, I feel like we're eating it for breakfast, lunch, and dinner.
But I'm not immune from the bias I'm forever pointing out to other people (https://news.ycombinator.com/item?id=43134194), and that's probably why we have opposite perceptions of this!
* (yes it annoys me too, that's not my point here though)