In part, he's arguing that LLMs are not the most efficient path toward intelligence and that some other design will do better, which is probably true, but no one has pushed model size for the (somewhat ironically named) Gato-style multimodal embodied transformers that I think would result in something closer to cat intelligence.
I am reasonably certain that further step changes will happen as LLM model and/or data sizes increase. Right now we're achieving a lot of SOTA performance with somewhat smaller models and multimodal pretraining, but not putting those same methods to work in training even larger models on more data.
Therefore, while it can generate new strategies to approach a problem, the implementation of each step within the strategy is still bounded by its probabilistic, next-token approach.
Contrast that with AlphaZero, which can (in theory) come up with genuinely novel strategies, since it isn't constrained by any human training data.
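A minimal sketch of the contrast I mean (everything here is hypothetical and made up for illustration, not anyone's actual implementation): the LLM-style step samples the next action from a distribution distilled from human data, while the AlphaZero-style step searches ahead with a simulator and can land on moves no human data contains.

```python
import random

def llm_style_step(state, learned_policy):
    # Sample the next action from probabilities distilled from human-generated data;
    # whatever it picks is, by construction, close to what humans already did.
    actions, probs = zip(*learned_policy[state])
    return random.choices(actions, weights=probs)[0]

def search_style_step(state, legal_actions, simulate, value):
    # Look ahead with a simulator and keep the best-scoring move;
    # no human prior is consulted, so nothing anchors it to human play.
    return max(legal_actions(state), key=lambda a: value(simulate(state, a)))
```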
I think O1 is a step forward, but not a massive leap in technology.
I wouldn't trust it on any subject I haven't mastered myself, so it feels kinda pointless to use it at all.
Thus, while it leads to improved reasoning vs prior art, it is not yet bootstrapping. It may be useful in constrained fields like coming up with proofs of theorems, though.
He's very abrasive in his conduct but don't mistake it for incompetence.
Even the "AI can't do video" thing was blown out and misquoted because discrediting people and causing controversy fuels more engagement.
He actually said something along the lines of it "not being able to do it properly", and everything he argues is valid from a scientific perspective.
The joint embeddings work he keeps professing has merit.
---
I think the real problem is that from a consumer's perspective, if the model can answer all of their questions, it must be intelligent; from a scientist's perspective, it can't do that across the set of all consumers, so it isn't intelligent.
So we end up with a dual perspective where both are correct due to technical miscommunication and misunderstanding.
Indeed. It seems to me that he has a type of personality common in skilled engineers. He is competent and makes informed decisions, but does not necessarily explain them well (or at all if they feel trivial enough), is certain of his logic (which often sounds like arrogance), and does not seem to have much patience.
He is great at what he does but he really is not a spokesman. His technical insights are often very interesting, though.
This is particularly problematic when dealing with concepts that aren't fully understood, even by their originators, as these might contain valuable insights despite their apparent strangeness. The risk of losing out on potentially groundbreaking perspectives must be weighed against the importance of scientific integrity. To address this, the technical community could benefit from fostering more open dialogue, encouraging interdisciplinary collaboration, and creating spaces where speculative ideas can be explored without immediate judgment.
Usually doesn't happen here.
So people just talk past each other, with everyone using a different method for collapsing a complex trait like "intelligence" down to a scalar for easy comparison.
But we do understand vision and hearing. We have for over 50 years. We've implemented them classically and described them with physics. Game engines, graphics pipelines, synthesizers, codecs, compression, digital image processing, ... the field is vast and productive.
Our mastery over signals is why I'm so bullish on diffusion and AI for images, video, and audio regardless of whatever happens with LLMs.
And if this tech cycle only improves our audio-visual experience and makes games and film more accessible, it'll still be revolutionary and a step function improvement over what came before.
His framing of intelligence is one thing. The people who disagree with him are framing intelligence a different way.
End of story.
I wish that all the energy went towards substantive disagreements rather than disagreements that are mostly (not entirely) rooted in semantics and definitions.
What he's saying is that he thinks the current techniques for AI (e.g., LLMs) are near the limits of what you can achieve with such techniques and are thus a dead end for future research; consequently, hyperventilation about AI superintelligence and the like is extremely irresponsible. It's a substantial critique of AI today in its actual details, albeit one modulated by popular-press reporting that dumbs it down for popular consumption.
ANNs are extremely useful tools because they can process all sorts of information humans find useful: unlike animals or humans, ANNs don't have their own will, don't get bored or frustrated, and can focus on whatever you point them at. But in terms of core cognitive abilities - not surface knowledge, not impressive tricks, and certainly not LLM benchmarks - it is hard to say ANNs are smarter than a spider. (In fact they seem dumber than jumping spiders, which are able to form novel navigational plans in completely unfamiliar manmade environments. Even web-spinning spiders have no trouble spinning their webs in cluttered garages or pantries; would a transformer ANN be able to do that if it was trained on bushes and trees?)
His AI predictions remind me of Prof. Rodney Brooks (MIT, Roomba) and his similarly cautious timelines for AI development. Brooks has a very strong track record over decades of being pretty accurate with his timelines.
Our intuition isn't a good guide here. Intuitions are honed through repeated exposure and feedback, and we clearly don't have that in this domain.
Even though it doesn't feel dangerous, we can navigate this by reasoning through it. We understand that intelligence trumps brawn (e.g. humans don't out-claw a tiger; we master it with intelligence). We understand that advances in AI have been very rapid, and that even though current AI doesn't feel dangerous, current AI turns into much more advanced future AI very quickly. And we understand that we don't really understand how these things work. We "control them safely" through mechanisms similar to how evolution controls us: through the objective function. That shouldn't fill us with confidence, because we find loopholes in evolution's objective function left and right: contraception, hyper-palatable foods, TikTok, etc.
All these lines of evidence converge on the conclusion that what we're building is dangerous to us.
They are the strongest statements anyone can justifiably make about technology aiming to produce intelligence, since it is speculation about how well something that does not yet exist will do at achieving something that is ill-defined, and where even the things clearly within that fuzzy definition are not well understood.
And it is a fortiori the strongest that can be said of things downstream of that, like dangers that are at least in part contingent on the degree of success in achieving "intelligence".
Since we're talking about the future, it would be incorrect to talk in absolutes so speaking in probabilities and priors is appropriate.
> Our intuition isn't a good guide here.
I'm not just using intuition. I've done as extensive an evaluation of the technology, trends, predictions and, most importantly, history as I'm personally willing to do on this topic. Your post is an excellent summary of basically the precautionary principle approach but, as I'm sure you know, the precautionary principle can be over-applied to justify almost any level of response to almost any conceivable risk. If the argument construes the risk as probably existential, then almost any degree of draconian response could be justified. Hence my caution when the precautionary principle is invoked to argue for disruptive levels of response (and to be clear, you didn't).
So the question really comes down to which scenarios at which level of probability and then what levels of response those bell-curve probabilities justify. Since I put 'foom-like' scenarios at low probability (sub-5%) and truly existential risk at sub-1%, I don't find extreme prevention measures justified due to their significant costs, burdens and disruptions.
At the same time, I'm not arguing we shouldn't pay close attention as the technology develops while expending some reasonable level of resources on researching ways to detect, manage and mitigate possible serious AI risks, if and when they materialize. In particular, I find the current proposed legislative responses to regulate a still-nascent emerging technology to be ill-advised. It's still far too early and at this point I find such proposals by (mostly) grandstanding politicians and bureaucrats more akin to crafting potions to ward off an unseen bogeyman. They're as likely to hurt as to help while imposing substantial costs and burdens either way. I see the current AI giants embracing such proposals as simply them seeing these laws as an opportunity to raise the drawbridge behind themselves since they have the size and funds to comply while new startups don't - and those startups may be the most likely source of whatever 'solutions' we actually need to the problems which have yet to make themselves evident.
A collective hallucination of what intelligence is is not a good basis for an argument about doom probabilities. We don't have a clue, yet loud people are pretending we do.
It is utterly horrifying that a contingent of Yudkowskyites essentially hijacked reasonable discourse around this subject. I grew up interacting with the LessWrong people: many of them have other problems interfacing with society that make it obvious to them they know what being "less wrong" looks like. The problem is we don't actually know any way to separate the human experience from "pure logic", whatever that actually means.
People are deluding themselves when they claim they "reason through this" (i.e. objectively). In other words: no one knows what's going to happen; people are just saying what they think.
What's the most plausible (even if you find it implausible) disaster scenario you came across in your research? It's a little surprising to see someone who has seriously looked into these ideas describe the bundle of them as "like Skynet."
It seems silly to me that the idea of risk is all concentrated around the runaway-intelligence scenario. While that might be possible, there is real risk today in how we use these systems.
It is likely conditional on the price of compute dropping the way it has been.
If you can basically simulate a human brain on a $1000 machine, you don't really need to employ any AI researchers.
Of course, there has been some fear that the current models are a year away from FOOMing, but that does seem to be just the hype talking.
Based on the evidence I've seen to date, doing this part at the scale of human intelligence (regardless of cost) is highly unlikely to be possible for at least decades.
(a note to clarify: the goal "simulate a human brain" is substantially harder than other goals usually discussed around AI, like "exceed domain expert human ability on tests measuring problem solving in certain domain(s)".)
Because you could easily find ways to print money, e.g. curing types of cancers or inventing a better Ozempic.
But the fact is that there is no path to simulating a human brain.
It seems very obviously fundamentally solvable, though I agree it is nowhere in the near future.
I could see P=NP being impossible to prove but I find it hard to believe intelligence is impossible to figure out. Heck if you said it’d take us 100 years I would still think that’s a bit much.
Time travel. Teleportation through quantum entanglement. Intergalactic travel through wormholes.
And don't get me wrong, they are hard. But just another class of problems. Right?
Some have made this argument (quantum effects, external fields, etc.).
If any of these are proven to be true then we are looking at a completely different roadmap.
Can you please enlighten us then since you clearly know to what extent quantum effects exist in the brain.
It’s odd to say “reproduce quantum interactions” but remember to the extent they exist in the brain, they also behave as finicky/noisy quantum interactions. They’re not special brain quantum things.
Moving as smoothly as a cat and navigating the world is the part that actually took our brains millions of years to learn. Movement feels effortless not because it's easy but because it took so long to master, so it's also going to be the most difficult thing to teach a machine.
The cognitive stuff is the dumb part, and that's why we have chess engines, pocket calculators and chatbots before we have emotional machines, artificial plumbers and robots that move like spiders.
Ten years ago, it was common to hear the argument: "Are cats intelligent? No, they can't speak." Language was seen as the pinnacle of the mind. Lately that's been flipped on its head, but only because machines have gotten so good at it.
I think the real reason we don't have robots that move like spiders is that robots don't have muscles, and motors are a very poor approximation.
If that were the real reason, we'd have self-driving cars deserving of that label before we had ChatGPT. In the world of atoms we still struggle with machines that are orders of magnitude less sophisticated than even a worm. No, it's because being embedded in the world is much more like being a cat and much less like being an LLM or a chess computer.
Making something that's like a curious five year old who can't do a lot of impressive things and has no market value but who is probably closer to genuine intelligence is going to be much harder than making a search engine for the internet's latent space.
I'll grant you that LLMs are terrible at what I'd call "animal intelligence" - but I'm not so sure that animal intelligence is what is needed to, say, discover the laws of the universe. Solving mathematical problems is much more like playing chess than driving a car.
- both of them will spit regular kibble out in front of me when they want a fancier treat (cats are hilarious)
- the boy cat has developed very specific "sweet meows" (soft, high-pitched) for affection and "needy meows" (loud, full-chested) for toys or food; for the first few years he would simply amp up the volume and then give a frustrated growl when I did the wrong thing
- the lady cat (who only has two volumes, "yell" and "scream"), instead stands near what she wants before meowing; bedroom for cuddles, water bowl for treats, hallway or office for toys
- the lady cat was sick a while back and had painful poops; for weeks afterwards if she wanted attention and I was busy she would pretend to poop and pretend to be in pain, manipulating me into dropping my work and checking on her
It goes both ways; I've developed ways of communicating with them over the years:
- the lady is skittish but loves laying in bed with me, so I sing "gotta get up, little pup" in a particular way; she will then get up and give me space to leave the bed, without me scaring her with a sudden movement
- I don't lose my patience with them often, but they understand my anxious/exasperated tone of voice and don't push their luck too much (note that some of this is probably shared mammalian instinct)
- the boy sometimes bullies the lady, and I'll raise my voice at him; despite being otherwise skittish and scared of loud noises, the lady seems to understand that I am mad at the boy because of his actions and there's nothing to be alarmed by
Sometimes I think the focus on "context-free" (or at least context-lite) symbolic language, essentially unique to humans, makes us lose focus on the fact that communication is far older than the dinosaurs, and that maybe further progress on language AI should focus on communication itself, rather than symbol processing with communication as a side effect.
The whole comparison is stupid, and inexplicable at LeCun's level of play. AI is not a model of a human brain, or a cat brain, or a parrot brain, or any other kind of brain. It's something else, something that did not exist in any form just a few years ago.
We must at the very least resist trying to compare human with artificial intelligence on general, dimensional measures. It does not make any sense, because the natures of the two are more different than alike.
He leads AI at Meta, a company with the competitive strategy to commoditize AI via Open Source models. Their biggest hindrance would be regulation putting a stop to the proliferation of capabilities. So they have to understate the power of the models. This is the only way Meta can continue sucking steam out of the leading labs.
It's similar to how the WSJ journalist would never ask him what he thinks about the larger effects of the "deindustrialization" of knowledge-based jobs caused by AI. Not because the journalist is malicious; it's just the shared, subconscious ideology.
People don't need a reason to protect capital interests, even poor people on the very bottom will protect it.
* pretty sure any revenue from commercial Llama licenses is a rounding error at best
Sounds like we should be fully supporting them then.
AI can’t push a houseplant off a shelf, so there’s that.
Talking about intelligence as a completely disembodied concept seems meaningless. What does "cat" even mean when comparing it to something that doesn't have a physical, corporeal presence in time and space? Comparing like this seems to me like making a fundamental category error.
edit: Quoting, “You’re going to have to pardon my French, but that’s complete B.S.”
I guess I’m just agreeing with LeCun here.
I don't understand this criticism at all. If I go over to ChatGPT and say "From the perspective of a cat, create a multistage plan to push a houseplant off a shelf" it will satisfy my request perfectly.
But guessing at what you mean - when I evaluate ChatGPT, I include all the trivial add-ons. For example, AutoGPT will create a plan like this and then execute the plan one step at a time.
I think it would be silly to evaluate ChatGPT solely as a single execution endpoint.
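To make that concrete, here's a rough sketch of the kind of add-on loop I mean (the `ask_llm` helper is a hypothetical stand-in for whatever chat API you use, not AutoGPT's actual code): the model is first asked for a numbered plan, and then each step is fed back to it for execution.

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in; replace with a real chat-completion call.
    return "1. Survey the shelf\n2. Climb up\n3. Nudge the houseplant off the edge"

def plan_and_execute(goal: str) -> list:
    # Ask the model for a numbered plan, then feed each step back for "execution".
    plan = ask_llm(f"Create a short numbered plan to accomplish: {goal}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    results = []
    for step in steps:
        results.append(ask_llm(f"Goal: {goal}\nCurrent step: {step}\nCarry out this step."))
    return results

print(plan_and_execute("push a houseplant off a shelf, from the perspective of a cat"))
```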
The model does not "plan" anything; it has no idea how a sentence will end when it starts it, as it only considers what word comes next, then what word after that, then what word after that. It discovers the sentence is over when the next token turns out to be a period. It discovers it's finished its assignment when the next token turns out to be a stop token.
So one could say the model provides the illusion of planning, but is never really planning anything other than what the next word to write is.
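A toy sketch of that loop (the "model" here is just a hard-coded, made-up next-token table, not a real LLM, but the control flow is the point): at every step the only decision is which token comes next, and generation ends when that token happens to be the period/stop token.

```python
import random

# Hypothetical next-token distributions standing in for a trained model.
NEXT_TOKEN = {
    "the":     [("cat", 0.6), ("plant", 0.4)],
    "cat":     [("sat", 0.7), ("pounced", 0.3)],
    "plant":   [("fell", 1.0)],
    "sat":     [(".", 1.0)],
    "pounced": [(".", 1.0)],
    "fell":    [(".", 1.0)],
}

def sample_next(token):
    words, probs = zip(*NEXT_TOKEN.get(token, [(".", 1.0)]))
    return random.choices(words, weights=probs)[0]

def generate(prompt="the", max_len=10):
    out = [prompt]
    for _ in range(max_len):
        nxt = sample_next(out[-1])  # the only decision ever made: the next token
        out.append(nxt)
        if nxt == ".":              # it "discovers" the sentence is over here
            break
    return " ".join(out)

print(generate())
```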
Ok. Suppose I create the illusion of a calculator. I type in 5, then plus, then 5. And it gives me the illusional answer of 10.
What's the difference?
You can't say "who cares if it's an illusion, it works for me" when the topic is whether an attempt to build a better one will work for the stated goal.
Otherwise, I think we should go our separate ways. You take care now.
If you don't care about the technical aspects, why ask in the first place what Yann LeCun meant?
You take care now.
This argument we're having is a version of the Chinese Room. I've never found Searle's argument persuasive, and I truly have no interest in arguing it with you.
This is the last time I will respond to you. I hope you have a nice day.
There's a lot of confusion about these technologies, because tech enthusiasts like to exaggerate the state of the art's capabilities. You seem to be arguing "we must turn to philosophy to show ChatGPT is smarter than it would seem", which is not terribly convincing.
Take care now.
If there's a representative phrase from the article itself that's neutral enough, we could use that instead.