CS234: Reinforcement Learning Winter 2025
106 points | 9 hours ago | 3 comments | web.stanford.edu
sillysaurusx
6 hours ago
It’s been said that RL is the worst way to train a model, except for all the others. Many prominent scientists seem to doubt that this is how we’ll be training cutting-edge models in a decade. I agree, and I encourage you to try to think of alternative paradigms as you go through this course.

If that seems unlikely, remember that image generation didn’t take off till diffusion models, and GPTs didn’t take off till RLHF. If you’ve been around long enough, it’ll seem obvious that this isn’t the final step. The challenge for you is to find the one that’s better.

whatshisface
6 hours ago
RL is barely even a training method; it's more of a dataset generation method.
theOGognf
5 hours ago
I feel like both this comment and the parent comment highlight how RL has been going through a cycle of misunderstanding recently, a side effect of another of its popularity booms, this time from being used to train LLMs.
phyalow
40 seconds ago
It's reductive, but also roughly correct.
mistercheph
2 hours ago
Care to correct the misunderstanding?
paswut
4 hours ago
What about combinatorial optimization? When you have a simulation of the world, what other paradigms fit?
whatever1
1 hour ago
More likely, we will develop general superintelligent AI before we (together with our superintelligent friends) solve the problem of combinatorial optimization.
charcircuit
3 hours ago
GPT wouldn't have even been possible, let alone taken off, without self-supervised learning.
kgarten
6 hours ago
Are the videos available somewhere?

The spring course is on YouTube: https://m.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpT...

zerosizedweasle
6 hours ago
Given Ilya's podcast this is an interesting title.