In it, he stated the following:
> Indeed, the famous “backpropagation” algorithm that was rediscovered by David Rumelhart in the early 1980s, and which is now viewed as being at the core of the so-called “AI revolution,” first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon.
I was wondering whether anyone could point me to the paper or piece of work he was referring to. There are many citations in Schmidhuber’s piece, and in my previous attempts I've gotten lost in papers.
The Minimum-Time Thrust-Vector Control Law in the Apollo Lunar-Module Autopilot (1970)
https://www.sciencedirect.com/science/article/pii/S147466701...
Henry J. Kelley (1960). Gradient Theory of Optimal Flight Paths.
[1] https://claude.ai/public/artifacts/8e1dfe2b-69b0-4f2c-88f5-0...
I am still going through it, but the latter is quite interesting!
I think "its" refers to control theory, not backpropagation.
- Henry J. Kelley (1960), “Gradient Theory of Optimal Flight Paths,” ARS Journal.
- A.E. Bryson & W.F. Denham (1962), “A Steepest-Ascent Method for Solving Optimum Programming Problems,” Journal of Applied Mechanics.
- B.G. Junkin (1971), “Application of the Steepest-Ascent Method to an Apollo Three-Dimensional Reentry Optimization Problem,” NASA/MSFC report.
I pasted the output so a ton of people wouldn't repeat the same question to ChatGPT and burn a ton of CO2 to get the same answer.
I didn't paste the query since I didn't find it interesting.
And I didn't fact check because I didn't have the time. I was walking and had a few seconds to just do this on my phone.
Not sure how this was rude, I certainly didn't intend it to be...
It's a weird thing to wonder after so many people expressed their dislike of the upthread low-effort comment with a down vote (and then another voiced a more explicit opinion). The point is that a reader may want to know that the text they're reading is something a human took the time to write themselves. That fact is what makes it valuable.
> pncnmnp seems happy
They just haven't commented. There is no reason to attribute this specific motive to that fact.
The reader may also simply want information that helps them.
> They just haven't commented.
Yes, they did.
Directly posting the random text generated by the LLM is more annoying. I mean, they didn't even vouch for it or verify that it was right.
Also, I quite love it when people clearly demarcate which part of their content came from an LLM and specify which model.
The little citation carries a huge amount of useful information.
The folks who don't like AI should like it too, as they can easily filter the content.
[a] https://www.nobelprize.org/uploads/2024/11/advanced-physicsp...
In a recent talk he made a quip that he had to change some slides because if you have a Nobel prize in physics you should at least get the units right.
Now perhaps Hinton does deserve the award, but certainly it should not be because of the reasons you cite: money and popularity.
You refuted an argument about being honest about accepting an award on the basis that the award pays a lot of money and grants one a great deal of popularity.
If your argument didn't involve money and popularity, then why did you choose those two specific criteria as the justification for accepting this award?
I want to be clear, I am not claiming that Dr. Hinton accepted the award in a dishonest manner or that he did it for money, I am simply refuting your position that money is a valid reason to disregard honesty for accepting a prestigious award.
Some things never change.
[1]: https://www.amazon.com/Talking-Nets-History-Neural-Networks/...
Once people had a sufficiently compelling reason to write differentiable code, the frameworks around differentiable programming (theano, tensorflow, torch, JAX) picked up a lot of steam.
https://en.wikipedia.org/wiki/Adaptive_filter
An adaptive filter doesn't need differentiation of the forward term, but if you squint it looks pretty close.
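To make the resemblance concrete, here is a minimal LMS adaptive-filter sketch (my own illustration; the filter length, step size, and signals are made up): the weight update amounts to a stochastic gradient step on the squared error, with no backward pass needed because the filter is a single linear layer.

```python
import numpy as np

# Toy LMS adaptive filter: estimate the taps of an unknown linear system.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -0.3, 0.1])   # unknown system (made up)
w = np.zeros(3)                        # filter weights to adapt
mu = 0.05                              # step size

for _ in range(2000):
    x = rng.normal(size=3)                 # input taps
    d = true_w @ x + 0.01 * rng.normal()   # desired (noisy) output
    e = d - w @ x                          # prediction error
    w += mu * e * x                        # LMS update = gradient step on e**2 / 2

print(w)  # should end up close to true_w
```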
Maybe they are. I'm not here to do a deep research project that involves reading every citation in that article. If it makes you feel better, pretend that what I said was instead:
"I don't have all the relevant citations stored in my short-term memory right this second and I am not interested in writing a lengthy thesis to satisfy pedantic navel-gazers on HN."
Or, if you really know of some reinvention of backprop that is not mentioned here, I'd be curious to hear about it.
WTF are you on about? I never made any such claim, or anything remotely close to it.
I don't really understand your negativity here, or what you are reading into my comment. I never asked you to do a research project; I just thought you might know some other references that are not in the article. If you don't, fine.
Note that I don't expect that any relevant reference is missing here. Schmidhuber always tries to be very careful to be complete and to exhaustively cite everything there is on a topic. That is why I was doubly curious about the possibility that something is missing, and what it could be.
Nah, I wasn't trying to imply that that book had anything more than the article, at least in regards to the backprop question specifically. Just pointing it out as one more good resource for this kind of historical perspective.
> I don't really understand your negativity here, or what you are reading into my comment. I never asked you to do a research project; I just thought you might know some other references that are not in the article. If you don't, fine.
No worries. I may be reacting more to a general HN meme than to you in particular. There's a certain brand of pedantry and obsessive nit-picking that is all too common here IMO. It grates on my nerves, so if I ever seem a little salty, it's probably because I thought somebody was doing that thing. It's all good. My apologies for the argumentative tone earlier.
> Schmidhuber always tries to be very careful to be complete and to exhaustively cite everything there is on a topic.
Agreed. That's one reason I don't get why people are always busting on Jurgen. For the most part, it seems that he can back up the claims he makes, and then some. I've heard plenty of people complain about him, but I'm not sure any of them have ever been able to state any particular sense in which he is actually wrong about anything. :-)
So... Automatic integration?
Proportional, integral, derivative. A PID loop sure sounds like what they're talking about.
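For anyone who hasn't written one, a minimal PID loop looks something like the sketch below (my own illustration; the gains, plant, and setpoint are made up):

```python
# Minimal PID controller: combine the proportional, integral, and derivative
# of the error to drive a measured value towards a setpoint.
def pid_step(error, prev_error, integral, dt, kp=1.0, ki=0.1, kd=0.05):
    integral += error * dt
    derivative = (error - prev_error) / dt
    return kp * error + ki * integral + kd * derivative, integral

# Toy closed loop: the "plant" here just integrates the control signal.
value, setpoint, integral, prev_error, dt = 0.0, 10.0, 0.0, 0.0, 0.1
for _ in range(200):
    error = setpoint - value
    u, integral = pid_step(error, prev_error, integral, dt)
    prev_error = error
    value += u * dt

print(round(value, 2))  # should have settled near the setpoint of 10
```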
It has a lot more overhead than regular forward-mode autodiff, because you need to cache values from running the function and refer back to them in reverse order. The advantage is that for functions with many, many inputs and very few outputs (the classic example is calculating the gradient of a scalar function in a high-dimensional space, as in gradient descent), it is algorithmically more efficient and requires only one pass through the primal function.
On the other hand, traditional forward-mode derivatives are most efficient for functions with very few inputs but many outputs. It's essentially a duality relationship.
For vector-valued functions, the naive approach you would learn in a vector calculus class corresponds to forward-mode AD.
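A small sketch of that duality in JAX (my own illustration; the functions and sizes are made up):

```python
import jax
import jax.numpy as jnp

# Many inputs, one output: reverse mode (backprop) gets the whole gradient
# in a single backward pass.
def loss(w):
    return jnp.sum(jnp.tanh(w) ** 2)

w = jnp.arange(1000.0)
grad_loss = jax.grad(loss)(w)      # reverse mode: 1000 partials, one sweep

# Few inputs, many outputs: forward mode is the cheaper direction.
def curve(t):
    return jnp.stack([jnp.sin(t), jnp.cos(t), t ** 2])

jac_fwd = jax.jacfwd(curve)(2.0)   # forward-mode Jacobian
jac_rev = jax.jacrev(curve)(2.0)   # reverse mode agrees, just costs more here
```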
e.g. optimization of state-space control coefficients looks something like training an LLM weight matrix...
However, from what I have seen, this isn't really a useful way of reframing the problem. The optimal control problem is at least as hard as, if not harder than, the original problem of training the neural network, and the latter has mature and performant software for doing it efficiently. That's not to say there isn't good software for optimal control, but it's a more general problem, and therefore off-the-shelf solvers can't leverage the network structure very well.
Some researchers have made interesting theoretical connections like in neural ODEs, but even there the practicality is limited.
As the name implies, the calculation is done forward.
Reverse-mode automatic differentiation starts from the root of the symbolic expression and, in a single backward sweep, computes the derivative of the output with respect to each subexpression.
The difference between the two is like the difference between calculating the Fibonacci sequence recursively without memoization and calculating it iteratively. You avoid doing redundant work over and over again.
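As a toy illustration of the caching involved (my own sketch, not anyone's production code), here is reverse mode by hand for f(x) = sin(x^2): the forward pass stores the intermediates, and the backward pass reuses them in reverse order.

```python
import math

def f_forward(x):
    # f(x) = sin(x**2); keep every intermediate for the backward sweep
    a = x * x
    b = math.sin(a)
    return b, (x, a)

def f_backward(cache, dout=1.0):
    x, a = cache
    da = dout * math.cos(a)  # d(sin a)/da, reusing the cached a
    dx = da * 2 * x          # d(x*x)/dx, reusing the cached x
    return dx

y, cache = f_forward(1.5)
print(f_backward(cache))  # derivative of sin(x^2) at x = 1.5, i.e. 2x*cos(x^2)
```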
Neither is really an invention; they are discoveries. If anything, the chain rule leans slightly more towards invention than backprop does.
I understand the need for attribution as a means to track the means and validity of discovery, but I intensely dislike it when people act like it is a deed of ownership of an idea.
[RUM] DE Rumelhart, GE Hinton, RJ Williams (1985). Learning Internal Representations by Error Propagation.
[HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record.
I remember when I learnt about artificial neural networks at university in the late 00s, my professors were really sceptical of them, rightly explaining that they became harder to train as you added more hidden layers.
See, what makes backpropagation and artificial neural networks work are all of the small optimisations and algorithm improvements that were added on top of backpropagation. Without these improvements it's too computationally inefficient to be practical and you have to contend with issues like exploding gradients.
I think Geoffrey Hinton has noted a few times that for people like him, who have been working on artificial neural networks for years, it's quite surprising that today neural networks just work, because for years it was so hard to get them to do anything. In this sense, while backpropagation is the foundational algorithm, it's not sufficient on its own. It was the many improvements made on top of backpropagation that actually made artificial neural networks work and take off in the 2010s, when some of the core components of modern neural networks started to fall into place.
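One concrete example of those later fixes (my own sketch, not something from the comment) is gradient clipping, a small addition that keeps exploding gradients from derailing training:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # grads: list of per-layer gradient arrays
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = 13
print(clip_by_global_norm(grads, max_norm=1.0))   # rescaled down to norm 1
```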
I remember when I first learnt about neural networks I thought maybe coupling them with some kind of evolutionary approach might be what was needed to make them work. I had absolutely no idea what I was doing of course, but I spent so many nights experimenting with neural networks. I just loved the idea of an artificial "neural network" being able to learn a new problem and spit out an answer. The biggest regret of my life was coming out of university and going into web development because there were basically no AI jobs back then, and no such thing as an AI startup. If you wanted to do AI back then you basically had to be a researcher which didn't interest me at the time.
Some ask: "Isn't backpropagation just the chain rule of Leibniz (1676) [LEI07-10] & L'Hopital (1696)?" No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (see Sec. XII of [T22][DLH]). (There are also many inefficient ways of doing this.) It was not published until 1970 [BP1].
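To make "the efficient way of applying the chain rule" concrete, here is a small sketch (my own, not from the article): sweeping vector-Jacobian products from the output side costs O(n^2) per layer, while multiplying the full Jacobians costs O(n^3) per layer, even though both yield the same gradient. The layer count, width, and random Jacobians are made up.

```python
import numpy as np

n, depth = 256, 10
rng = np.random.default_rng(0)
jacobians = [rng.normal(size=(n, n)) / np.sqrt(n) for _ in range(depth)]
dloss_dout = rng.normal(size=n)            # gradient at the network output

# Reverse mode / backprop: backward sweep of vector-Jacobian products.
g = dloss_dout
for J in reversed(jacobians):
    g = g @ J                              # (1 x n)(n x n): O(n^2)

# "Inefficient" chain rule: build the full end-to-end Jacobian first.
full = np.eye(n)
for J in reversed(jacobians):
    full = full @ J                        # (n x n)(n x n): O(n^3)

print(np.allclose(g, dloss_dout @ full))   # same answer, very different cost
```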