Neural Network (NN) ReNNaissance and "Deep Learning." Efficient backpropagation (BP) is central to the ongoing NN ReNNaissance and "Deep Learning." It is easy to find misleading accounts of BP's history (as of July 2014). I had a look at the original papers from the 1960s and 70s, and talked to BP pioneers. Its modern version (also called the reverse mode of automatic differentiation) was first published in 1970 by Finnish master student Seppo Linnainmaa. Concepts of BP were known even earlier, though. The minimisation of errors through gradient descent (Cauchy, 1847; Hadamard, 1908) in the parameter space of complex, nonlinear, differentiable, multi-stage, NN-related systems has been discussed at least since the early 1960s (e.g., Kelley, 1960; Bryson, 1961; Bryson and Denham, 1961; Pontryagin et al., 1961; Dreyfus, 1962; Wilkinson, 1965; Amari, 1967; Bryson and Ho, 1969; Director and Rohrer, 1969), initially within the framework of Euler-LaGrange equations in the Calculus of Variations (e.g., Euler, 1744). Steepest descent in the weight space of such systems can be performed (Kelley, 1960; Bryson, 1961; Bryson and Ho, 1969) by iterating the chain rule (Leibniz, 1676; L'Hopital, 1696) à la Dynamic Programming (DP, Bellman, 1957). A simplified derivation of this backpropagation method uses the chain rule only (Dreyfus, 1962). The systems of the 1960s were already efficient in the DP sense.
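To make the idea concrete, here is a minimal sketch (my own illustration, not code from any of the cited papers) of reverse-mode automatic differentiation, the "modern version" of BP mentioned above: a forward pass records each operation's local derivatives, then a single backward pass iterates the chain rule from the output back toward the inputs.

```python
class Var:
    """Scalar node in a computation graph; .grad is filled in by backward()."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self):
        """Reverse sweep: accumulate d(output)/d(node) via the chain rule."""
        self.grad = 1.0
        order, seen = [], set()

        def topo(v):  # topological order so each node is visited once
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    topo(p)
                order.append(v)

        topo(self)
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += v.grad * local

# Example: f(x, y) = x*y + x, so df/dx = y + 1 and df/dy = x.
x, y = Var(3.0), Var(4.0)
f = x * y + x
f.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

The one-pass backward sweep is what makes this efficient in the DP sense: each intermediate quantity is computed once and reused, instead of re-deriving the full chain of derivatives per input.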