1. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks;Chaudhari,2017
2. Stochastic gradient descent as approximate Bayesian inference;Mandt;J. Mach. Learn. Res.,2017
3. Approximation by superpositions of a sigmoidal function;Cybenko;Math. Control Signals Systems,1989
4. A stochastic approximation method;Robbins;Ann. Math. Stat.,1951
5. A theory of adaptive pattern classifiers;Amari;IEEE Trans. Electron. Comput.,1967