1. R. Ahuja, T. Magnanti, and J. Orlin. Network flows: Theory, algorithms, and applications. Prentice Hall, 1993.
2. J. Ba and R. Caruana. Do deep nets really need to be deep? NIPS Conference, pp. 2654–2662, 2014.
3. J. Ba, J. Kiros, and G. Hinton. Layer normalization. arXiv:1607.06450, 2016. https://arxiv.org/abs/1607.06450
4. M. Bazaraa, H. Sherali, and C. Shetty. Nonlinear programming: Theory and algorithms. John Wiley and Sons, 2013.
5. S. Becker and Y. LeCun. Improving the convergence of back-propagation learning with second order methods. Proceedings of the 1988 Connectionist Models Summer School, pp. 29–37, 1988.