1. Barto, A. G.; Sutton, R. S.; and Watkins, C. J. C. H. 1989. Learning and sequential decision making. Technical Report 89–95, Department of Computer and Information Science, University of Massachusetts, Amherst, Massachusetts. Also published in Learning and Computational Neuroscience: Foundations of Adaptive Networks, Michael Gabriel and John Moore, editors. The MIT Press, Cambridge, Massachusetts, 1991.
2. Bertsekas, D. P. 1987. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall.
3. Boyan, Justin A. 1992. Modular neural networks for learning context-dependent game strategies. Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, England.
4. Heger, Matthias 1994. Consideration of risk in reinforcement learning. In Proceedings of the Machine Learning Conference. To appear.
5. Howard, Ronald A. 1960. Dynamic Programming and Markov Processes. The MIT Press, Cambridge, Massachusetts.