1. Reinforcement learning: An introduction; Sutton, 1998
2. Off-policy temporal-difference learning with function approximation; Precup, 2001
3. Learning from delayed rewards (Ph.D. thesis); Watkins, 1989
4. Off-policy learning with eligibility traces: A survey; Geist; J. Mach. Learn. Res., 2014
5. Convergence of least squares temporal difference methods under general conditions; Yu, 2010