Abstract
One of the long-term goals of reinforcement learning is to build intelligent agents that, like humans and animals, can rapidly learn and flexibly transfer skills. In this paper, we introduce an episodic control framework based on the temporal extension of successor features to achieve these goals, which we refer to as Temporally Extended Successor Feature Neural Episodic Control (TESFNEC). This method substantially improves sample efficiency and allows previously learned policies to be reused elegantly. Crucially, the model augments agent training with an episodic memory, significantly reducing the number of iterations required to learn an optimal policy. Furthermore, we adopt the temporal extension of successor features as a technique to capture the expected state-transition dynamics of actions. This form of temporal abstraction does not entail learning a top-down hierarchy of task structures; instead, it focuses on the bottom-up combination of actions and action repetitions. Our approach therefore directly considers the temporal scope of sequences of temporally extended actions without requiring predefined or domain-specific options. Experimental results in a two-dimensional object-collection environment demonstrate that the proposed method learns policies faster than baseline reinforcement learning approaches and attains higher average returns.
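To make the three ingredients named in the abstract concrete, below is a minimal tabular sketch of successor features over temporally extended actions (action repetitions) combined with an episodic memory. Everything here is an illustrative assumption, not the paper's implementation: the 5×5 grid world, one-hot features, the repeat set {1, 2, 4}, the hyperparameters, and all names such as `step`, `psi`, and `memory` are hypothetical; the actual TESFNEC agent presumably uses learned embeddings and a neural episodic store rather than these tables.

```python
"""Hedged sketch: temporally extended successor features + episodic memory.
Assumed setting: deterministic 5x5 grid, reward 1 at the far corner."""
import numpy as np

SIZE, N_ACTIONS = 5, 4               # assumed 5x5 grid, 4 primitive moves
REPEATS = (1, 2, 4)                  # bottom-up action repetitions
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.3
N_STATES = SIZE * SIZE
GOAL = N_STATES - 1
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    """Deterministic grid transition; reward 1 on reaching the goal corner."""
    r, c = divmod(s, SIZE)
    dr, dc = MOVES[a]
    r, c = min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1)
    s2 = r * SIZE + c
    return s2, float(s2 == GOAL), s2 == GOAL

def phi(s):
    """One-hot features: successor features then equal discounted
    expected state occupancies."""
    f = np.zeros(N_STATES)
    f[s] = 1.0
    return f

# psi[(a, k)][s] ~ E[sum_t gamma^t phi(s_t)] when repeating action a k times.
psi = {(a, k): np.zeros((N_STATES, N_STATES))
       for a in range(N_ACTIONS) for k in REPEATS}
w = np.zeros(N_STATES)               # task weights, so r ~= phi(s) @ w
memory = {}                          # episodic store: (s, a, k) -> best return

def q(s, a, k):
    """Parametric SF value, overridden by the episodic memory whenever a
    higher return was actually experienced from this entry."""
    return max(psi[(a, k)][s] @ w, memory.get((s, a, k), -np.inf))

def greedy(s):
    return max(((a, k) for a in range(N_ACTIONS) for k in REPEATS),
               key=lambda ak: q(s, *ak))

rng = np.random.default_rng(0)
for episode in range(300):
    s, done, t = 0, False, 0
    while not done and t < 300:
        t += 1
        if rng.random() < EPS:       # epsilon-greedy over extended actions
            a = int(rng.integers(N_ACTIONS))
            k = REPEATS[rng.integers(len(REPEATS))]
        else:
            a, k = greedy(s)
        # Execute the temporally extended action: repeat a for k steps,
        # accumulating discounted features and rewards along the way.
        phi_sum, r_sum, g, s2 = np.zeros(N_STATES), 0.0, 1.0, s
        for _ in range(k):
            phi_sum += g * phi(s2)
            s2, r, done = step(s2, a)
            r_sum += g * r
            g *= GAMMA
            if done:
                break
        a2, k2 = greedy(s2)
        # SF TD update: psi(s,(a,k)) <- phi_sum + gamma^k * psi(s',(a',k')).
        psi[(a, k)][s] += ALPHA * (phi_sum
                                   + (0 if done else g) * psi[(a2, k2)][s2]
                                   - psi[(a, k)][s])
        # Regress the reward weights: r_sum ~= phi_sum @ w per step.
        w += ALPHA * (r_sum - phi_sum @ w) * phi_sum
        # Episodic write: keep the best bootstrapped return seen so far
        # for this (state, extended action), enabling fast reuse.
        ret = r_sum + (0 if done else g * q(s2, a2, k2))
        memory[(s, a, k)] = max(memory.get((s, a, k), -np.inf), ret)
        s = s2
```

The design choice worth noting is the `max` in `q`: reads from the episodic memory override the slower parametric successor-feature estimate whenever a better return has already been experienced, which is one way to realize the sample-efficiency claim in the abstract; separating `psi` from `w` is what lets a learned transition model be reused when only the reward weights change.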
Funder
Research Foundation for Advanced Talents of Henan University of Technology
Key Scientific Research Projects of Higher Education Institutions in Henan Province
Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education
Publisher
Springer Science and Business Media LLC