1. [21] Al-Shedivat, M., Bansal, T., Burda, Y., Sutskever, I., Mordatch, I. and Abbeel, P., “Continuous adaptation via meta-learning in nonstationary and competitive environments,” arXiv preprint, arXiv:1710.03641 (2017).
2. [16] Schulman, J., Levine, S., Abbeel, P., Jordan, M. and Moritz, P., “Trust Region Policy Optimization,” In: International Conference on Machine Learning (2015) pp. 1889–1897.
3. ZERO-MOMENT POINT — THIRTY FIVE YEARS OF ITS LIFE
4. [9] Li, Z., Cheng, X., Peng, X. B., Abbeel, P., Levine, S., Berseth, G. and Sreenath, K., “Reinforcement learning for robust parameterized locomotion control of bipedal robots,” CoRR abs/2103.14295 (2021). arXiv:2103.14295. https://arxiv.org/abs/2103.14295
5. [32] Pinto, L., Davidson, J., Sukthankar, R. and Gupta, A., “Robust Adversarial Reinforcement Learning,” In: Proceedings of the 34th International Conference on Machine Learning, Volume 70 (JMLR.org, 2017) pp. 2817–2826.