1. Self-supervised exploration via temporal inconsistency in reinforcement learning;gao;ArXiv Preprint,2022
2. Gradient surgery for multi-task learning;yu;Advances in neural information processing systems,2020
3. Actor-critic algorithms;konda;Advances in neural information processing systems,1999
4. Deep exploration via bootstrapped DQN;osband;Advances in neural information processing systems,2016
5. Calculation of the Wasserstein Distance Between Probability Distributions on the Line