Abstract
Proximal Policy Optimization (PPO) is an outstanding reinforcement learning (RL) algorithm that has proven effective on a wide range of problems. Compared to other reinforcement learning algorithms, it offers superior stability and reliability. However, as an on-policy algorithm, it suffers from sample inefficiency and moderate training speed. In this paper, we employ two methods, namely parameter sharing and trajectory sharing, to speed up the training of the PPO algorithm. Moreover, we introduce an adaptive blending method to prevent unnecessary updates during parameter sharing. We also introduce a selection-probability technique, together with a thresholding method, to balance exploitation and exploration when incorporating trajectory sharing. Experiments in a multi-agent environment show that both methods converge significantly faster than the training process of the traditional PPO algorithm.
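The two sharing mechanisms summarized above can be pictured roughly as in the sketch below. This is not the authors' implementation; all names and hyperparameters (`blend_rate`, `share_prob`, `advantage_threshold`) are illustrative assumptions. It only shows how parameter blending between two PPO learners might be gated to avoid unnecessary updates, and how a peer's trajectories might be admitted into the local batch via a selection probability and an advantage threshold.

```python
# Minimal sketch (not the paper's code) of the two sharing ideas from the abstract.
# Assumed, illustrative names: blend_rate, share_prob, advantage_threshold.
import numpy as np

rng = np.random.default_rng(0)

class Learner:
    """Stand-in for a PPO agent: policy parameters plus a recent-return estimate."""
    def __init__(self, dim):
        self.theta = rng.normal(size=dim)   # policy parameters
        self.recent_return = -np.inf        # rolling performance estimate

def adaptive_blend(target, source, blend_rate=0.5):
    """Parameter sharing: blend the better learner's parameters into the weaker
    one, but skip the update when the source is not actually better
    (the 'prevent unnecessary updates' idea)."""
    if source.recent_return <= target.recent_return:
        return  # nothing useful to transfer; avoid a needless update
    target.theta = (1.0 - blend_rate) * target.theta + blend_rate * source.theta

def share_trajectories(own_batch, peer_batch, share_prob=0.3, advantage_threshold=0.0):
    """Trajectory sharing: with some probability, append the peer's
    high-advantage transitions to the local batch. The probability keeps
    exploration alive; the threshold filters out unhelpful transitions."""
    shared = [t for t in peer_batch
              if t["advantage"] > advantage_threshold and rng.random() < share_prob]
    return own_batch + shared

if __name__ == "__main__":
    a, b = Learner(dim=8), Learner(dim=8)
    a.recent_return, b.recent_return = 1.2, 3.4   # pretend b is currently better
    adaptive_blend(a, b)                          # a moves toward b; b is unchanged
    batch_a = [{"obs": None, "advantage": 0.5}]
    batch_b = [{"obs": None, "advantage": 2.0}, {"obs": None, "advantage": -1.0}]
    merged = share_trajectories(batch_a, batch_b)
    print(len(merged), "transitions after sharing")
```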
| Original language | English |
|---|---|
| Title of host publication | 2024 IEEE International Conference on Development and Learning, ICDL 2024 |
| Publisher | IEEE |
| Number of pages | 6 |
| ISBN (Electronic) | 9798350348552 |
| DOIs | |
| Publication status | Published - 27-Aug-2024 |
| Event | 2024 IEEE International Conference on Development and Learning, ICDL 2024 - Austin, United States; Duration: 20-May-2024 → 23-May-2024 |
Conference
| Conference | 2024 IEEE International Conference on Development and Learning, ICDL 2024 |
|---|---|
| Country/Territory | United States |
| City | Austin |
| Period | 20/05/2024 → 23/05/2024 |