Improving Proximal Policy Optimization Algorithm in Interactive Multi-Agent Systems

Yi Shang, Yifei Chen, Francisco Cruz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Proximal Policy Optimization (PPO) is a prominent reinforcement learning (RL) algorithm that has proven effective on a wide range of problems. Compared to other RL algorithms, it offers superior stability and reliability. However, as an on-policy algorithm, it suffers from sample inefficiency and moderate training speed. In this paper, we use two methods, parameter sharing and trajectory sharing, to speed up the training of the PPO algorithm. Moreover, we introduce an adaptive-blending approach to prevent unnecessary updates during parameter sharing. We also introduce a selection-probability technique, together with a thresholding method, to balance exploitation and exploration when applying trajectory sharing. Tests performed in a multi-agent environment show that both methods converge significantly faster than the traditional PPO training process.
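
The abstract names two acceleration mechanisms but not their update rules, so the following is a minimal Python sketch of one plausible reading: parameter sharing gated by an adaptive blend weight, and trajectory sharing gated by a selection probability plus a return-gap threshold. All function names, the sigmoid gate, and the constants are assumptions for illustration, not the paper's actual method.

import numpy as np

rng = np.random.default_rng(0)

def blend_parameters(own, peer, own_return, peer_return, temperature=1.0):
    """Adaptively blend a peer agent's parameters into our own (assumed form).

    The blend weight grows with the peer's performance advantage, and the
    update is skipped entirely when the peer is doing no better, which is
    one way to avoid unnecessary updates during parameter sharing."""
    advantage = peer_return - own_return
    if advantage <= 0.0:
        return own  # peer is no better: skip the update
    w = 1.0 / (1.0 + np.exp(-advantage / temperature))  # sigmoid gate in (0.5, 1)
    return {name: (1.0 - w) * own[name] + w * peer[name] for name in own}

def accept_shared_trajectory(own_return, peer_return,
                             share_prob=0.5, threshold=0.1):
    """Decide whether to train on a peer's trajectory (assumed mechanism).

    The threshold favors exploitation (only import clearly better
    experience); the random selection probability preserves exploration."""
    if peer_return - own_return < threshold:
        return False
    return rng.random() < share_prob

# Toy usage: two agents, each holding a single weight vector.
own_params = {"w": rng.normal(size=4)}
peer_params = {"w": rng.normal(size=4)}
own_params = blend_parameters(own_params, peer_params,
                              own_return=1.2, peer_return=2.0)
use_peer_data = accept_shared_trajectory(own_return=1.2, peer_return=2.0)
print(own_params["w"], use_peer_data)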

Original language: English
Title of host publication: 2024 IEEE International Conference on Development and Learning, ICDL 2024
Publisher: IEEE
Number of pages: 6
ISBN (Electronic): 9798350348552
DOIs:
Publication status: Published - 27-Aug-2024
Event: 2024 IEEE International Conference on Development and Learning, ICDL 2024 - Austin, United States
Duration: 20-May-2024 – 23-May-2024

Conference

Conference: 2024 IEEE International Conference on Development and Learning, ICDL 2024
Country/Territory: United States
City: Austin
Period: 20/05/2024 – 23/05/2024
