TY - GEN
T1 - Decoupling State Representation Methods from Reinforcement Learning in Car Racing
AU - Montoya, Juan M.
AU - Daunhawer, Imant
AU - Vogt, Julia E.
AU - Wiering, Marco
N1 - Funding Information:
We deeply thank Vassilios Tsounis and Katia Bouder. ID is supported by SNSF grant #200021 188466.
Publisher Copyright:
© 2021 by SCITEPRESS - Science and Technology Publications, Lda.
PY - 2021
Y1 - 2021
N2 - In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. To achieve this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI's car racing environment. This procedure decouples state representations from RL controllers. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than learning directly from pixel inputs; however, it has a more stable learning curve, substantially reduces the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.
AB - In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. To achieve this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI's car racing environment. This procedure decouples state representations from RL controllers. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than learning directly from pixel inputs; however, it has a more stable learning curve, substantially reduces the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.
KW - Contrastive learning
KW - Deep reinforcement learning
KW - State representation learning
KW - Variational autoencoders
UR - https://www.scopus.com/pages/publications/85103819291
U2 - 10.5220/0010237507520759
DO - 10.5220/0010237507520759
M3 - Conference contribution
AN - SCOPUS:85103819291
T3 - ICAART 2021 - Proceedings of the 13th International Conference on Agents and Artificial Intelligence
SP - 752
EP - 759
BT - ICAART 2021 - Proceedings of the 13th International Conference on Agents and Artificial Intelligence
A2 - Rocha, Ana Paula
A2 - Steels, Luc
A2 - van den Herik, Jaap
PB - SciTePress
T2 - 13th International Conference on Agents and Artificial Intelligence, ICAART 2021
Y2 - 4 February 2021 through 6 February 2021
ER -