The QV Family Compared to Other Reinforcement Learning Algorithms

Marco A. Wiering*, Hado van Hasselt

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

20 Citations (Scopus)

Abstract

This paper describes several new online model-free reinforcement learning (RL) algorithms. We designed three new RL algorithms, namely QV2, QVMAX, and QVMAX2, which are all based on the QV-learning algorithm; in contrast to QV-learning, QVMAX and QVMAX2 are off-policy RL algorithms, whereas QV2 is a new on-policy RL algorithm. We experimentally compare these algorithms to a number of other RL algorithms, namely Q-learning, Sarsa, R-learning, Actor-Critic, QV-learning, and ACLA. We report experiments on five maze problems of varying complexity, as well as on the cart-pole balancing problem. The results show that performance can differ greatly between algorithms on different problems and that no single RL algorithm always performs best, although on average QV-learning scores highest.
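For readers unfamiliar with the base algorithm, the following minimal tabular sketch illustrates the core idea of QV-learning on which the QV family builds: both a state-value function V and an action-value function Q are learned, with V updated by standard TD(0) and Q updated using V(s') as the bootstrap target. The environment interface (`env.reset`, `env.step`) and all hyperparameter values are assumptions made for illustration, not taken from this paper.

```python
import numpy as np

def qv_learning(env, n_states, n_actions, episodes=500,
                alpha=0.1, beta=0.1, gamma=0.99, epsilon=0.1):
    """Tabular QV-learning sketch: V is learned with TD(0), Q bootstraps on V(s')."""
    Q = np.zeros((n_states, n_actions))
    V = np.zeros(n_states)
    for _ in range(episodes):
        s, done = env.reset(), False            # assumed: reset() returns a discrete state index
        while not done:
            # epsilon-greedy exploration based on the current Q-values
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)       # assumed: step() returns (state, reward, done)
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += beta * (target - V[s])          # TD(0) update of the state-value function
            Q[s, a] += alpha * (target - Q[s, a])   # Q bootstraps on V(s'), not on max_a Q(s', a)
            s = s_next
    return Q, V
```

As the abstract indicates, the QVMAX variants are off-policy; roughly speaking, their state-value update would bootstrap on max_a Q(s', a) instead of V(s'). This detail is inferred from the design of the QV family rather than quoted from the record above.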

Original language: English
Title of host publication: ADPRL 2009: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Place of publication: New York
Publisher: IEEE (The Institute of Electrical and Electronics Engineers)
Pages: 101-108
Number of pages: 8
ISBN (Print): 978-1-4244-2761-1
Publication status: Published - 2009
Event: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Tunisia
Duration: 30 Mar 2009 - 2 Apr 2009

Keywords

  • APPROXIMATION
  • CONVERGENCE
