Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs

Luuk Bom, Ruud Henken, Marco Wiering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review



Reinforcement learning algorithms enable an agent to optimize its behavior by interacting with a specific environment. Although some very successful applications of reinforcement learning have been developed, how to scale up to large dynamic environments remains an open research question. In this paper we study the use of reinforcement learning on the popular arcade video game Ms. Pac-Man. To let Ms. Pac-Man learn quickly, we designed smart feature extraction algorithms that produce higher-order inputs from the game state. These inputs are given to a neural network that is trained using Q-learning. We constructed higher-order features that are relative to the action of Ms. Pac-Man. These action-relative inputs are then given to a single neural network, which sequentially propagates them to obtain the Q-values of the different actions. The experimental results show that this approach allows the use of only 7 input units in the neural network while still quickly obtaining very good playing behavior. Furthermore, the experiments show that our approach enables Ms. Pac-Man to successfully transfer its learned policy to a different maze on which it was not trained.
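The abstract's core idea (one small network scoring each action from its 7 action-relative inputs, trained with Q-learning) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the network sizes, the toy feature vectors, and all function names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 7 action-relative inputs (as in the
# abstract), a small hidden layer, and Ms. Pac-Man's four moves.
N_INPUTS, N_HIDDEN = 7, 10
ACTIONS = ["up", "down", "left", "right"]

# One shared network: it is propagated once per action, each time fed
# that action's relative feature vector, yielding one Q-value per action.
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_INPUTS))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, N_HIDDEN)
b2 = 0.0

def q_value(x):
    """Forward pass: 7 action-relative inputs -> scalar Q-value."""
    h = np.tanh(W1 @ x + b1)
    return float(W2 @ h + b2), h

def q_values(features_per_action):
    """Sequentially propagate the single network for every action."""
    return np.array([q_value(features_per_action[a])[0] for a in ACTIONS])

def td_update(x, target, alpha=0.01):
    """One Q-learning gradient step toward a TD target (hand-rolled backprop)."""
    global W1, b1, W2, b2
    q, h = q_value(x)
    err = target - q                      # TD error
    W2 += alpha * err * h                 # output-layer update
    b2 += alpha * err
    grad_h = err * W2 * (1.0 - h ** 2)    # backprop through tanh
    W1 += alpha * np.outer(grad_h, x)     # hidden-layer update
    b1 += alpha * grad_h
    return err
```

In use, the greedy policy would compute `q_values` for the current game state and pick the action with the largest Q-value; because the inputs are relative to each candidate action, the same weights serve all four moves, which is what keeps the input layer at only 7 units.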
Original language: English
Title of host publication: Proceedings of IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Subtitle of host publication: ADPRL
Number of pages: 8
Publication status: Published - 2013


  • Reinforcement learning
  • Machine learning
