A Theoretical and Empirical Analysis of Expected Sarsa

Harm van Seijen, Hado van Hasselt, Shimon Whiteson, Marco Wiering

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    145 Citations (Scopus)

    Abstract

    This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy
    temporal-difference method for model-free reinforcement
    learning. Expected Sarsa exploits knowledge about stochasticity
    in the behavior policy to perform updates with lower variance.
    Doing so allows for higher learning rates and thus faster learning.
    In deterministic environments, Expected Sarsa’s updates
    have zero variance, enabling a learning rate of 1. We prove
    that Expected Sarsa converges under the same conditions as
    Sarsa and formulate specific hypotheses about when Expected
    Sarsa will outperform Sarsa and Q-learning. Experiments in
    multiple domains confirm these hypotheses and demonstrate
    that Expected Sarsa has significant advantages over these more
    commonly used methods.
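    The update described in the abstract replaces Sarsa's sampled next-action value with the expectation of the next state's action values under the behavior policy, which removes the sampling variance from the target. A minimal sketch of that tabular update, assuming an epsilon-greedy behavior policy (function and variable names here are illustrative, not from the paper):

    ```python
    import numpy as np

    def epsilon_greedy_probs(q_values, epsilon):
        """Action probabilities of an epsilon-greedy policy over q_values."""
        n = len(q_values)
        probs = np.full(n, epsilon / n)          # exploration mass, spread uniformly
        probs[np.argmax(q_values)] += 1.0 - epsilon  # remaining mass on greedy action
        return probs

    def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
        """One tabular Expected Sarsa update.

        The target uses E_pi[Q(s', A')] under the behavior policy rather than
        the value of a single sampled next action, so the target has no
        variance from action selection; in a deterministic environment the
        update target is exact and alpha = 1 is admissible.
        """
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        expected_q = np.dot(probs, Q[s_next])    # expectation over next actions
        Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
        return Q
    ```

    For comparison, ordinary Sarsa would use `Q[s_next, a_next]` for a sampled `a_next` in place of `expected_q`; that sampled term is the source of the extra variance the paper analyzes.
    
    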
    Original language: English
    Title of host publication: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
    Subtitle of host publication: ADPRL
    Publication status: Published - 2009

    Keywords

    • Reinforcement learning
