A Tractable Algorithm for Finite-Horizon Continuous Reinforcement Learning

Gampa, P.; Kondamudi, S.S.; Kailasam, L.

Files

08782442.pdf (1.35 MB)

Date

2019-02

Authors

Gampa, P.

Kondamudi, S.S.

Kailasam, L.

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

We consider the finite-horizon continuous reinforcement learning problem. Our contribution is three-fold. First, we give a tractable algorithm for the problem based on optimistic value iteration. Next, we give a lower bound on regret of order Ω(T^{2/3}) for any algorithm that discretizes the state space, improving the previous regret bound of Ω(T^{1/2}) of Ortner and Ryabko [1] for the same problem. Third, under the assumption that the rewards and transitions are Hölder continuous, we show that the upper bound on the discretization error is const · L n^{-α} T. Finally, we give some simple experiments to validate our propositions. © 2019 IEEE.