A Tractable Algorithm for Finite-Horizon Continuous Reinforcement Learning
| dc.contributor.author | Gampa, P. | |
| dc.contributor.author | Kondamudi, S.S. | |
| dc.contributor.author | Kailasam, L. | |
| dc.date.accessioned | 2021-01-05T05:22:41Z | |
| dc.date.available | 2021-01-05T05:22:41Z | |
| dc.date.issued | 2019-02 | |
| dc.description.abstract | We consider the finite-horizon continuous reinforcement learning problem. Our contribution is three-fold. First, we give a tractable algorithm for the problem based on optimistic value iteration. Second, we give a lower bound on regret of order Ω(T^{2/3}) for any algorithm that discretizes the state space, improving the previous regret bound of Ω(T^{1/2}) of Ortner and Ryabko [1] for the same problem. Third, under the assumption that the rewards and transitions are Hölder continuous, we show that the upper bound on the discretization error is const.·L·n^{−α}·T. Finally, we give some simple experiments to validate our propositions. © 2019 IEEE. | en_US |
| dc.identifier.isbn | 978-172812662-3 | |
| dc.identifier.uri | https://idr-sdlib.iitbhu.ac.in/handle/123456789/1234 | |
| dc.language.iso | en_US | en_US |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
| dc.relation.ispartofseries | Proceedings - 2019 2nd International Conference on Intelligent Autonomous Systems, ICoIAS 2019; | |
| dc.subject | Reinforcement Learning | en_US |
| dc.subject | Markov Decision Process (MDP) | en_US |
| dc.subject | Regret | en_US |
| dc.subject | Continuous State Space | en_US |
| dc.subject | Bonus | en_US |
| dc.subject | Finite Horizon | en_US |
| dc.title | A Tractable Algorithm for Finite-Horizon Continuous Reinforcement Learning | en_US |
| dc.type | Article | en_US |