DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
The International Conference on Learning Representations (ICLR) 2021 [Spotlight]
Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern
Oregon State University
Abstract
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. The DAC-MDP is a non-parametric model that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. Theoretically, we give conditions under which the performance of DAC-MDP solutions can be lower-bounded. We also investigate the empirical behavior of DAC-MDPs in a number of environments, including those with image-based observations. Overall, the experiments demonstrate that the framework can work in practice and scale to large, complex offline RL problems.
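To give a sense of the construction, the sketch below derives and solves a toy DAC-MDP in plain NumPy: each state-action pair is connected to its k nearest dataset transitions, each neighbor's reward is penalized by a cost proportional to its distance, and the resulting finite MDP is solved exactly by value iteration. The dataset, sizes, variable names, and hyperparameters here are illustrative assumptions, not the paper's released implementation.

import numpy as np

# Hypothetical offline dataset: N transitions with d-dimensional (latent) states.
rng = np.random.default_rng(0)
N, d, n_actions = 200, 4, 2
S = rng.normal(size=(N, d))               # states s_i
A = rng.integers(0, n_actions, N)         # actions a_i
R = rng.normal(size=N)                    # rewards r_i
S2 = S + 0.1 * rng.normal(size=(N, d))    # next states s'_i

k, C, gamma = 3, 1.0, 0.95                # neighbors, cost coefficient, discount

def knn(query, pts, k):
    """Indices and distances of the k nearest points to `query`."""
    dist = np.linalg.norm(pts - query, axis=1)
    idx = np.argsort(dist)[:k]
    return idx, dist[idx]

# Map every dataset next-state to its nearest dataset state, so the
# derived MDP is closed over the N dataset states.
succ = np.array([knn(s2, S, 1)[0][0] for s2 in S2])

# For each (state, action) pair, average over the k nearest transitions that
# used that action, penalizing each neighbor's reward by C * distance. This
# discourages the planner from exploiting under-represented regions.
T = np.zeros((N, n_actions, N))           # transition probabilities
Rsa = np.zeros((N, n_actions))            # cost-penalized rewards
for i in range(N):
    for a in range(n_actions):
        mask = np.flatnonzero(A == a)     # transitions that took action a
        nn, dist = knn(S[i], S[mask], min(k, len(mask)))
        nn = mask[nn]
        Rsa[i, a] = np.mean(R[nn] - C * dist)
        for j in nn:
            T[i, a, succ[j]] += 1.0 / len(nn)

# Solve the finite DAC-MDP exactly with value iteration.
V = np.zeros(N)
for _ in range(500):
    V = (Rsa + gamma * T @ V).max(axis=1)
policy = (Rsa + gamma * T @ V).argmax(axis=1)

The dense loops and O(N^2) transition table are only for readability; scaling this construction to large datasets would require batched nearest-neighbor search and an efficient (e.g., GPU-based) value-iteration solver.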
Paper: [PDF]
Code: [GitHub]
Poster: [ICLR]
Preprint: [arXiv]
Videos:
BibTeX
@inproceedings{
Shrestha2020DeepAveragersOR,
title = {DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs},
author = {Aayam Shrestha and Stefan Lee and Prasad Tadepalli and Alan Fern},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2021},
numpages = {24},
url = {https://iclr.cc/virtual/2021/poster/3092},
keywords = {non-parametric markov decision process, exact planning, offline reinforcement learning}
}