This is the official implementation of OPEN from *Can Learned Optimization Make Reinforcement Learning Less Difficult?*, published at NeurIPS 2024 (Spotlight) and the AutoRL Workshop @ ICML 2024 (Spotlight).
OPEN is a framework for learning to optimize (L2O) in reinforcement learning. Here, we provide full JAX code to replicate the experiments in our paper and foster future work in this direction. Our current codebase can be used with environments from gymnax or Brax.
All files for running OPEN are stored in `rl_optimizer/`.
Alongside the training code in `rl_optimizer/train.py`, we include configs for [freeway, asterix, breakout, spaceinvaders, ant, gridworld]. We automate parallelisation over multiple GPUs using JAX sharding (a minimal sketch of the idea appears after the command below). The flag `--larger` can be used to increase the size of the network in OPEN. To learn an optimizer in one or a combination of these environments, run:
```
python3 train.py --envs <env> --num-rollouts <num_rollouts> --popsize <popsize> --noise-level <sigma_init> --sigma-decay <sigma_decay> --lr <lr> --lr-decay <lr-decay> --num-generations <num_gens> --save-every-k <evaluation_frequency> --wandb-name "<wandb name>" --wandb-entity "<wandb entity>" [--larger]
```
This will save a checkpoint, and evaluate the performance of the optimizer, every `--save-every-k` generations. Note that gridworld cannot be run in tandem with other environments, as it is the only environment to which we apply antithetic task sampling.
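The multi-GPU parallelisation mentioned above is built on `jax.sharding`. The snippet below is a minimal, self-contained sketch of the idea, splitting a hypothetical population of flat parameter vectors across the available devices; it is illustrative only and does not mirror the code in `rl_optimizer/train.py`.

```python
# Illustrative sketch of population sharding with jax.sharding.
# This is NOT the OPEN implementation; names and shapes here are hypothetical.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

popsize, num_params = 64, 1024            # popsize should be divisible by the device count
population = jnp.zeros((popsize, num_params))

# 1D device mesh; split the population axis across devices, replicate the parameter axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("pop",))
population = jax.device_put(population, NamedSharding(mesh, PartitionSpec("pop", None)))

@jax.jit
def fitness(pop):
    # Placeholder objective; each device only evaluates its own shard of the population.
    return -jnp.sum(pop ** 2, axis=-1)

print(fitness(population).shape)          # (popsize,), computed across all devices
```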
We include our hyperparameters in the paper. An example usage is:
```
python3 train.py --envs breakout --num-rollouts 1 --popsize 64 --noise-level 0.03 --sigma-decay 0.999 --lr 0.03 --lr-decay 0.999 --num-generations 500 --save-every-k 24 --wandb-name "OPEN Breakout"
```
To evaluate the performance of learned optimizers, run the command below, providing the relevant wandb run IDs to `--exp-name` and the generation numbers to `--exp-num`. This code is also run intermittently during training.
For experimental purposes, we provide learned weights for the trained optimizers from our paper for the aforementioned environments in `rl_optimizer/pretrained`. These can be used with the argument `--pretrained` in place of wandb IDs. Use the `--larger` flag if it was used in training, and to experiment with our pretrained `multi` optimizers pass the `--multi` flag.
```
python3 rl_optimizer.eval --envs <env-names> --exp-name <wandb experiment IDs> --exp-num <generation numbers> --num-runs 16 --title <foldername for saving files> [--pretrained --multi --larger]
```
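For example, a hypothetical invocation evaluating the pretrained breakout optimizer might look like the following; the output folder name is arbitrary and the exact flag combination is illustrative rather than prescriptive:

```
python3 rl_optimizer.eval --envs breakout --pretrained --num-runs 16 --title breakout_pretrained
```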
We include submodules for Learned Optimization and GROOVE. Therefore, when cloning this repo, be sure to use `--recurse-submodules`:
```
git clone --recurse-submodules git@github.com:AlexGoldie/rl-learned-optimization.git
```
We include requirements in `setup/requirements.txt`. Dependencies can be installed locally using:
```
pip install -r setup/requirements.txt
```
We also provide files to help build a Docker image. Since we use wandb for logging checkpoints, you should supply your wandb API key as an argument to `build_docker.sh`:
```
cd setup
chmod +x build_docker.sh
./build_docker.sh {WANDB_API_KEY}
cd ..
chmod +x run_docker.sh
./run_docker.sh {GPU_NAMES}
```
For example, to start the Docker container with access to GPUs 0 and 1, run `./run_docker.sh 0,1`.
The following projects were used extensively in the making of OPEN:
If you use OPEN in your work, please cite the following:
```bibtex
@inproceedings{goldie2024can,
  author={Alexander D. Goldie and Chris Lu and Matthew Thomas Jackson and Shimon Whiteson and Jakob Nicolaus Foerster},
  booktitle={Advances in Neural Information Processing Systems},
  title={Can Learned Optimization Make Reinforcement Learning Less Difficult?},
  year={2024},
}
```
