Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity

The official repository for the NeurIPS 2024 paper Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity (Costales & Nikolaidis). If you find the code helpful, please cite the corresponding paper:

@inproceedings{costales2024enabling,
    title={Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity},
    author={Robby Costales and Stefanos Nikolaidis},
    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
    year={2024},
    url={https://openreview.net/forum?id=Xo1Yqyw7Yx}
}

Contents

  • Code structure
  • Setup
  • W&B
  • Reproducing results
  • Miscellaneous
  • License

Code structure

All algorithmic code is in the diva/ directory. Below is the structure of the most notable files and directories.

  • main.py — main entry point for running the code
  • metalearner.py — meta-RL training loop
  • cfg/ — configuration files for domains and algorithms
  • components/ — main algorithmic components
    • components/level_replay/ — level replay code (for ACCEL, PLR)
    • components/qd/ — QD code (for DIVA's archive)
      • components/qd/qd_module.py — code most relevant to DIVA
    • components/policy/ — code relevant to the RL policy
    • components/vae/ — VAE code (for VariBAD's encoder)
  • environments/ — environments and environment-specific code
    • environments/alchemy/ — Alchemy environment and related code
    • environments/box2d/ — Racing environment and related code
    • environments/toygrid/ — GridNav environment and related code
  • utils/ — miscellaneous helper code

Setup

These setup instructions assume you are running Ubuntu 20.04 with CUDA 12. We use Anaconda to manage the environment; ensure that the CUDA toolkit and the latest version of Anaconda are installed. Additionally, certain environments require a virtual display, for which you will need to install Xvfb (via e.g. sudo apt install xvfb).
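
On a headless machine, one way to provide the virtual display is to wrap the launch command with xvfb-run (a sketch; exact usage may vary with your setup):

xvfb-run -a python main.py ...    # -a automatically picks a free display number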

Terminal commands

git clone git@github.com:robbycostales/diva.git    # clone repository
cd diva                                            # navigate to directory
pip install --upgrade pip                          # upgrade pip if necessary
sudo apt install swig                              # necessary for Racing environment
. ./setup.sh                                       # set up conda env and install deps

Troubleshooting

The error:

AttributeError: module '_Box2D' has no attribute 'RAND_LIMIT_swigconstant'

can be resolved with:

pip uninstall box2d-py
pip install box2d-py
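
If the error persists, forcing a from-source build (so the extension is compiled against your locally installed swig) may also help; this is a suggestion beyond the reinstall above:

pip install --force-reinstall --no-binary box2d-py box2d-py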

W&B

We use wandb for logging, which is free for academic use. Without wandb, you can still run the code and use tensorboard for logging, but some functionality may be limited.

Use the following commands to log in and/or verify your credentials.

wandb login
wandb login --verify

From the diva directory, run wandb_init.sh to set the necessary environment variables. For the entity, enter your username or organization; for the project, enter diva or any other name you would like to use:

. ./wandb_init.sh
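
Alternatively, you can export wandb's standard environment variables yourself (a minimal sketch; wandb_init.sh may set additional variables):

export WANDB_ENTITY=<your-username-or-org>    # entity used for logging
export WANDB_PROJECT=diva                     # or any other project name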

Reproducing results

The following command structure can be used for reproducing the main results (run from within the diva/ directory):

python main.py wandb_label=<wandb_label> domain=<domain> meta=<meta> dist=<dist>

wandb_label is for logging. domain specifies the environment (toygrid, alchemy, racing), meta specifies the meta-RL learner (varibad, rl2), and dist specifies the task distribution (diva, rplr, accel, dr, oracle, diva_plus). For diva_plus, the default configuration assumes you will load an archive from a prior DIVA run.
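
For example, to train DIVA with the VariBAD learner on GridNav (illustrative values drawn from the options above):

python main.py wandb_label=diva_toygrid domain=toygrid meta=varibad dist=diva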

For F1 results, set domain.reg_env_id=CarRacing-F1-v0. For DIVA, you will need to save an archive from a normal run first and load it in (since DIVA uses samples from domain.reg_env_id to parameterize the archive).
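
For example (illustrative values; for DIVA this also assumes an archive saved from a prior run is loaded via the corresponding config option, not shown here):

python main.py wandb_label=diva_f1 domain=racing meta=varibad dist=diva domain.reg_env_id=CarRacing-F1-v0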

Miscellaneous

To describe meta-RL rollouts within the same MDP, our work uses the language "episodes in a trial", while the Alchemy work uses "trials in an episode". Expect conflicting conventions in certain parts of the code, especially surrounding the Alchemy environment.

License

This code is released under the MIT License. Some code is adapted from other repositories (see below); please see their respective licenses for more information.

  • components/level_replay and environments/box2d are adapted from the DCD repo.
  • components/policy, components/exploration/, and components/vae are adapted from the HyperX repo (some parts are from the original VariBAD repo). Exploration bonuses are not used in our work, but are included because they may be useful for environments others may wish to implement.
  • components/qd adapts some elements from the pyribs and DSAGE repos.
  • environments/alchemy is adapted from the dm_alchemy repo.
