Reset-free Reinforcement Learning with World Models

Transactions on Machine Learning Research (TMLR), 2025
Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu

The official implementation of the Model-based Reset-free (MoReFree) agent, a model-based RL agent for reset-free tasks.

If you find our paper or code useful, please reference us:

@article{yangreset,
  title={Reset-free Reinforcement Learning with World Models},
  author={Yang, Zhao and Moerland, Thomas M and Preuss, Mike and Plaat, Aske and Hu, Edward S},
  journal={Transactions on Machine Learning Research},
  year={2025}
}

To learn more:

MoReFree builds on PEG, an unsupervised RL agent for hard exploration tasks. We first adapted it to the reset-free setting, then added back-and-forth exploration and task-relevant imagination training.

Model-based Reset-free RL agent (MoReFree)

During exploration (data collection), the long reset-free horizon is split into small 'chunks'. Within each chunk, MoReFree strikes a balance between exploring unseen states and practicing optimal behavior in task-relevant regions by directing the goal-conditioned policy to achieve evaluation states, initial states (emulating a reset), and exploratory goals (defined by PEG). During imagination training, MoReFree focuses the goal-conditioned policy training inside the world model on achieving evaluation states, initial states, and random replay buffer states, to better prepare the policy for the aforementioned exploration scheme.
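Conceptually, the per-chunk goal selection looks like the following minimal sketch. All names here (the function, its arguments, and the task-goal probability) are illustrative only, not the repository's actual API: with some probability the policy is directed towards task-relevant goals, alternating between evaluation states and initial states, and otherwise towards an exploratory goal proposed by PEG.

# Illustrative sketch of MoReFree's back-and-forth goal selection for one chunk;
# names are hypothetical, not the actual MoReFree/PEG interfaces.
import numpy as np

def sample_chunk_goal(chunk_idx, task_goal_prob, eval_goals, initial_states, peg_goal_picker):
    if np.random.rand() < task_goal_prob:
        # Back-and-forth: even chunks aim for an evaluation state, odd chunks
        # go back towards an initial state (emulating a reset).
        if chunk_idx % 2 == 0:
            return eval_goals[np.random.randint(len(eval_goals))]
        return initial_states[np.random.randint(len(initial_states))]
    # Otherwise let PEG propose an exploratory goal.
    return peg_goal_picker()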

MoReFree Teaser

Model-based reset-free RL agents (reset-free PEG and MoReFree) outperform baselines in all tasks, and MoReFree performs best in the three more challenging tasks.

MoReFree Curves

Quickstart

To better understand MoReFree, it helps to first take a look at PEG and follow its installation instructions. We list our main modifications over PEG below:

  • In resetfree/env.py, we wrap a reset-free environment (EARL format) so that it returns a 'done' flag every n steps. This tells the PEG codebase to finish the current 'chunk' and sample another goal for the next one, while the underlying environment itself is never reset (a simplified sketch of such a wrapper is given after this list).
  • In resetfree/goal_picker_wrapper.py, we define the goal sampling logic: with probability $p$ we sample task-relevant goals ($\rho_g^*$ and $\rho_0$), and the rest of the time we sample exploratory goals using PEG ($\rho_E$).
  • To implement task-relevant imagination training, we set train_env_goal_percent = 0.2. We first include both initial states and goal states in env_goal (the original PEG only includes goal states), then sample and train the policy on them.
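The chunking idea from the first item can be sketched as a standard gym-style wrapper (assuming the older gym step API that returns obs, reward, done, info); this is illustrative only and does not reproduce the exact logic in resetfree/env.py.

# Illustrative sketch of a chunking wrapper for a reset-free environment; the
# class and argument names are assumptions, not the repository's actual code.
import gym

class ChunkedResetFreeEnv(gym.Wrapper):
    def __init__(self, env, chunk_length=200):
        super().__init__(env)
        self.chunk_length = chunk_length
        self._t = 0
        self._last_obs = None

    def reset(self):
        # Reset only the step counter; the simulator is reset once at the very
        # start, after that the agent continues from wherever it left off.
        self._t = 0
        if self._last_obs is None:
            self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs
        self._t += 1
        # Emit 'done' every chunk_length steps so the agent samples a new goal,
        # without touching the underlying environment state.
        if self._t >= self.chunk_length:
            done = True
        return obs, reward, done, info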

MoReFree installation

If you have already finished the PEG installation, you can skip this part.

Create the conda environment by running:

conda env create -f environment.yml

Then navigate to the MoReFree folder and install it as a local Python module:

# in the MoReFree folder, like /home/MoReFree/
pip install -e .

Environment installation

We evaluate MoReFree on 8 environments: 2 (Tabletop and Sawyer Door) are from the EARL benchmark, 3 (PointUMaze, Fetch Push, and Fetch Pick&Place) are from the IBC tasks, and we built 3 more challenging tasks ourselves (Ant, Fetch Push Hard, and Fetch Pick&Place Hard).

You can follow the EARL codebase to set up its environments. The other environments, from IBC or the mrl part of the PEG codebase, are already included in the envs folder.

Running Experiments

You can now run an experiment with:

python examples/run_goal_cond.py --configs <configs here> --logdir <your_logdir>

The method and environment are selected through configs.yaml. If you take a look at it, you will see a number of predefined configurations. Here are the relevant ones:

  • PointUMaze: point_umaze
  • Tabletop: earl_tabletop
  • Sawyer Door: earl_sawyer_door
  • Fetch Push: fetch_push
  • Fetch Push (hard): fetch_push_hard
  • Fetch Pick&Place: fetch_pick
  • Fetch Pick&Place (hard): fetch_pick_hard
  • Ant: ant_maze

So if you would like to train a MoReFree agent in the PointUMaze environment, you run: python examples/run_goal_cond.py --configs point_umaze --logdir 'zhao/logdir/pointumaze/'.

We list some important hyperparameters that MoReFree includes:

  • alpha_s, alpha_e, alpha_decay_rate: the schedule of $\alpha$, which controls the back-and-forth exploration ratio (see the sketch after this list). In the paper, $\alpha = 0.2$, so we set alpha_s = alpha_e = $0.2$.
  • train_env_goal_percent: the ratio of task-relevant states that the goal-conditioned policy is trained on during imagination training. In the paper, we set it to $0.2$.
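For concreteness, one way such a schedule could look is sketched below; this is a hypothetical illustration of annealing $\alpha$ from alpha_s towards alpha_e, not the repository's exact implementation.

# Hypothetical illustration of an alpha schedule driven by alpha_s, alpha_e and
# alpha_decay_rate; not the repository's exact code. With alpha_s == alpha_e
# (as in the paper) the returned value is constant at 0.2.
def scheduled_alpha(step, alpha_s=0.2, alpha_e=0.2, alpha_decay_rate=0.999):
    # Exponentially anneal from alpha_s towards alpha_e as training progresses.
    return alpha_e + (alpha_s - alpha_e) * (alpha_decay_rate ** step)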

Visualization and Checkpointing

Once your agent is training, navigate to the log folder specified by --logdir. It should look something like this:

point_umaze/
  |- config.yaml            # parsed configs
  |- eval_episodes/         # folder of evaluation episodes
  |- train_episodes/        # replay buffer/folder of training episodes
  |- events.out.tfevents    # tensorboard outputs (scalars, GIFs)
  |- metrics.jsonl          # metrics in json format for plotting
  |- variables.pkl          # most recent snapshot of weights

If you open up TensorBoard, you should see scalars for training and evaluation.
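If you prefer plotting from metrics.jsonl directly, a small sketch is given below. The metric key 'eval_success' and the example path are assumptions; check your metrics.jsonl for the keys actually logged by your run.

# Plot a learning curve from metrics.jsonl (one JSON record per line).
# The path and the metric name 'eval_success' are assumed for illustration.
import json
import matplotlib.pyplot as plt

steps, values = [], []
with open("zhao/logdir/pointumaze/metrics.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if "eval_success" in record:
            steps.append(record.get("step", len(steps)))
            values.append(record["eval_success"])

plt.plot(steps, values)
plt.xlabel("environment steps")
plt.ylabel("eval_success (assumed metric name)")
plt.title("MoReFree learning curve")
plt.show()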

Acknowledgements

MoReFree builds on the prior work PEG, and some environments are taken from IBC and EARL; we thank the authors for their contributions.

  • PEG for the RL agent, environments, and nice README.
  • IBC for the baseline agent and their environments.
  • MEDAL for the baseline agent.
  • EARL for their environments.
