EZ-M is a multi-task extension of the EfficientZero series focused on humanoid locomotion tasks. Single-task training on Atari and DMControl is also supported. We recommend multi-task training for future RL work because of its superior overall sample efficiency. For more details, see the paper cited at the end of this document.
The training pipeline is built around Hydra configuration, Ray-based workers, PyTorch models, and a custom C++/Cython Gumbel search backend. The main entry points are:
- ez/train.py for training
- ez/eval.py for evaluation
- MuZero-style training and evaluation workflow
- Support for both discrete and continuous action spaces
- Environment presets under ez/config/exp/
- Distributed execution
- Optional experiment tracking with Weights & Biases
- Support for multi-task training with higher overall sample efficiency
- Python 3.8 or later
- CUDA-enabled GPU environment recommended for training
- Dependencies from requirements.txt or requirements_py310.txt
Install dependencies with:
pip install -r requirements.txt
If you are using Python 3.10, you can use:
pip install -r requirements_py310.txt
Before training or evaluation, compile the C++/Cython MCTS module:
cd ez/mcts/ctree_v2
bash make.sh
cd -
Reported scores can be found in results/ez-m-results.json.
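The choice between the two requirements files can be scripted. The following is a minimal sketch, assuming that requirements_py310.txt is meant only for Python 3.10 and requirements.txt for every other supported version. The helper name pick_requirements is hypothetical and not part of the repository.

```python
import sys

MIN_VERSION = (3, 8)  # minimum Python version stated in the requirements list


def pick_requirements(version=None):
    """Return the requirements file name for a (major, minor) Python version.

    Hypothetical helper: assumes requirements_py310.txt applies to Python 3.10
    only and requirements.txt to all other supported versions.
    """
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) < MIN_VERSION:
        raise RuntimeError(f"Python {major}.{minor} is older than the required 3.8")
    if (major, minor) == (3, 10):
        return "requirements_py310.txt"
    return "requirements.txt"


if __name__ == "__main__":
    # Print the install command instead of running it, so the choice can be reviewed.
    print(f"pip install -r {pick_requirements()}")
```

Printing the command rather than executing it keeps the sketch free of side effects; the printed line can be copied into the shell once it looks right.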
Run training with one of the experiment configs in ez/config/exp/:
python ez/train.py exp_config=ez/config/exp/dmc_state.yaml
You can also use the provided shell script:
bash scripts/train.sh
Example experiment configs:
- ez/config/exp/atari.yaml
- ez/config/exp/dmc_image.yaml
- ez/config/exp/dmc_state.yaml
- ez/config/exp/maniskill_state.yaml
- ez/config/exp/humanoid_bench_state.yaml
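To queue several presets one after another, the training command can be assembled programmatically. This is a sketch under stated assumptions: the helper names build_command and run_all are hypothetical, the script is assumed to be launched from the repository root, and only the exp_config override that appears in this README is used.

```python
import subprocess
import sys

CONFIG_DIR = "ez/config/exp"  # preset directory named in this README


def build_command(script, config_name):
    """Build the argument list for ez/train.py or ez/eval.py with one preset.

    Hypothetical helper; it only reproduces the exp_config=<path> override
    used by the commands in this README.
    """
    if not config_name.endswith(".yaml"):
        config_name += ".yaml"
    return [sys.executable, script, f"exp_config={CONFIG_DIR}/{config_name}"]


def run_all(config_names, script="ez/train.py", dry_run=True):
    """Run the presets sequentially; with dry_run=True only return the commands."""
    commands = [build_command(script, name) for name in config_names]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


if __name__ == "__main__":
    for cmd in run_all(["dmc_state", "humanoid_bench_state"]):
        print(" ".join(cmd))
```

With the default dry_run=True nothing is launched, which makes it easy to check the generated commands first. Passing script="ez/eval.py" produces the matching evaluation commands.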
Run evaluation with:
python ez/eval.py exp_config=ez/config/exp/dmc_image.yaml
Or use the script:
bash scripts/eval.sh
ez/
  agents/   Agent definitions and model implementations
  config/   Global and experiment-specific Hydra configs
  data/     Replay buffer, trajectory, and data processing utilities
  envs/     Environment wrappers and integrations
  mcts/     Python and C++/Cython MCTS implementations
  utils/    Training utilities and helper functions
  worker/   Distributed worker logic for training and evaluation
scripts/    Example launch scripts
- Some training scripts include environment-specific settings such as MUJOCO_GL, CUDA_VISIBLE_DEVICES, and wandb login. Adjust them before running on your machine.
- The repository contains multiple experiment presets; choose the one that matches your target environment and observation type.
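The environment-specific settings mentioned in the notes can also be supplied from Python instead of being edited into the shell scripts. A minimal sketch, assuming headless rendering through EGL and a single visible GPU; the helper name make_env and the default values are illustrative and not taken from the repository's scripts.

```python
import os


def make_env(gpu_ids="0", mujoco_gl="egl", base=None):
    """Return a copy of the environment with rendering and GPU settings applied.

    Hypothetical helper. MUJOCO_GL selects the MuJoCo rendering backend
    (commonly egl, osmesa, or glfw); CUDA_VISIBLE_DEVICES limits which GPUs
    the launched process can see.
    """
    env = dict(os.environ if base is None else base)
    env["MUJOCO_GL"] = mujoco_gl
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return env


if __name__ == "__main__":
    env = make_env(gpu_ids="0,1")
    # Pass env=env to subprocess.run(...) when launching ez/train.py or ez/eval.py.
    print(env["MUJOCO_GL"], env["CUDA_VISIBLE_DEVICES"])
```

Because the helper returns a copy, the settings apply only to the launched process and leave the current shell environment untouched.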
@article{liu2026scaling,
title={Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning},
author={Liu, Shaohuai and Ye, Weirui and Du, Yilun and Xie, Le},
journal={arXiv preprint arXiv:2603.01452},
year={2026}
}