CoRL 2025 (Oral Presentation)
[arXiv] • [Project Page]
Abstract: Efficient robot control often requires balancing task performance with energy expenditure. A common approach in reinforcement learning (RL) is to penalize energy use directly as part of the reward function. This requires carefully tuning weight terms to avoid undesirable trade-offs where energy minimization harms task success. In this work, we propose a hyperparameter-free gradient optimization method to minimize energy expenditure without conflicting with task performance. Inspired by recent works in multitask learning, our method applies policy gradient projection between task and energy objectives to derive policy updates that minimize energy expenditure in ways that do not impact task performance. We evaluate this technique on standard locomotion benchmarks from DM-Control and HumanoidBench and demonstrate a 64% reduction in energy usage while maintaining comparable task performance. Further, we conduct experiments on a Unitree GO2 quadruped showcasing Sim2Real transfer of energy-efficient policies. Our method is easy to implement in standard RL pipelines with minimal code changes, is applicable to any policy gradient method, and offers a principled alternative to reward shaping for energy-efficient control policies.
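To give a feel for the gradient-projection idea the abstract describes, here is a minimal NumPy sketch in the spirit of PCGrad-style projection: when the energy gradient conflicts with the task gradient (negative dot product), its conflicting component is removed before the two are combined. The function name and the final combination step are illustrative assumptions, not the exact PEGrad update rule.

```python
import numpy as np

def project_energy_gradient(g_task: np.ndarray, g_energy: np.ndarray) -> np.ndarray:
    """Combine task and energy gradients so the update never opposes the task.

    If the gradients conflict (negative dot product), subtract from g_energy
    its component along g_task; otherwise leave g_energy unchanged.
    """
    dot = float(g_task @ g_energy)
    if dot < 0.0:
        # Remove the component of g_energy that points against g_task.
        g_energy = g_energy - (dot / float(g_task @ g_task)) * g_task
    return g_task + g_energy  # non-conflicting combined update direction

# Toy example with conflicting gradients:
g_t = np.array([1.0, 0.0])   # task gradient
g_e = np.array([-1.0, 1.0])  # energy gradient, opposes the task along axis 0
update = project_energy_gradient(g_t, g_e)
```

After projection the combined update has a non-negative inner product with the task gradient, so the energy objective can only act in directions the task is indifferent to.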
- Clone the repository:

  ```bash
  git clone --recurse-submodules https://github.com/pvskand/PEGrad.git
  cd PEGrad
  ```
- Create the conda environment:

  ```bash
  conda env create -f pegrad-env.yml
  conda activate pegrad-env
  ```
- Install `uv` (if not already installed):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

  Or via pip:

  ```bash
  pip install uv
  ```
- Install dependencies using `uv` (much faster than pip):

  ```bash
  uv pip install -e .
  ```
- Set up HumanoidBench:

  ```bash
  cd src/pegrad/leanrl/envs/humanoid-bench
  uv pip install -e .
  cd ../../
  ```
Before training, you need to set up your Weights & Biases account and project. First run:

```bash
wandb login
```

Then change the `entity` and `project` fields in the `config.yaml` file to your own:

```yaml
entity: your_wandb_entity
project: your_wandb_project
```

To train with the default configuration on the HumanoidBench `h1-walk-v0` task, run:
```bash
python -m leanrl.sac.sac_pegrad
```

To train on other HumanoidBench environments, run:

```bash
python -m leanrl.sac.sac_pegrad env_id=humanoidbench/h1-run-v0 seed=1
```

To train on the `quadruped-run` environment in DM-Control, run:

```bash
python -m leanrl.sac.sac_pegrad env_id=dmcontrol/quadruped-run
```

To train on the `dog-run` environment in DM-Control, run:

```bash
python -m leanrl.sac.sac_pegrad env_id=dmcontrol/dog-run
```

In case of the following GLFW error:

```
GLFW error 65537: b'The GLFW library is not initialized'
```

try setting the environment variable `MUJOCO_GL` to `egl`:

```bash
export MUJOCO_GL=egl
```
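If exporting the variable in your shell is inconvenient (e.g. inside a notebook or a launcher script), the same backend can be selected from Python, provided it happens before MuJoCo is imported, since `MUJOCO_GL` is read at import time. Using `setdefault` here is a deliberate choice so an existing user setting is not overridden:

```python
import os

# Must run before `import mujoco` / dm_control; keeps any value already set.
os.environ.setdefault("MUJOCO_GL", "egl")
```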
PEGrad is licensed under the MIT License.
We thank the authors of the following projects for their work:
If you find this work useful, please consider citing:
```bibtex
@article{PEGRAD,
  author  = {Peri, Skand and Perincherry*, Akhil and Pandit*, Bikram and Lee, Stefan},
  title   = {Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control},
  journal = {Conference on Robot Learning},
  year    = {2025},
}
```