GitHub - elle-miller/multimodal_rl: An RL library for training multimodal robotic agents in Isaac Lab.

Real-world robotics must move beyond simple state vectors. multimodal_rl provides a streamlined and robust foundation for training robotic agents in Isaac Lab that perceive the world through multiple lenses.

This library is designed as a core research dependency. It handles the RL "heavy lifting" and multimodal fusion, allowing you to focus on your environment and task science. It works in tandem with roto, which provides ready-to-use example environments and optimised agents.

✨ Features

Multimodal perception: Native support for flexible dictionary observations (RGB, Depth, Proprioception, Tactile, and Ground-truth states).
Self-supervised learning: Built-in integration for SSL auxiliary tasks (reconstruction, world models 🌏) to accelerate representation learning from multimodal observations.
Observation stacking: Uses LazyFrame stacking to handle partially observable environments, essential for real-world robotics.
Transparent codebase: Most RL libraries sacrifice clarity for modularity. We condense the entire PPO logic into four readable files, making it easy to inspect what happens under the hood.
Robust research: Integrated hyperparameter optimisation with Optuna to ensure fair comparisons and well-tuned agents.
Evaluation rigor: Training and evaluation run in separate sets of parallelised environments for efficient and accurate performance reporting. Evaluation uses frozen policy snapshots taken at episode boundaries, so metrics stay consistent within each evaluation episode even as the networks update during training.
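To make the dictionary observations and lazy frame stacking ideas above concrete, here is a minimal, library-independent sketch. It is not the multimodal_rl implementation: the `LazyFrames` and `DictFrameStack` names, and the use of plain Python lists in place of tensors, are assumptions made for illustration. The point is that each modality keeps references to its last `k` frames and only concatenates them when asked.

```python
from collections import deque


class LazyFrames:
    """Holds references to recent frames; concatenates only on demand."""

    def __init__(self, frames):
        self._frames = list(frames)  # references to frames, not copies

    def materialise(self):
        # Concatenate frames oldest-to-newest into one flat list.
        out = []
        for frame in self._frames:
            out.extend(frame)
        return out

    def __len__(self):
        return len(self._frames)


class DictFrameStack:
    """Hypothetical helper: keeps the last `k` frames for every modality."""

    def __init__(self, k):
        self.k = k
        self._buffers = {}

    def reset(self, obs):
        # Fill each buffer with k references to the first observation.
        self._buffers = {
            key: deque([value] * self.k, maxlen=self.k) for key, value in obs.items()
        }
        return self._stacked()

    def step(self, obs):
        for key, value in obs.items():
            self._buffers[key].append(value)  # the oldest frame is dropped
        return self._stacked()

    def _stacked(self):
        return {key: LazyFrames(buf) for key, buf in self._buffers.items()}


# Usage with two made-up modalities.
stack = DictFrameStack(k=3)
stack.reset({"proprio": [0.0, 0.1], "tactile": [1.0]})
stacked = stack.step({"proprio": [0.2, 0.3], "tactile": [0.0]})
print(stacked["proprio"].materialise())  # [0.0, 0.1, 0.0, 0.1, 0.2, 0.3]
```

Storing references rather than copies means a frame that appears in `k` consecutive stacks is held in memory only once, which matters most for image observations.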

Installation

Install Isaac Lab via pip with these instructions
Install multimodal_rl as a local editable package.

git clone git@github.com:elle-miller/multimodal_rl.git
cd multimodal_rl
pip install -e .

You should now see it with pip show multimodal_rl.
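The same check can be done from Python, which is handy inside a setup script. This is a small sketch using only the standard library; the `installed_version` helper is a hypothetical name, and it assumes the distribution is registered under the name `multimodal_rl`, as the `pip show` command suggests.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the editable install succeeded, otherwise None.
print(installed_version("multimodal_rl"))
```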

Set up your own project! Check out roto to use existing environments or as a template for your own.

🏗 How it Works

multimodal_rl contains the RL engine, while your project repo contains the environments/research/science. This separation allows you to pull updates from the core library without messy merge conflicts in your environment code.

multimodal_rl provides 5 core functionalities:

rl: Clean PPO implementation.
ssl: Modules for self-supervised learning.
models: Standardised backbones (MLPs, CNNs) and running scalers.
tools: Scripts for producing paper-ready RL plots, plus extras such as latent trajectory visualisation.
wrappers: Wrappers for observation stacking and Isaac Lab.

Evaluation Procedure

Evaluation runs continuously in parallel with training using dedicated evaluation environments. At each episode boundary (every max_episode_length steps), the current policy and encoder are snapshotted into frozen copies. These frozen models are used exclusively for evaluation, ensuring that each evaluation episode uses a consistent policy version even as training continues and updates the live policy. Evaluation environments are visually distinguished in the simulation (typically marked with pink boxes) and reset synchronously at episode boundaries. Episode metrics (returns, info logs) are accumulated with proper masking for terminated/truncated episodes, and logged at episode boundaries.
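The snapshot-and-mask logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `TinyPolicy`, `Evaluator` and `masked_returns` are hypothetical names, a single float stands in for network parameters, and "masking" is interpreted here as ignoring rewards that arrive after an environment has terminated or been truncated.

```python
import copy


class TinyPolicy:
    """Stand-in for a policy network; `weight` plays the role of its parameters."""

    def __init__(self, weight):
        self.weight = weight

    def act(self, obs):
        return self.weight * obs


class Evaluator:
    """Hypothetical helper: refreshes a frozen policy copy at episode boundaries."""

    def __init__(self, live_policy, max_episode_length):
        self.live_policy = live_policy
        self.max_episode_length = max_episode_length
        self.frozen_policy = copy.deepcopy(live_policy)
        self.step_count = 0

    def step(self, obs):
        action = self.frozen_policy.act(obs)  # never the live policy
        self.step_count += 1
        if self.step_count % self.max_episode_length == 0:
            # Episode boundary: snapshot the live policy for the next episode.
            self.frozen_policy = copy.deepcopy(self.live_policy)
        return action


def masked_returns(rewards, dones):
    """Sum rewards[t][i] per environment, ignoring steps after it is done."""
    num_envs = len(rewards[0])
    returns = [0.0] * num_envs
    alive = [True] * num_envs
    for step_rewards, step_dones in zip(rewards, dones):
        for i in range(num_envs):
            if alive[i]:
                returns[i] += step_rewards[i]
                if step_dones[i]:
                    alive[i] = False
    return returns


# The live policy changes every step, but evaluation only sees the change
# at episode boundaries (every 2 steps in this toy run).
live = TinyPolicy(1.0)
evaluator = Evaluator(live, max_episode_length=2)
actions = []
for _ in range(4):
    actions.append(evaluator.step(1.0))
    live.weight += 1.0  # stands in for a training update
print(actions)  # [1.0, 1.0, 2.0, 2.0]

episode_returns = masked_returns(
    rewards=[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    dones=[[False, False], [True, False], [False, False]],
)
print(episode_returns)  # [2.0, 3.0]
```

Using a deep copy keeps the frozen policy fully detached from training updates, at the cost of duplicating the parameters once per evaluation episode.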

Staggered Resets

Training environments use staggered resets by default, where each environment starts with a random initial episode length offset uniformly distributed across [0, max_episode_length). This prevents all training environments from resetting simultaneously, improving sample diversity and training stability by ensuring environments are at different stages of their episodes throughout training.
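As a rough sketch of the idea, the offsets can be drawn as integers uniformly from [0, max_episode_length), so that each environment reaches its first reset at a different time. The function names and the `seed` parameter below are assumptions for illustration, and the standard-library `random` module stands in for whatever tensor-based sampling the library actually uses.

```python
import random


def staggered_offsets(num_envs, max_episode_length, seed=None):
    """Draw one initial episode-length offset per environment."""
    rng = random.Random(seed)
    # randrange(n) samples integers uniformly from [0, n).
    return [rng.randrange(max_episode_length) for _ in range(num_envs)]


def steps_until_first_reset(offsets, max_episode_length):
    """An environment starting at offset o times out after (max - o) more steps."""
    return [max_episode_length - offset for offset in offsets]


offsets = staggered_offsets(num_envs=8, max_episode_length=100, seed=0)
first_resets = steps_until_first_reset(offsets, max_episode_length=100)
print(offsets)
print(first_resets)
```

Without the offsets, every environment would hit its time limit on the same step, so each rollout batch would contain only one phase of the episode at a time.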

📜 Credits

The PPO implementation is a streamlined and modified version of SKRL. This version has been refactored to prioritise multimodal fusion, evaluation rigor, and transparency.

📚 Citation

If this framework helps your research, please cite:

@misc{miller2026_multimodal_rl,
  author       = {Elle Miller},
  title        = {multimodal_rl: Multimodal RL for Real-world Robotics},
  year         = {2026},
  howpublished = {\url{https://github.com/elle-miller/multimodal_rl}},
  note         = {GitHub repository}
}
