ImagineAct is a comprehensive framework for improving out-of-distribution (OOD) generalization of Vision-Language-Action (VLA) models through model-based reinforcement learning. This project combines diffusion-based world modeling, learned reward functions, and actor-critic RL to fine-tune VLAs for better generalization on robotic manipulation tasks.
ImagineAct addresses the challenge of training VLA models that generalize beyond their training distribution. Our approach uses three key components:
- Diffusion-based World Model: A video prediction model trained on the LIBERO dataset that generates future robot states conditioned on actions
- Learned Proxy Rewards: Video-based reward models trained using Randomized Return Decomposition (RRD) that extract dense reward signals from visual observations
- RL Fine-tuning Pipeline: An end-to-end system that uses the world model as a simulator and learned rewards to fine-tune VLA policies (a schematic sketch of this loop follows the pipeline diagram below)
- Yash Jangir (offjangir@gmail.com)
- Karan Mirakhor (karanmirakhor99@gmail.com)
- Tanya Choudhary (tchoudha@andrew.cmu.edu)
- Prakhar Mishra (pmishra3@andrew.cmu.edu)
┌──────────────────────────────────────────────────────────────┐
│                     ImagineAct Pipeline                      │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  1. World Model Training                                     │
│     [LIBERO Dataset] → [Diffusion World Model]               │
│                                                              │
│  2. Reward Model Learning                                    │
│     [LIBERO Trajectories] → [OpenVLA Features] → [RRD] →     │
│     [Learned Reward Function]                                │
│                                                              │
│  3. RL Fine-tuning                                           │
│     [OpenVLA Actor] → [World Model Env] → [Reward Model] →   │
│     [Critic] → [PPO Updates] → [Fine-tuned Policy]           │
│                                                              │
└──────────────────────────────────────────────────────────────┘
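The pipeline above composes into a single imagination-based fine-tuning loop. The sketch below is schematic, not the repository's actual API: the object and method names (`policy.act`, `world_model.step`, `reward_model.score`, `ppo_update`) are illustrative placeholders.

```python
# Schematic ImagineAct loop. All handles are duck-typed arguments; the method
# names used here (act, step, score) are illustrative placeholders, not the
# actual interfaces of the scripts in this repository.

def imagineact_finetune(policy, world_model, reward_model, ppo_update,
                        initial_obs, num_updates=1000, horizon=32):
    """Fine-tune a VLA policy on rollouts imagined by the world model."""
    for _ in range(num_updates):
        trajectories = []
        for obs in initial_obs:
            rollout = []
            for _ in range(horizon):
                action = policy.act(obs)                       # VLA proposes an action
                next_obs = world_model.step(obs, action)       # world model imagines the outcome
                reward = reward_model.score(next_obs, action)  # learned dense reward
                rollout.append((obs, action, reward, next_obs))
                obs = next_obs
            trajectories.append(rollout)
        ppo_update(policy, trajectories)                       # PPO-style actor/critic update
    return policy
```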
ImagineAct/
├── world-model-eval-babel/               # Diffusion-based world model
│   ├── world_model.py                    # World model implementation
│   ├── model.py                          # Diffusion Transformer (DiT)
│   ├── vae.py                            # Variational Autoencoder
│   └── diffusion.py                      # Diffusion sampling
│
├── Randomized-Return-Decomposition/      # RRD for reward learning
│   ├── algorithm/
│   │   ├── openvla_reward_net.py         # OpenVLA-based reward network
│   │   ├── rrd_torch.py                  # Randomized Return Decomposition
│   │   └── replay_buffer/                # Offline data buffers
│   ├── offline_openvla_ac.py             # Offline actor-critic
│   └── scripts/
│       ├── train_offline_openvla_ac.py   # Offline RL training
│       ├── train_openvla_rl_worldmodel.py # Online RL with world model
│       └── train_with_openvla.py         # Reward model training
│
└── openvla/                              # OpenVLA VLA model
    ├── prismatic/                        # Core model implementation
    └── experiments/                      # Evaluation scripts
A video prediction model that generates future robot states conditioned on actions. The world model:
- Uses a Diffusion Transformer (DiT) architecture for temporal modeling
- Encodes/decodes frames using a VAE
- Is trained on the LIBERO manipulation dataset
- Supports autoregressive generation for long-horizon rollouts (illustrated in the sketch below)
Location: world-model-eval-babel/
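Conceptually, a rollout alternates between VAE encoding, action-conditioned denoising with the DiT, and decoding back to pixels. The sketch below illustrates that loop under assumed interfaces (`vae.encode`/`vae.decode` and a `dit_denoise` sampler); the classes in `world-model-eval-babel/` may expose different names and signatures.

```python
import torch

# Illustrative autoregressive rollout: predict the next latent with the DiT,
# conditioned on the action and the history of past latents, then decode with the VAE.
# `vae` and `dit_denoise` are assumed interfaces, not the exact repository API.

@torch.no_grad()
def imagine_rollout(vae, dit_denoise, first_frame, actions, num_ddim_steps=20):
    """Roll the world model forward for len(actions) steps and return decoded frames."""
    latents = [vae.encode(first_frame)]           # latent of the initial observation
    frames = [first_frame]
    for action in actions:
        history = torch.stack(latents, dim=1)     # latent history as temporal context
        noise = torch.randn_like(latents[-1])     # DDIM sampling starts from Gaussian noise
        next_latent = dit_denoise(noise, history=history, action=action,
                                  num_steps=num_ddim_steps)
        latents.append(next_latent)
        frames.append(vae.decode(next_latent))    # pixels for the reward model / visualization
    return frames
```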
Reward functions are learned from offline data using Randomized Return Decomposition (a minimal loss sketch follows the key files below):
- Uses OpenVLA's vision and language features as input
- Trains dense reward models that correlate with task success
- Leverages RRD's return decomposition for stable learning
- Supports both offline and online reward computation
Key Files:
- `algorithm/openvla_reward_net.py`: Reward network architecture
- `scripts/train_with_openvla.py`: Training script for reward models
- `scripts/evaluate_rewards.py`: Evaluation utilities
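The core RRD idea is to regress the episodic return onto the sum of predicted per-step rewards, estimated from a random subset of timesteps for efficiency. Below is a minimal sketch of that loss under assumed tensor shapes; the actual implementation in `algorithm/rrd_torch.py` may differ in details such as the subsampling scheme and variance correction.

```python
import torch

def rrd_loss(reward_net, features, episode_return, subset_size=32):
    """Randomized Return Decomposition loss for one trajectory (sketch).

    features:       (T, D) per-step OpenVLA features
    episode_return: scalar tensor with the episodic return (e.g., task success)
    """
    T = features.shape[0]
    idx = torch.randperm(T)[:subset_size]                  # random subset of timesteps
    pred_rewards = reward_net(features[idx]).squeeze(-1)   # (subset_size,) predicted rewards
    # Monte Carlo estimate of the total predicted return over the full episode
    estimated_return = pred_rewards.mean() * T
    return (estimated_return - episode_return) ** 2
```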
End-to-end RL system for fine-tuning VLA policies:
- Offline Actor-Critic: Train on pre-collected LIBERO trajectories
- Online RL with World Model: Use world model as environment for policy optimization
- Actor: OpenVLA model generating continuous actions
- Critic: Value function network using OpenVLA features
- Reward: Learned reward model for dense supervision
- Algorithm: PPO with GAE for advantage estimation (a GAE sketch follows the key scripts below)
Key Scripts:
- `scripts/train_offline_openvla_ac.py`: Offline RL training
- `scripts/train_openvla_rl_worldmodel.py`: Online RL with world model
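For reference, the advantages consumed by the PPO updates follow the standard GAE recursion. The function below is a minimal sketch assuming per-step rewards from the learned reward model and values from the critic; the training scripts may organize this differently.

```python
import torch

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (sketch).

    rewards, dones: (T,) tensors; values: (T + 1,) tensor including the bootstrap value.
    Returns advantages and the discounted returns used as critic targets.
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t].float()
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns
```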
- Python 3.8+
- PyTorch 2.0+
- Transformers (HuggingFace)
- OpenVLA model checkpoint
- LIBERO dataset (RLDS format)
# Clone the repository
git clone <repository-url>
cd ImagineAct
# Install dependencies for each component
cd Randomized-Return-Decomposition
pip install -r requirements/requirements_pytorch.txt
# Set up environment variables
export OPENVLA_PATH=$(pwd)/../openvla
export LIBERO_DATASET_PATH=/path/to/libero/dataset

The world model should be pre-trained on LIBERO data. To use an existing checkpoint:
from world_model import WorldModel
world_model = WorldModel(
    checkpoint_path="/path/to/world_model.pt",
    use_pixel_rope=False,
    default_cfg=1.0
)

Train a reward model using RRD on LIBERO trajectories:
cd Randomized-Return-Decomposition
python scripts/train_with_openvla.py \
--env libero-10 \
--use_openvla_features True \
--openvla_checkpoint /path/to/openvla/model \
--libero_dataset_path /path/to/libero/dataset \
--rrd_reward_only True \
--epochs 100 \
--tag "openvla_reward_libero_10"

Train OpenVLA using offline RL on LIBERO data:
python scripts/train_offline_openvla_ac.py \
--features_cache_path log/feature_cache/openvla_features.pkl \
--rlds_dataset_path /path/to/libero/dataset \
--openvla_checkpoint /path/to/openvla/model \
--reward_model_checkpoint log/checkpoints/openvla_reward_libero_10/checkpoint_best.pt \
--critic_hidden_dims 2048 512 128 \
--epochs 121 \
--batch_size 16 \
--vla_lr 5e-6 \
--critic_lr 3e-4

Fine-tune the policy using the world model as environment:
python scripts/train_openvla_rl_worldmodel.py \
--openvla_checkpoint /path/to/openvla/model \
--world_model_checkpoint /path/to/world_model.pt \
--reward_model_checkpoint log/checkpoints/openvla_reward_libero_10/checkpoint_best.pt \
--initial_states_path /path/to/initial/states \
--num_envs 8 \
--rollout_steps 2048 \
--num_updates 1000 \
--ppo_epochs 4 \
--vla_lr 1e-5 \
--critic_lr 3e-4 \
--use_wandb

World Model:
- ✅ Autoregressive video generation
- ✅ Action-conditioned state prediction
- ✅ Support for long-horizon rollouts
- ✅ Efficient sampling with DDIM (a generic one-step DDIM sketch follows this list)
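For the DDIM item above, a single deterministic update (η = 0) has the standard closed form shown below. This is a generic sketch of the sampler, not code taken from `diffusion.py`.

```python
import torch

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update given the model's noise prediction."""
    # Recover the predicted clean latent x_0 from the current noisy latent
    x0_pred = (x_t - (1.0 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
    # Move to the previous (less noisy) timestep along the deterministic DDIM path
    return alpha_bar_prev.sqrt() * x0_pred + (1.0 - alpha_bar_prev).sqrt() * eps_pred
```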
Reward Learning:
- ✅ OpenVLA feature-based rewards
- ✅ Dense reward signals for better learning
- ✅ Compatible with offline datasets
- ✅ Evaluation and visualization tools
RL Fine-tuning:
- ✅ Offline actor-critic on real data
- ✅ Online RL with world model simulator
- ✅ PPO with GAE advantage estimation
- ✅ Behavior cloning for stability (a BC-regularized actor-loss sketch follows this list)
- ✅ Distributed training support
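The behavior-cloning item above refers to regularizing the RL objective toward demonstration data. One common formulation (a sketch, not necessarily the exact loss used by the training scripts) adds a BC term to the clipped PPO surrogate:

```python
import torch

def ppo_bc_actor_loss(log_probs, old_log_probs, advantages, bc_log_probs,
                      clip_eps=0.2, bc_coef=0.1):
    """Clipped PPO surrogate plus a behavior-cloning regularizer (sketch).

    log_probs / old_log_probs: log-probabilities of rollout actions under the
                               current / behavior policy
    bc_log_probs:              log-probabilities of demonstration actions
    """
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    ppo_term = -torch.min(ratio * advantages, clipped * advantages).mean()
    bc_term = -bc_log_probs.mean()            # maximize likelihood of demonstration actions
    return ppo_term + bc_coef * bc_term
```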
Evaluate the trained policy:
# Evaluate offline trained model
python scripts/evaluate_offline_openvla_ac.py \
--checkpoint_path log/checkpoints/offline_openvla_ac_final.pt \
--eval_dataset_path /path/to/eval/dataset
# Evaluate reward predictions
python scripts/evaluate_rewards.py \
--checkpoint_path log/checkpoints/openvla_reward_libero_10/checkpoint_best.pt

The framework enables:
- Improved OOD Generalization: Fine-tuned policies generalize better to unseen task configurations
- Efficient Training: World model allows fast policy iteration without real robot interaction
- Dense Rewards: Learned rewards provide better learning signal than sparse task rewards
- Scalable: Can leverage large offline datasets for initial training
This project builds upon and integrates several outstanding open-source projects:
We extend the world-model-eval codebase for diffusion-based world modeling. The world model implementation uses their diffusion transformer architecture and VAE for frame encoding/decoding.
We use OpenVLA as our base VLA model. OpenVLA provides excellent pretrained vision-language-action policies that serve as the foundation for our fine-tuning approach.
Our reward learning builds on the Randomized Return Decomposition framework. We adapt RRD to work with OpenVLA features for learning dense reward functions from offline robot data.
We are grateful to all the contributors and researchers behind these projects for making this work possible.
If you use this code in your research, please cite:
@software{imagineact2025,
  title={ImagineAct: Model-Based RL for Vision-Language-Action Models},
  author={Your Name},
  year={2025},
  url={https://github.com/yourusername/ImagineAct}
}

This project integrates code from multiple sources. Please refer to:
- `openvla/LICENSE` for OpenVLA licensing
- `Randomized-Return-Decomposition/LICENSE` for RRD licensing
- Individual source files for world-model-eval licensing
- OpenVLA: Paper | Code
- Randomized Return Decomposition: Paper
- World-Model-Eval: Code
- LIBERO: Paper | Dataset
Contributions are welcome! Please feel free to submit a Pull Request.
For questions or issues, please open an issue on GitHub.
Note: This project is a research framework. Results may vary based on hardware, dataset versions, and hyperparameters. Please refer to individual component documentation for detailed setup instructions.