MolmoAct2 is Ai2's open family of action reasoning models for robot control and real-world deployment. It builds on the Molmo2-ER embodied-reasoning vision-language backbone, adds robot state and action modeling, and connects the VLM to a flow-matching continuous action expert for closed-loop manipulation. The release includes base checkpoints for continued training, fine-tuned robot policies for evaluation and deployment, and the datasets used to build MolmoAct2 and Molmo2-ER.
We provide base checkpoints at every training stage for continued MolmoAct2 training and robot fine-tuning. These are foundation checkpoints rather than one-size-fits-all deployment policies.
| Model | Use Case | Description | Checkpoint Path |
|---|---|---|---|
| MolmoAct2 | Fine-tuning | Post-trained MolmoAct2 model with a continuous flow-matching action expert. Use as the default foundation checkpoint for adapting to a target robot embodiment or benchmark. | https://huggingface.co/allenai/MolmoAct2 |
| MolmoAct2-Think | Fine-tuning | MolmoAct2 foundation checkpoint with depth-token reasoning. Use when downstream policies should reason over compact depth predictions before acting. | https://huggingface.co/allenai/MolmoAct2-Think |
| MolmoAct2-Pretrain | Post-training | Pre-trained discrete autoregressive VLA backbone before the continuous action expert is attached. Intended for continuing MolmoAct2 training stages, not direct continuous-control inference. | https://huggingface.co/allenai/MolmoAct2-Pretrain |
| Molmo2-ER | Pre-training | Embodied-reasoning VLM backbone used as the starting point for MolmoAct2 action models. | https://huggingface.co/allenai/Molmo2-ER |
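As a rough illustration of how these foundation checkpoints are consumed, the sketch below loads one through the Hugging Face Hub. It assumes a transformers remote-code interface in the style of earlier Molmo and MolmoAct releases; the specific auto classes and loading arguments are assumptions until the full MolmoAct2 training code is released (see below).

```python
# Minimal loading sketch for a base checkpoint. The auto classes, dtype, and
# device handling are assumptions modeled on earlier Molmo/MolmoAct releases;
# the official MolmoAct2 training code has not been released yet.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

ckpt = "allenai/MolmoAct2"  # or MolmoAct2-Think / MolmoAct2-Pretrain / Molmo2-ER

processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.train()  # starting point for continued training or robot fine-tuning
```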
We also provide fine-tuned checkpoints for common robot platforms and benchmarks. These models are intended to run directly in their target setting, or to serve as a stronger starting point for closely related robots. As with any robot policy, performance depends on hardware, cameras, calibration, action conventions, and language/task distribution.
| Model | Use Case | Description | Checkpoint Path |
|---|---|---|---|
| MolmoAct2-DROID | Inference / Fine-tuning | MolmoAct2 fine-tuned on the filtered DROID Franka mixture with absolute joint-pose control. Intended for DROID-style policy inference or further fine-tuning. | https://huggingface.co/allenai/MolmoAct2-DROID |
| MolmoAct2-BimanualYAM | Inference / Fine-tuning | MolmoAct2 fine-tuned on the bimanual YAM mixture with absolute joint-pose control and annotated language instructions. | https://huggingface.co/allenai/MolmoAct2-BimanualYAM |
| MolmoAct2-SO100_101 | Inference / Fine-tuning | MolmoAct2 fine-tuned on SO-100/SO-101 datasets with absolute joint-pose control and annotated language instructions. | https://huggingface.co/allenai/MolmoAct2-SO100_101 |
| MolmoAct2-LIBERO | Inference / Fine-tuning | MolmoAct2 fine-tuned on the full LIBERO training mixture, combining Spatial, Object, Goal, and Long suites. | https://huggingface.co/allenai/MolmoAct2-LIBERO |
| MolmoAct2-Think-LIBERO | Inference / Fine-tuning | MolmoAct2-Think fine-tuned on LIBERO with depth-and-action examples and adaptive depth reasoning. | https://huggingface.co/allenai/MolmoAct2-Think-LIBERO |
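A single inference step with a fine-tuned policy follows the usual VLA pattern: encode the current camera frame and the language instruction, run generation, and decode the result into an action chunk. The sketch below is a hedged illustration only; the prompt handling, generation arguments, and the mapping from model outputs to continuous actions are assumptions pending the official inference code.

```python
# Hypothetical single-step inference with a fine-tuned MolmoAct2 policy.
# The processor call and generation arguments are assumptions; consult the
# official inference code once it is released.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

ckpt = "allenai/MolmoAct2-LIBERO"
processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("third_person_view.png")   # current camera frame
instruction = "put the bowl on the plate"     # task language

inputs = processor(images=image, text=instruction, return_tensors="pt").to(model.device)
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Turning the generated output into a continuous action chunk is handled by the
# flow-matching action expert and is model-specific; here we only show the
# decoded text output.
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```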
We also release the datasets used to build MolmoAct2 and Molmo2-ER.
| Data | Description | Dataset Path |
|---|---|---|
| MolmoAct2-BimanualYAM Dataset | Collection of bimanual YAM datasets and related resources used for MolmoAct2 bimanual training and evaluation. | https://huggingface.co/collections/allenai/molmoact2-bimanualyam-dataset-69f81e17b140ec34f430a35e |
| MolmoAct2 Robotics Datasets | Robotics datasets for MolmoAct2 training and fine-tuning, including SO-100/SO-101, DROID, MolmoAct Dataset, BC-Z, Bridge, and RT-1. | https://huggingface.co/collections/allenai/molmoact2-datasets-69f81e316ec3daafe3f9555c |
| Molmo2-ER Datasets | Embodied reasoning datasets used for Molmo2-ER and MolmoAct2 backbone training, including spatial, 3D, robotics, and visual reasoning data. | https://huggingface.co/collections/allenai/molmo2-er-datasets-69f8d605d92d46a5fc24ced2 |
Note that all of the robotics datasets for pre-training and post-training are provided in LeRobot v3.0 format, with additional language annotations.
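Because the datasets follow the LeRobot format, they can be inspected directly with the LeRobot Python API. The sketch below assumes a recent LeRobot version (older releases import from lerobot.common.datasets.lerobot_dataset), and the repo_id is a placeholder; pick a concrete dataset from the collections above.

```python
# Sketch: inspecting one of the released robotics datasets with the LeRobot API.
# The import path depends on your LeRobot version, and the repo_id below is a
# placeholder for an actual dataset from the collections linked above.
from torch.utils.data import DataLoader
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("allenai/<dataset-name>")   # downloads from the Hub

sample = dataset[0]
print(sorted(sample.keys()))   # e.g. observation.images.*, observation.state, action, task

loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=2)
batch = next(iter(loader))
print(batch["action"].shape)
```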
The MolmoAct2 LeRobot fork is included as a Git submodule at lerobot/. After cloning this repository, initialize the submodule from the repo root:
```bash
git submodule update --init --recursive
cd lerobot
```
For LIBERO replication and other evaluation instructions, follow the local LeRobot README at lerobot/README.md after the submodule is initialized.
MolmoAct2 supports out-of-the-box deployment on three robot embodiments:
- SO-100
- Bimanual YAMs
- Franka DROID setup
For the best performance, we recommend using an SO-100 with the standard wrist configuration and a third-person camera.
For the best performance, please build your Bimanual YAM setup following the reference design below:
All required components can be purchased using this Bimanual YAM parts list.
Implementation code for setup, data collection, and inference on the Bimanual YAM is available here.
For the Franka setup, we recommend following the official DROID implementation for best results.
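Regardless of the embodiment, deployment follows the same closed-loop structure: read an observation, query the policy for an action chunk, and execute it on the controller at a fixed rate. The sketch below shows only that structure; `robot` and `policy` are hypothetical placeholders for your hardware driver and a loaded MolmoAct2 policy, not part of any released API.

```python
# Structural sketch of closed-loop deployment. `robot` and `policy` are
# hypothetical placeholders for a hardware driver and a loaded policy; they are
# not part of a released MolmoAct2 API.
import time

def run_episode(robot, policy, instruction, hz=15, max_steps=600):
    """Closed-loop rollout: observe, predict an action chunk, execute, repeat."""
    dt = 1.0 / hz
    steps = 0
    while steps < max_steps:
        obs = robot.get_observation()                 # camera frames + joint state
        action_chunk = policy.predict(obs, instruction)
        for action in action_chunk:                   # absolute joint targets
            robot.send_joint_positions(action)
            time.sleep(dt)
            steps += 1
            if steps >= max_steps:
                break
```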
Full code for training, fine-tuning, deployment, and evaluation, along with more details, is coming soon.
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
MolmoAct2 models generate robot actions from visual observations and language instructions, but their behavior may vary across embodiments, environments, and hardware configurations. Users should carefully validate model outputs before deployment, especially when operating physical robots or other actuated systems. Where possible, actions should be monitored through interpretable intermediate outputs (such as the adaptive depth map), simulation rollouts, action limits, or other safety checks before execution on hardware. The model’s action space should be bounded by the training data, robot controller limits, and task-specific safety constraints, including limits on speed, workspace, torque, and contact force. Users should follow the hardware manufacturer’s safety guidelines, use appropriate emergency-stop mechanisms, and operate the system only in a safely configured environment with human supervision.
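As one concrete form of such a safety check, predicted absolute joint targets can be clamped to the robot's joint limits and to a per-step change bound before they are sent to the controller. The limit values below are placeholders; substitute your robot's actual specifications.

```python
# Example safety filter: clamp predicted actions to joint limits and to a
# per-step change bound before execution. The limit values are placeholders;
# use your robot's actual joint and velocity specifications.
import numpy as np

JOINT_LOW  = np.array([-2.9, -1.8, -2.9, -3.0, -2.9, -0.1, -2.9])  # placeholder, rad
JOINT_HIGH = np.array([ 2.9,  1.8,  2.9,  0.1,  2.9,  3.8,  2.9])  # placeholder, rad
MAX_STEP   = 0.05  # placeholder: max allowed joint change per control step, rad

def safe_action(predicted, current):
    """Clamp an absolute joint-position target relative to the current pose."""
    target = np.clip(predicted, JOINT_LOW, JOINT_HIGH)        # joint/workspace limits
    delta = np.clip(target - current, -MAX_STEP, MAX_STEP)    # per-step speed limit
    return current + delta
```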
For questions, collaborations, or support, please contact:
{hqfang,duanj1}@cs.washington.edu
Found a bug or have a feature request? Please open a GitHub issue.
```bibtex
@misc{fang2026molmoact2actionreasoningmodels,
      title={MolmoAct2: Action Reasoning Models for Real-world Deployment},
      author={Haoquan Fang and Jiafei Duan and Donovan Clay and Sam Wang and Shuo Liu and Weikai Huang and Xiang Fan and Wei-Chuan Tsai and Shirui Chen and Yi Ru Wang and Shanli Xing and Jaemin Cho and Jae Sung Park and Ainaz Eftekhar and Peter Sushko and Karen Farley and Angad Wadhwa and Cole Harrison and Winson Han and Ying-Chun Lee and Eli VanderBilt and Rose Hendrix and Suveen Ellawela and Lucas Ngoo and Joyce Chai and Zhongzheng Ren and Ali Farhadi and Dieter Fox and Ranjay Krishna},
      year={2026},
      eprint={2605.02881},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.02881},
}
```