# EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

**ICCV 2025**

[🏠Project Page] [📄Paper] [📊Dataset] [🤗Checkpoints]

**TL;DR:** A method for learning robotic manipulation policies solely from action-unlabeled videos, enabling versatile control over deformable objects, occluded environments, and non-object-displacement tasks.
## Installation

```bash
# Clone the repository
git clone https://github.com/YixiangChen515/EC-Flow.git
cd EC-Flow

# Download pretrained checkpoints (SAM, GroundingDINO, Co-Tracker)
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt -O sam_and_track/checkpoints/sam2.1_hiera_large.pt
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth -O sam_and_track/gdino_checkpoints/groundingdino_swint_ogc.pth
wget https://huggingface.co/facebook/cotracker3/resolve/main/scaled_offline.pth -O sam_and_track/co-tracker/checkpoints/scaled_offline.pth

# Create conda environment
conda create -n ecflow python=3.8
conda activate ecflow

# Install dependencies
bash install.sh
```

## Dataset

We provide the Meta-World dataset in our Hugging Face repo. Please download the dataset and place it under the `data` directory.
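For example, the downloaded archive can be extracted into `data/` like so (a minimal sketch — the archive filename and extraction layout are assumptions; check the Hugging Face repo for the exact names):

```shell
# Extract the pre-processed dataset into the data/ directory.
# The archive name is an assumption; adjust it to the file you downloaded.
mkdir -p data
if [ -f metaworld.tar.gz ]; then
    tar -xzf metaworld.tar.gz -C data/
fi
```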
There are two ways to prepare the training data:

- Use the `metaworld.tar.gz` file, which contains the pre-processed dataset with ground-truth point-tracking results. This version is ready for training out of the box.
- Alternatively, you can start with the original Meta-World dataset by using `metaworld_original.tar.gz`. To generate the processed dataset from it, run:

  ```bash
  python -m data_gen.gen_metaworld_all
  ```

## Training

Once the dataset is prepared, you can start training the flow prediction module by running:
```bash
# Note: The global batch size should be divisible by the number of devices.
# We trained on 8 NVIDIA RTX 4090 GPUs (24GB) with a batch size of 7 per GPU.
torchrun --nnodes=1 --nproc_per_node=8 train.py --results-dir ckpt --global-batch-size=56 --data-path=data/metaworld

# Single-GPU example (an illustration, not a tested configuration -- scale the batch size down accordingly):
# torchrun --nnodes=1 --nproc_per_node=1 train.py --results-dir ckpt --global-batch-size=7 --data-path=data/metaworld
```

## Inference

You can download the pretrained checkpoints from our Hugging Face repo and place them in the `ckpt` directory. To evaluate both the flow prediction and goal-image prediction results, run the following command:

```bash
python inference.py --ckpt ckpt/flow.pt --img-ckpt ckpt/goal_img.pt
```

## Evaluation

To evaluate EC-Flow in the Meta-World environment, follow these steps:
- Download the pretrained checkpoints as described above.
- Apply the necessary environment modifications by following the instructions in `modify_env.md` (**IMPORTANT**).

Once the setup is complete, run the following command to start evaluation:

```bash
cd experiment
bash eval_policy.sh
```

Note: To speed up the evaluation process, you can use multiple GPUs by specifying the device IDs:

```bash
# Example usage
bash eval_policy.sh "0,1,2,3"
```

## License

This repository is released under the MIT license.
## Acknowledgements

We extend our deepest thanks to the creators of these remarkable projects:

## Contact

If you have any questions about the code, please contact yixiang.chen [AT] cripac.ia.ac.cn.

## Citation

Please consider citing EC-Flow if it benefits your research:
```bibtex
@InProceedings{Chen_2025_ICCV,
    author    = {Chen, Yixiang and Li, Peiyan and Huang, Yan and Yang, Jiabing and Chen, Kehan and Wang, Liang},
    title     = {EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {11958-11968}
}
```
