EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

ICCV 2025

[🏠Project Page] [📄Paper] [📊Dataset] [🤗Checkpoints]

TL;DR: A method for learning robotic manipulation policies solely from action-unlabeled videos, enabling versatile control over deformable objects, occluded environments, and non-object-displacement tasks.

Installation

# Clone the repository
git clone https://github.com/YixiangChen515/EC-Flow.git
cd EC-Flow

# Download pretrained checkpoints (SAM 2.1, GroundingDINO, CoTracker3)
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt -O sam_and_track/checkpoints/sam2.1_hiera_large.pt
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth -O sam_and_track/gdino_checkpoints/groundingdino_swint_ogc.pth
wget https://huggingface.co/facebook/cotracker3/resolve/main/scaled_offline.pth -O sam_and_track/co-tracker/checkpoints/scaled_offline.pth

# Create conda environment
conda create -n ecflow python=3.8
conda activate ecflow

# Install dependencies
bash install.sh

Flow Prediction

1. Training

We provide the Meta-World dataset in our Huggingface repo. Please download the dataset and place it under the data directory.

There are two ways to prepare the training data:

  1. Use the metaworld.tar.gz file, which contains the pre-processed dataset with ground-truth point tracking results. This version is ready for training out of the box.

  2. Alternatively, you can start with the original Meta-World dataset by using metaworld_original.tar.gz. To generate the processed dataset from it, run:

python -m data_gen.gen_metaworld_all
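The unpacking step above can be sketched as a small helper. This is a minimal sketch, not part of the repository: the function name `extract_dataset` is hypothetical, and it assumes the archive has already been downloaded locally and unpacks into a top-level folder inside the data directory (the archive layout is not verified here).

```shell
# Hypothetical helper (not part of EC-Flow): unpack a dataset archive into a target directory.
# Assumption: the archive is a gzip-compressed tarball that was downloaded beforehand.
extract_dataset() {
    archive="$1"        # e.g. metaworld.tar.gz or metaworld_original.tar.gz
    dest="${2:-data}"   # target directory, defaults to ./data

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    mkdir -p "$dest"                  # create the data directory if it is missing
    tar -xzf "$archive" -C "$dest"    # extract into the target directory
}
```

For example, `extract_dataset metaworld.tar.gz data` would unpack the pre-processed archive under `data`. Check that the resulting path matches the `--data-path` value used for training.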

Once the dataset is prepared, you can start training the flow prediction module by running:

# Note: The global batch size should be divisible by the number of devices. We trained on 8 NVIDIA RTX 4090 GPUs (24GB) with a batch size of 7 per GPU.
torchrun --nnodes=1 --nproc_per_node=8 train.py --results-dir ckpt --global-batch-size=56 --data-path=data/metaworld
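If you train on a different number of GPUs, the global batch size can be derived from the per-GPU batch size so that it stays divisible by the device count. The sketch below only prints the launch command. The variable names `NUM_GPUS`, `PER_GPU_BATCH`, `GLOBAL_BATCH`, and `TRAIN_CMD` are illustrative and not used by the repository; the defaults reproduce the 8 x 7 = 56 setting from the note above.

```shell
# Illustrative variables (not read by train.py): adjust to your hardware.
NUM_GPUS="${NUM_GPUS:-8}"            # devices on this node
PER_GPU_BATCH="${PER_GPU_BATCH:-7}"  # samples per device

# The global batch size is the product, so it is always divisible by NUM_GPUS.
GLOBAL_BATCH=$((NUM_GPUS * PER_GPU_BATCH))

# Build the same command as above and print it for inspection instead of running it.
TRAIN_CMD="torchrun --nnodes=1 --nproc_per_node=${NUM_GPUS} train.py --results-dir ckpt --global-batch-size=${GLOBAL_BATCH} --data-path=data/metaworld"
echo "$TRAIN_CMD"
```

Whether a different per-GPU batch size fits in memory or reproduces the reported results is not something this sketch can tell you.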

2. Inference

You can download the pretrained checkpoints from our Huggingface repo and place them in the ckpt directory. To evaluate both the flow prediction and goal image prediction results, run the following command:

python inference.py --ckpt ckpt/flow.pt --img-ckpt ckpt/goal_img.pt
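Before running inference, it can help to confirm that both checkpoint files are in place. The following is a minimal sketch; `check_ckpts` is a hypothetical helper, not part of the repository, and the file names are the ones used in the command above.

```shell
# Hypothetical pre-flight check (not part of EC-Flow): verify that each given file exists.
check_ckpts() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing checkpoint: $f" >&2
            missing=1
        fi
    done
    return "$missing"   # 0 if all files exist, 1 otherwise
}
```

A possible use is `check_ckpts ckpt/flow.pt ckpt/goal_img.pt && python inference.py --ckpt ckpt/flow.pt --img-ckpt ckpt/goal_img.pt`, which starts inference only when both files are present.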

Evaluation in Meta-World

To evaluate EC-Flow in the Meta-World environment, follow these steps:

  1. Download the pretrained checkpoints as described above.

  2. Apply the necessary environment modifications by following the instructions in modify_env.md (IMPORTANT).

Once the setup is complete, run the following command to start evaluation:

cd experiment
bash eval_policy.sh

Note: To speed up the evaluation process, you can use multiple GPUs by specifying the device IDs:

# Example Usage
bash eval_policy.sh "0,1,2,3"
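The device ID string is a comma-separated list. For a contiguous range of GPUs it can be generated rather than typed by hand. The sketch below assumes IDs start at 0 and are consecutive; `device_ids` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of EC-Flow): print "0,1,...,N-1" for N devices.
device_ids() {
    n="$1"
    ids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        ids="${ids:+$ids,}$i"   # prepend a comma only after the first ID
        i=$((i + 1))
    done
    echo "$ids"
}

DEVICES=$(device_ids 4)
echo "bash eval_policy.sh \"$DEVICES\""   # prints the command for inspection
```

Run the printed command from the experiment directory, as in the example above.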

License

This repository is released under the MIT license.

Acknowledgement

We extend our deepest thanks to the creators of these remarkable projects:

Contact

If you have any questions about the code, please contact yixiang.chen [AT] cripac.ia.ac.cn

Citation

Please consider citing EC-Flow if it benefits your research:

@InProceedings{Chen_2025_ICCV,
    author    = {Chen, Yixiang and Li, Peiyan and Huang, Yan and Yang, Jiabing and Chen, Kehan and Wang, Liang},
    title     = {EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {11958-11968}
}
