Yitian Zheng*, Zhangchen Ye*, Weijun Dong*, Shengjie Wang, Yuyang Liu, Chongjie Zhang, Chuan Wen✉, Yang Gao✉
git clone --recursive git@github.com:yzc0731/HinFlow.git
cd HinFlow
conda env create -f environment.yml
conda activate hinflow
pip install -e third_party/robosuite/
pip install -e third_party/robomimic/
pip install -e third_party/maniskill/
We provide the preprocessed dataset to reproduce the results in our paper. You can download it from the Hugging Face Hub.
Alternatively, you can collect and preprocess the dataset yourself by following the instructions below.
For LIBERO tasks, you can download the raw LIBERO datasets by running download_libero_datasets, collect demonstrations with SpaceMouse teleoperation, or write your own scripted policy. For more details, please refer to CREATE YOUR OWN DATASETS in the LIBERO docs.
For ManiSkill tasks, please refer to ManiSkill Data Collection. Our method requires the control mode to be pd_ee_delta_pose and the observation mode to be rgb+segmentation.
Because the ManiSkill data format differs from LIBERO's, we provide a conversion script here.
The dataset needs to be preprocessed with CoTracker:
python -m scripts.preprocess \
--source_hdf5=path/to/raw/data.hdf5 \
--target_dir=path/to/preprocessed/data.hdf5 \
--sampler=SegmentSampler \
--use_points=1 \
--sampler_cfg=path/to/preprocess/task.yaml \
--env_type=maniskill
To replicate the results in our paper, use the following task names: libero_butter, libero_book, libero_chocolate, libero_microwave, maniskill_pokecube, maniskill_pullcubetool, and maniskill_placesphere.
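To preprocess several tasks at once, the command above can be wrapped in a loop. The sketch below covers only the three ManiSkill tasks, since the documented command only shows --env_type=maniskill. The raw_data/, preprocessed/ and preprocess_cfg/ paths are a hypothetical per-task layout, not the repository's, and the DRY_RUN switch is an illustrative addition: by default the commands are printed rather than executed.

```bash
#!/usr/bin/env bash
# Sketch: preprocess the three ManiSkill tasks from the paper in one loop.
# ASSUMPTION: raw_data/, preprocessed/ and preprocess_cfg/ are a hypothetical
# per-task layout; substitute your real paths.
# Commands are only printed unless DRY_RUN=0 is set in the environment.

DRY_RUN="${DRY_RUN:-1}"
MANISKILL_TASKS=(maniskill_pokecube maniskill_pullcubetool maniskill_placesphere)

preprocess_task() {
  local task="$1"
  local -a cmd
  # Same flags as the documented command; only the three paths vary per task.
  cmd=(python -m scripts.preprocess
    "--source_hdf5=raw_data/${task}.hdf5"
    "--target_dir=preprocessed/${task}.hdf5"
    --sampler=SegmentSampler
    --use_points=1
    "--sampler_cfg=preprocess_cfg/${task}.yaml"
    --env_type=maniskill)
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"        # run the command with its arguments intact
  fi
}

for task in "${MANISKILL_TASKS[@]}"; do
  preprocess_task "$task"
done
```

Reviewing the printed commands first makes it easy to spot a wrong path before a long CoTracker run starts; rerun with DRY_RUN=0 once they look right.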
The training of our method includes two stages: training the High Level Planner and training the policy.
We provide High Level Planner checkpoints to reproduce the results in our paper. You can download them from the Hugging Face Hub, or train the planner yourself by following the instructions below.
First, split the datasets into training and validation sets.
python -m scripts.split_trainval --folder=data/planner_dataset/${task}
The High Level Planner can then be trained with this command:
python -m scripts.train_planner --task=${task}
Our policy can be trained with:
python -m scripts.train_hinflow_policy --task=${task} --gpu=${gpu_id} --planner=${planner_path}
Here, planner_path is the path to the folder of the trained High Level Planner; it should contain model_best.ckpt and config.yaml.
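The two training stages can be chained for a single task as sketched below. This is a minimal sketch, not a script from the repository: where train_planner writes its output is not stated here, so the planner folder is passed in by hand, and the hypothetical helper check_planner_dir only verifies that model_best.ckpt and config.yaml are present before the policy is launched. Commands are printed unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Sketch: run both training stages for one task.
# ASSUMPTION: the planner output folder is not documented, so you pass
# planner_path to train_stage2 yourself after stage 1 finishes.
# Commands are only printed unless DRY_RUN=0 is set in the environment.

DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# A planner folder is usable only if it holds both files the policy expects.
check_planner_dir() {
  local dir="$1"
  local f
  for f in model_best.ckpt config.yaml; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "missing ${dir}/${f}" >&2
      return 1
    fi
  done
}

# Stage 1: split the planner dataset, then train the High Level Planner.
train_stage1() {
  local task="$1"
  run python -m scripts.split_trainval --folder="data/planner_dataset/${task}"
  run python -m scripts.train_planner --task="${task}"
}

# Stage 2: train the policy on top of a finished planner.
train_stage2() {
  local task="$1" gpu_id="$2" planner_path="$3"
  check_planner_dir "${planner_path}" || return 1
  run python -m scripts.train_hinflow_policy --task="${task}" \
    --gpu="${gpu_id}" --planner="${planner_path}"
}
```

For example, after sourcing the file, `train_stage1 libero_butter` prints the stage-1 commands, and `train_stage2 libero_butter 0 path/to/planner` refuses to continue if the planner folder is incomplete.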
To replicate the baseline results in our paper, we provide three modes: bc, atm_grid, and atm_seg. The atm_grid and atm_seg baselines use the same planner as our method. For bc, --planner is still required as a placeholder during both training and evaluation, but it is not used.
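The per-mode dataset labeling and training commands in this section can be looped over all three modes. In the sketch below, label_points and train_baseline are run once per mode, and the same planner folder is passed to every mode because bc requires --planner as a placeholder. Evaluation is left out, since the --exp-dir it needs depends on where training writes its output, which is not specified here. The DRY_RUN switch is an illustrative addition; commands are printed by default.

```bash
#!/usr/bin/env bash
# Sketch: label points and train each baseline mode for one task.
# The planner path is passed for every mode; bc only needs it as a placeholder.
# Commands are only printed unless DRY_RUN=0 is set in the environment.

DRY_RUN="${DRY_RUN:-1}"
BASELINE_MODES=(bc atm_grid atm_seg)

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

train_baselines() {
  local task="$1" planner_path="$2"
  local mode
  for mode in "${BASELINE_MODES[@]}"; do
    # Label the policy dataset for this mode, then train the baseline on it.
    run python -m scripts.label_points --task="${task}" --mode="${mode}" || return 1
    run python -m scripts.train_baseline --task="${task}" \
      --planner="${planner_path}" --mode="${mode}" || return 1
  done
}
```

Stopping at the first failing step (the `|| return 1`) avoids training a baseline on a dataset whose labeling did not finish.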
Before training a baseline, process the dataset in data/policy_dataset/${task} with this script:
python -m scripts.label_points --task=${task} --mode=${mode}
Training script:
python -m scripts.train_baseline --task=${task} --planner=${planner_path} --mode=${mode}
Evaluation script:
python -m scripts.eval_baseline --task=${task} --exp-dir=path/to/your/exp/dir --planner=${planner_path} --mode=${mode}
Thanks to these excellent open source projects:
If you find our codebase useful for your research, please cite our paper with this BibTeX:
@inproceedings{zheng2026translating,
title={Translating Flow to Policy via Hindsight Online Imitation},
author={Zheng, Yitian and Ye, Zhangchen and Dong, Weijun and Wang, Shengjie and Liu, Yuyang and Zhang, Chongjie and Wen, Chuan and Gao, Yang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
