[ICLR 2026] Translating Flow to Policy via Hindsight Online Imitation

Yitian Zheng*, Zhangchen Ye*, Weijun Dong*, Shengjie Wang, Yuyang Liu, Chongjie Zhang, Chuan Wen^✉, Yang Gao^✉

Installation

git clone --recursive git@github.com:yzc0731/HinFlow.git
cd HinFlow

conda env create -f environment.yml
conda activate hinflow

pip install -e third_party/robosuite/
pip install -e third_party/robomimic/
pip install -e third_party/maniskill/

Dataset

We provide the preprocessed dataset to reproduce the results in our paper. You can download it from Hugging Face Hub.

Alternatively, you can collect and preprocess the dataset yourself by following the instructions below.

Collect Dataset

For LIBERO tasks, you can download the raw LIBERO dataset by running download_libero_datasets, collect demonstrations through SpaceMouse teleoperation, or develop your own scripted policy. For more details, please refer to CREATE YOUR OWN DATASETS in the LIBERO Docs.

For ManiSkill tasks, please refer to ManiSkill Data Collection. Our method requires the control mode to be pd_ee_delta_pose and the observation mode to be rgb+segmentation.

Because the ManiSkill data format differs from the LIBERO format, we provide a script to convert ManiSkill data to the LIBERO format.

Dataset Preprocessing

The dataset needs to be preprocessed with CoTracker:

python -m scripts.preprocess \
  --source_hdf5=path/to/raw/data.hdf5 \
  --target_dir=path/to/preprocessed/data.hdf5 \
  --sampler=SegmentSampler \
  --use_points=1 \
  --sampler_cfg=path/to/preprocess/task.yaml \
  --env_type=maniskill
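
When several raw files need preprocessing, the command above can be generated once per file. The following is a minimal sketch, not part of the repository: the helper names build_preprocess_cmd and build_batch are hypothetical, only the flags shown above are used, and it assumes each output keeps the input file name under a folder of your choice. It only builds the argument lists and does not run anything.

```python
from pathlib import Path


def build_preprocess_cmd(source_hdf5, target_dir, sampler_cfg, env_type="maniskill"):
    """Return the argv list for one scripts.preprocess call (nothing is executed)."""
    return [
        "python", "-m", "scripts.preprocess",
        f"--source_hdf5={source_hdf5}",
        f"--target_dir={target_dir}",
        "--sampler=SegmentSampler",
        "--use_points=1",
        f"--sampler_cfg={sampler_cfg}",
        f"--env_type={env_type}",
    ]


def build_batch(raw_dir, out_dir, sampler_cfg, env_type="maniskill"):
    """Build one command per raw *.hdf5 file in raw_dir, in sorted order."""
    cmds = []
    for src in sorted(Path(raw_dir).glob("*.hdf5")):
        # Assumption: the output reuses the input file name under out_dir.
        target = Path(out_dir) / src.name
        cmds.append(build_preprocess_cmd(src, target, sampler_cfg, env_type))
    return cmds
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root once the paths have been checked.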

Training

To replicate the results in our paper, use the following task names: libero_butter, libero_book, libero_chocolate, libero_microwave, maniskill_pokecube, maniskill_pullcubetool, and maniskill_placesphere.

The training of our method includes two stages:

Stage 1: High Level Planner

We provide the High Level Planner checkpoints needed to reproduce the results in our paper. You can download them from Hugging Face Hub, or train the planner yourself by following the instructions below.

First, split the datasets into training and validation sets.

python -m scripts.split_trainval --folder=data/planner_dataset/${task}

Then train the High Level Planner with this command:

python -m scripts.train_planner --task=${task}
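
To run Stage 1 for every task listed above, the two commands can be generated in a loop. This is a small sketch under the assumption that each task's data lives in data/planner_dataset/${task}, as in the split command; the function name planner_commands is hypothetical. It prints the commands as a dry run instead of executing them.

```python
# Task names taken from the Training section of this README.
TASKS = [
    "libero_butter", "libero_book", "libero_chocolate", "libero_microwave",
    "maniskill_pokecube", "maniskill_pullcubetool", "maniskill_placesphere",
]


def planner_commands(task):
    """Return the two Stage 1 commands for one task, in the order they must run."""
    return [
        ["python", "-m", "scripts.split_trainval",
         f"--folder=data/planner_dataset/{task}"],
        ["python", "-m", "scripts.train_planner", f"--task={task}"],
    ]


for task in TASKS:
    for cmd in planner_commands(task):
        print(" ".join(cmd))  # dry run: print the command instead of executing it
```

Replacing the print call with subprocess.run(cmd, check=True) would execute the commands one after another and stop at the first failure.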

Stage 2: Low Level Policy with Hindsight Online Imitation

Our policy can be trained with:

python -m scripts.train_hinflow_policy --task=${task} --gpu=${gpu_id} --planner=${planner_path}

Here, planner_path is the path to the folder of the trained High Level Planner. The folder should contain model_best.ckpt and config.yaml.
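
Since Stage 2 depends on both files being present, it can help to check the planner folder before launching a long training run. The sketch below is an illustration only: missing_planner_files and policy_command are hypothetical helpers, not part of the repository, and the check covers only the two file names mentioned above.

```python
from pathlib import Path

# The two files the planner folder is expected to contain.
REQUIRED_FILES = ("model_best.ckpt", "config.yaml")


def missing_planner_files(planner_path):
    """Return the required file names that are absent from planner_path."""
    folder = Path(planner_path)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def policy_command(task, gpu_id, planner_path):
    """Build the Stage 2 command, refusing to do so if the planner folder is incomplete."""
    missing = missing_planner_files(planner_path)
    if missing:
        raise FileNotFoundError(f"{planner_path} is missing: {', '.join(missing)}")
    return [
        "python", "-m", "scripts.train_hinflow_policy",
        f"--task={task}", f"--gpu={gpu_id}", f"--planner={planner_path}",
    ]
```

Failing early this way avoids discovering a wrong planner path only after the training script has started.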

Baseline

To replicate the baseline results in our paper, we provide three modes: bc, atm_grid, and atm_seg. The atm_grid and atm_seg baselines use the same planner as our method. When training and evaluating bc, --planner is still required as a placeholder but is not used.

Before training the baseline, process the dataset in data/policy_dataset/${task} using this script:

python -m scripts.label_points --task=${task} --mode=${mode}

Training scripts:

python -m scripts.train_baseline --task=${task} --planner=${planner_path} --mode=${mode}

Evaluation scripts:

python -m scripts.eval_baseline --task=${task} --exp-dir=path/to/your/exp/dir --planner=${planner_path} --mode=${mode}
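
The three baseline steps (labeling, training, evaluation) can be assembled per mode as follows. This is a sketch under stated assumptions: baseline_commands is a hypothetical helper, and exp_dir must be filled in by you with the experiment directory of the trained baseline, since its location is not specified here. The commands are only built, not executed.

```python
# Baseline modes listed in this README.
MODES = ("bc", "atm_grid", "atm_seg")


def baseline_commands(task, planner_path, mode, exp_dir):
    """Return the label, train, and eval commands for one baseline mode, in order."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return [
        ["python", "-m", "scripts.label_points", f"--task={task}", f"--mode={mode}"],
        ["python", "-m", "scripts.train_baseline", f"--task={task}",
         f"--planner={planner_path}", f"--mode={mode}"],
        # exp_dir is user-supplied: the experiment directory of the trained baseline.
        ["python", "-m", "scripts.eval_baseline", f"--task={task}",
         f"--exp-dir={exp_dir}", f"--planner={planner_path}", f"--mode={mode}"],
    ]
```

For bc, planner_path is passed only as the placeholder described above.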

Acknowledgement

Thanks to these excellent open source projects:

Citation

If you find our codebase useful for your research, please cite our paper with this BibTeX entry:

@inproceedings{zheng2026translating,
  title={Translating Flow to Policy via Hindsight Online Imitation},
  author={Zheng, Yitian and Ye, Zhangchen and Dong, Weijun and Wang, Shengjie and Liu, Yuyang and Zhang, Chongjie and Wen, Chuan and Gao, Yang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
