A novel policy optimization framework for diffusion large language models that leverages their unique "inpainting" ability to guide exploration and improve RL training efficiency and model performance.
```bash
conda env create -f env.yml
conda activate igpo
```

Next, download the MetaMathQA dataset from Hugging Face.
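One way to fetch it is with the Hugging Face CLI (shipped with `huggingface_hub`); note that the repo id `meta-math/MetaMathQA` below is an assumption about where the dataset is hosted, so adjust it if needed:

```bash
# Sketch: download the dataset with the Hugging Face CLI.
# The repo id is an assumption; change it if the dataset lives elsewhere.
huggingface-cli download meta-math/MetaMathQA \
    --repo-type dataset \
    --local-dir igpo/MetaMathQA
```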
After downloading, the structure should be:
```
igpo/MetaMathQA/
├── MetaMathQA-395K.json
└── README.md
```
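As a quick sanity check that the download completed, you can count the records; this assumes the file is a top-level JSON array of examples, which is an assumption about its layout:

```bash
# Sanity check: count records in the downloaded file.
# Assumes a top-level JSON array; adjust if the layout differs.
python -c "import json; print(len(json.load(open('igpo/MetaMathQA/MetaMathQA-395K.json'))))"
```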
To run IGPO:

```bash
sbatch run_igpo.slurm
```

Note: set your own wandb API key in the Slurm scripts before submitting.
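The key is typically passed through wandb's standard environment variable; a line like the following is a sketch of what to edit, though the exact line in `run_igpo.slurm` and `run_grpo.slurm` may differ:

```bash
# In run_igpo.slurm / run_grpo.slurm: export your key before the training
# command. WANDB_API_KEY is wandb's standard variable; the exact placement
# in the shipped scripts is an assumption, so check them for the actual line.
export WANDB_API_KEY="your_api_key_here"
```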
To run GRPO:

```bash
sbatch run_grpo.slurm
```

This code is built on the D1 codebase.
If you find IGPO useful in your research, please consider citing:
```bibtex
@article{zhao2025inpainting,
  title={Inpainting-Guided Policy Optimization for Diffusion Large Language Models},
  author={Zhao, Siyan and Liu, Mengchen and Huang, Jing and Liu, Miao and Wang, Chenyu and Liu, Bo and Tian, Yuandong and Pang, Guan and Bell, Sean and Grover, Aditya and others},
  journal={arXiv preprint arXiv:2509.10396},
  year={2025}
}
```
IGPO is MIT licensed, as found in the LICENSE file.
