Our training code is currently undergoing our company's internal review; we will open-source it once the review is complete. In the meantime, our work can be reproduced from the paper.
D-OPSD is an on-policy self-distillation training framework for diffusion models, especially timestep-distilled ones. Its key features:
- D-OPSD identifies an emergent property of modern text-to-image diffusion models with LLM/VLM text encoders, and leverages this property for the continual tuning of step-distilled diffusion models.
- D-OPSD is a novel on-policy self-distillation framework for diffusion models. By assigning the same model two roles with different contexts, D-OPSD enables supervised tuning on the student's own roll-outs without requiring any external reward function or extra modules.
- D-OPSD is validated across multiple settings. The results show that our method enables the model to learn new concepts, styles, and domain preferences while preserving its original few-step inference capability and prior knowledge.
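The two-role mechanism above can be sketched as a training loop: the model in its student context generates few-step roll-outs, and the same weights in a teacher context produce the distillation targets, so no external reward model is needed. This is only an illustrative toy in PyTorch — `ToyDenoiser`, the context vectors, and the MSE objective are our assumptions for exposition, not the paper's actual architecture or loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a step-distilled diffusion model: a single network whose
# "role" is selected by a context embedding (names are illustrative only).
class ToyDenoiser(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x, context):
        return self.net(torch.cat([x, context], dim=-1))

dim = 16
model = ToyDenoiser(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Two contexts give the single model its two roles.
ctx_student = torch.zeros(1, dim)
ctx_teacher = torch.ones(1, dim)

for _ in range(5):
    with torch.no_grad():
        # On-policy roll-out: the student role generates from noise.
        noise = torch.randn(8, dim)
        rollout = model(noise, ctx_student.expand(8, -1))
        # The same weights in the teacher role produce the target.
        target = model(rollout, ctx_teacher.expand(8, -1))
    # Supervised tuning of the student role on its own roll-outs.
    pred = model(rollout, ctx_student.expand(8, -1))
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the teacher and student share one set of weights, the targets stay on-policy as training proceeds; only the context distinguishes the two roles.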
In full fine-tuning, D-OPSD adapts the model to the target domain (anime) while retaining original-domain knowledge and few-step inference capability.
In small-scale customized LoRA training, D-OPSD learns new concepts from only a few image-text pairs while maintaining few-step generation quality and generalizing to unseen prompts.
If you find D-OPSD useful, please kindly cite our paper:
@article{jiang2026dopsd,
  title={D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models},
  author={Jiang, Dengyang and Jin, Xin and Liu, Dongyang and Wang, Zanyi and Zheng, Mingzhe and Du, Ruoyi and Yang, Xiangpeng and Wu, Qilong and Li, Zhen and Gao, Peng and Yang, Harry and Hoi, Steven},
  journal={arXiv preprint arXiv:2605.05204},
  year={2026}
}

