TL;DR: Diffusion models exhibit heavy temporal redundancy, yet we transmit full activations step after step.
Why are we sending near-duplicate data across GPUs?
CompactFusion transmits only compressed residuals, the part of the activation that actually changes, drastically reducing bandwidth with minimal quality loss.
We owe special thanks to the xDiT team—without their excellent open-source framework, this project would simply not exist.
Their work laid the foundation for everything we've built.
We thank the DistriFusion authors for sharing their code and system.
We also thank the authors of common_metrics_on_video_quality for their excellent video-quality evaluation tools.
- Diffusion models generate data step-by-step, but their intermediate activations change slowly and predictably.
- In multi-GPU inference, these large activations are repeatedly transmitted between devices.
- The transmitted data are highly redundant, wasting precious bandwidth on near-duplicate content.
- We ask: Why are we transmitting redundant stuff at all?
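The redundancy claim can be illustrated with a toy experiment. This is a hypothetical sketch, not a measurement on a real model: synthetic tensors stand in for activations, and the 5% per-step drift rate is an assumption chosen for illustration.

```python
import numpy as np

# Activations at consecutive denoising steps tend to be highly correlated,
# so the residual is small relative to the activation itself.
# We mimic this with a slowly drifting random tensor.
rng = np.random.default_rng(0)
act_prev = rng.standard_normal((64, 64))                     # activation at step t-1
act_curr = act_prev + 0.05 * rng.standard_normal((64, 64))   # small drift at step t

full_size = np.linalg.norm(act_curr)
residual_size = np.linalg.norm(act_curr - act_prev)
rel_change = residual_size / full_size
print(f"relative change between steps: {rel_change:.3f}")
# A small ratio means most of a full-activation transmission is redundant.
```

When the ratio is small, sending the full activation every step wastes almost all of the bandwidth on information the receiver already has.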
CompactFusion is a residual compression framework for parallel diffusion model serving.
It compresses only the change (residual) between activations across steps, with optional error feedback to maintain reconstruction quality.
✅ CompactFusion targets only communication, making it:
- 📦 Plug-and-play: no model retraining or modification required
- 🛠 Framework-compatible: integrates with ring attention, patch parallelism, and more
- ⚡ Extremely efficient: up to 100× compression, with <1% of the data transmitted, while still outperforming DistriFusion in quality
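The residual-compression idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual CompactFusion implementation: `ResidualLink` and `topk_compress` are hypothetical names, top-k sparsification stands in for the real compressor, and the untransmitted remainder simply stays in the next step's residual, acting as implicit error feedback.

```python
import numpy as np

def topk_compress(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest
    (a toy stand-in for CompactFusion's compressor)."""
    flat = x.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)

class ResidualLink:
    """One direction of a communication link that sends compressed residuals.
    Whatever compression drops remains inside the next residual, which
    acts as implicit error feedback."""
    def __init__(self, shape, k):
        self.prev = np.zeros(shape)  # mirror of the receiver's reconstruction
        self.k = k

    def send(self, activation):
        # residual = new change + everything compression dropped so far
        residual = activation - self.prev
        compressed = topk_compress(residual, self.k)  # what goes on the wire
        self.prev = self.prev + compressed            # receiver-side state
        return compressed

rng = np.random.default_rng(0)
link = ResidualLink((32, 32), k=128)              # ~12.5% of entries per step
act = rng.standard_normal((32, 32))
for _ in range(8):                                # slowly drifting activations
    act = act + 0.02 * rng.standard_normal((32, 32))
    link.send(act)
err = np.linalg.norm(link.prev - act) / np.linalg.norm(act)
print(f"relative reconstruction error after 8 steps: {err:.3f}")
```

Because consecutive activations change slowly, the residual concentrates its energy in few entries, so even aggressive sparsification keeps the receiver's reconstruction close to the true activation over many steps.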
*Figure: Residual Compression Principle*
*Figure: System Architecture*
CompactFusion supports out-of-the-box compression for:
- ✅ Compressed Ring Attention
- ✅ Compressed Patch Parallel
- ✅ DistriFusion migrated to xDiT
- ✅ Patch Parallel migrated to xDiT
- ✅ FLUX, CogVideoX, SD, Pixart-alpha, and other backbones
We build CompactFusion on top of the excellent xDiT framework.
You may simply use the pre-built Docker image from xDiT:
```shell
docker pull thufeifeibear/xdit-dev
```

Example usages are provided in:

- `examples/cogvideox_example.py`
- `examples/flux_example.py`
We do not modify xDiT's setup; refer directly to the xDiT documentation for usage details.
If you use this code for your research, please cite our paper.
```bibtex
@article{luo2025accelerating,
  title={Accelerating Parallel Diffusion Model Serving with Residual Compression},
  author={Luo, Jiajun and Xiao, Yicheng and Xu, Jianru and You, Yangxiu and Lu, Rongwei and Tang, Chen and Jiang, Jingyan and Wang, Zhi},
  journal={arXiv preprint arXiv:2507.17511},
  year={2025}
}
```


