ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts [NeurIPS 2025]
*Equal Contribution †Corresponding Author
- [2026-02-21] Our paper VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion has been officially accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)! [Paper] [Code]
- [2025-09-18] Our paper ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts has been officially accepted by Advances in Neural Information Processing Systems (NeurIPS 2025)! [Paper] [Code]
- [2025-09-10] Our paper Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion has been officially accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)! [Paper] [Code]
- [2025-03-15] Our paper C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning has been officially accepted by the International Journal of Computer Vision (IJCV)! [Paper] [Code]
- [2025-02-11] We released a large-scale dataset for infrared and visible video fusion: M3SVD: Multi-Modal Multi-Scene Video Dataset.
- Clone this repository:
    git clone https://github.com/Linfeng-Tang/ControlFusion.git
    cd ControlFusion
- Create a Conda environment (recommended):
    conda create -n controlfusion python=3.8 -y
    conda activate controlfusion
- Install dependency packages:
    pip install -r requirements.txt
To prepare the dataset, please refer to genDateset.
Download the pretrained model Mask-DiFuser from Baidu Drive and place the weights in pretrained_weights/.
You can use the test.py script we provide to fuse pairs of images. Please make sure you have downloaded the pre-trained weights.
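Before running test.py, it can help to confirm that the weights are in place and that the input images really form pairs. The sketch below is only an illustration and is not part of this repository: the helper names and the assumption that paired images share a file name across two folders are hypothetical, so adapt them to however your data is actually laid out.

```python
from pathlib import Path

# Image extensions considered by this sketch (an assumption, extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def list_weight_files(weights_dir="pretrained_weights"):
    """Return the files found in the weights folder, or [] if it is missing."""
    root = Path(weights_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file())


def pair_images(dir_a, dir_b):
    """Pair images from two folders by identical file name (hypothetical layout).

    Returns (pairs, unmatched): pairs is a list of (path_a, path_b) tuples,
    unmatched is a sorted list of file names present in only one folder.
    """
    def index(folder):
        return {
            p.name: p
            for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        }

    files_a, files_b = index(dir_a), index(dir_b)
    common = sorted(files_a.keys() & files_b.keys())
    pairs = [(files_a[name], files_b[name]) for name in common]
    unmatched = sorted(files_a.keys() ^ files_b.keys())
    return pairs, unmatched
```

An empty result from list_weight_files, or a non-empty unmatched list, points to a download or data-layout problem to fix before fusing.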
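You can modify ControlFusion.py to select text or auto control by keeping one of the following two assignments. The first uses features from the text prompt (text control); the second uses imgfeature instead (auto control):
text_features = self.get_text_feature(text.expand(b, -1)).to(inp_img_A.dtype)
text_features = imgfeature
You can use the train.py script we provide to train the model. Make sure you have organized your training dataset correctly.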
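Rather than editing the assignment in ControlFusion.py by hand before each test or training run, the text/auto choice could be driven by a flag. The following is a minimal sketch under that assumption: select_prompt_features and the mode argument are hypothetical names and do not exist in this repository.

```python
def select_prompt_features(mode, text_feature_fn, image_features):
    """Pick the prompt features for the chosen control mode.

    mode            -- "text" for text control, "auto" for automatic control
    text_feature_fn -- zero-argument callable producing the text features;
                       it is only called in "text" mode, so the text branch
                       costs nothing when automatic control is selected
    image_features  -- features already computed from the input images
    """
    if mode == "text":
        return text_feature_fn()
    if mode == "auto":
        return image_features
    raise ValueError(f"unknown control mode: {mode!r}")


# Hypothetical use inside the model, reusing the two assignments shown above:
#   text_features = select_prompt_features(
#       mode,
#       lambda: self.get_text_feature(text.expand(b, -1)).to(inp_img_A.dtype),
#       imgfeature,
#   )
```

Passing the text branch as a callable keeps the two original assignments unchanged while making the switch explicit.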
If our work is useful for your research, please consider citing it and giving us a star ⭐:
@inproceedings{Tang2025ControlFusion,
author={Linfeng Tang and Yeda Wang and Zhanchuan Cai and Junjun Jiang and Jiayi Ma},
title={ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts},
booktitle={Advances in Neural Information Processing Systems},
year={2025},
}
Please feel free to contact: linfeng0419@gmail.com, wangyeda@whu.edu.cn.
We are happy to discuss this work with you and will maintain this repository as time permits.





