Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion

Linfeng Tang^1*, Chunyu Li^1*, Jiayi Ma^†

Wuhan University
^*Equal Contribution ^†Corresponding Author

✨ News:

[2026-02-21] Our paper VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion has been officially accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)! [Paper] [Code]
[2025-09-18] Our paper ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts has been officially accepted by Advances in Neural Information Processing Systems (NeurIPS 2025)! [Paper] [Code]
[2025-09-10] Our paper Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion has been officially accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)! [Paper] [Code]
[2025-03-15] Our paper C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning has been officially accepted by the International Journal of Computer Vision (IJCV)! [Paper] [Code]
[2025-02-11] We released a large-scale dataset for infrared and visible video fusion: M3SVD: Multi-Modal Multi-Scene Video Dataset.

🔎 Method Overview

Various Scheme Comparison

Framework

Vanilla masking scheme vs. our dual masking scheme.

⚙️ Installation

# git clone this repository
git clone https://github.com/Linfeng-Tang/Mask-DiFuser.git
cd Mask-DiFuser

# create an environment with python >= 3.8
conda create -n mask-difuser python=3.8
conda activate mask-difuser
pip install -r requirements.txt

🚀 Inference

Step 1: Download the pretrained Mask-DiFuser model from Baidu Drive or Google Drive, and place the weights in `checkpoint/`.

Step 2: Run the inference command

python test.py --pretrained_path ./checkpoint/model.pt --task_type VIF --dirA ./dataset/MSRS/ir --dirB ./dataset/MSRS/vi --output_path ./Fusion/MSRS --gpu_ids 0
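
Before running the command above, it can help to confirm that the two input folders contain matching image pairs. The sketch below is not part of this repository: `find_unpaired` and `list_images` are hypothetical helpers, and the sketch assumes (unconfirmed by the README) that source images are paired by identical file names across `--dirA` and `--dirB`.

```python
from pathlib import Path

# Extensions treated as images here; this list is an assumption, adjust as needed.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_images(folder):
    """Return the set of image file names found directly inside `folder`."""
    return {
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    }


def find_unpaired(dir_a, dir_b):
    """Return (only_in_a, only_in_b) as sorted lists of file names."""
    names_a = list_images(dir_a)
    names_b = list_images(dir_b)
    return sorted(names_a - names_b), sorted(names_b - names_a)
```

For the MSRS example, `find_unpaired("./dataset/MSRS/ir", "./dataset/MSRS/vi")` should return two empty lists if every infrared image has a visible counterpart with the same name.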

🔥 Train

Step 1: Pretrained models and training data

Please download the DIV2K dataset from the official DIV2K website and organize it as follows:

/dataset/DIV2K/
        ├── train/
        │   ├── 0001.png
        │   ├── 0002.png
        │   └── ...
        └── val/
            ├── 0001.png
            ├── 0002.png
            └── ...
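
The layout above can be checked with a few lines of Python before launching training. This is a minimal sketch rather than code from this repository: `check_div2k_layout` is a hypothetical helper, and it assumes only what the tree shows, namely `train/` and `val/` subfolders that contain `.png` files.

```python
from pathlib import Path


def check_div2k_layout(root):
    """Count .png files in root/train and root/val.

    Raises FileNotFoundError if a split folder is missing and
    ValueError if a split folder contains no .png images.
    """
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        n_images = sum(
            1 for p in split_dir.iterdir()
            if p.is_file() and p.suffix.lower() == ".png"
        )
        if n_images == 0:
            raise ValueError(f"no .png images found in {split_dir}")
        counts[split] = n_images
    return counts
```

Calling `check_div2k_layout("./dataset/DIV2K")` returns a dictionary with the number of images per split, or raises an error that names the folder with the problem.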

Step 2: Run the training code

export OMP_NUM_THREADS=1
torchrun --nproc-per-node=4 train.py --dataset_path ./dataset/DIV2K --output_path ./result --gpu_ids 0,1,2,3
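
In the command above, `--nproc-per-node` matches the number of entries in `--gpu_ids`. When training on a different number of GPUs, both values presumably need to change together. The sketch below builds the command from a list of GPU ids so the two stay consistent. `build_train_command` is a hypothetical helper, it uses only the flags shown in this README, and the one-process-per-GPU pairing is an assumption inferred from the example.

```python
import shlex


def build_train_command(gpu_ids, dataset_path="./dataset/DIV2K", output_path="./result"):
    """Return the torchrun argument list for the given GPU ids."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    ids = ",".join(str(g) for g in gpu_ids)
    return [
        "torchrun",
        f"--nproc-per-node={len(gpu_ids)}",  # assumed: one process per listed GPU
        "train.py",
        "--dataset_path", dataset_path,
        "--output_path", output_path,
        "--gpu_ids", ids,
    ]


# Print a shell-ready command line for four GPUs.
print(shlex.join(build_train_command([0, 1, 2, 3])))
```

With `[0, 1, 2, 3]` this prints the same `torchrun` line as above. `OMP_NUM_THREADS=1` still needs to be exported separately in the shell.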

📷 Results

Visual comparison of infrared-visible image fusion results for night scenes on the MSRS dataset

Visual comparison of infrared-visible image fusion results on the RoadScene dataset

Visual comparison of multi-exposure image fusion results on the SICE dataset

Visual comparison of multi-exposure image fusion results on the MEFB dataset

Visual comparison of medical image fusion results on the Harvard dataset

Visual comparison of near-infrared and visible image fusion results on the Nirscene dataset

Visual comparison of multi-polarization fusion results on the Polarization dataset

Visual comparison of multi-focus image fusion results on the Lytro dataset

🕵️‍♂️ Detection

🎥 Segment

🎓 Citations

If our work is useful for your research, please consider citing it and giving us a star ⭐:

@article{Tang2026Mask-DiFuser,
  author={Tang, Linfeng and Li, Chunyu and Ma, Jiayi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion}, 
  year={2026},
  volume={48},
  number={1},
  pages={591--608},
}

🤝 Contact

Please feel free to contact us at linfeng0419@gmail.com or licy0089@gmail.com. We are happy to hear from you and will maintain this repository in our free time.

❤️ Acknowledgments

Some code is borrowed from CLEDiffusion and Stable-Diffusion. Thanks for their excellent work.
