M3SVD: Multi-Modal Multi-Scene Video Dataset

📦 Notes on Provided Formats (Images vs. Videos)

Baidu Netdisk Images: https://pan.baidu.com/s/1g8jixAr39n06JWPwrBE6lQ?pwd=M2VD

Baidu Netdisk Videos: https://pan.baidu.com/s/1z_kMLxYejPvt_17SNGlOTA?pwd=M2VD

GoogleDrive Videos: https://drive.google.com/file/d/1bRoNhQBzWtj0y8CMGdXQvjbQCGAVddPX/view?usp=sharing

Images are released in the native per-frame format that is directly used by the VideoFusion project (i.e., frame sequences under each clip folder), so you can plug them into the training/testing pipeline without any extra conversion.
Videos are additionally provided by packing each frame sequence into a single video file (e.g., .mp4) to reduce file count and avoid storage / hosting limitations (many platforms struggle with extremely large numbers of small image files).

🔁 Convert Videos to Frame Sequences (Recommended)

If you download the Videos version and need per-frame image sequences (the format directly used in VideoFusion), please use:

video2img.py: https://github.com/Linfeng-Tang/M3SVD/blob/main/video2img.py

This script converts each .mp4 clip into an ordered frame sequence and restores the dataset layout for training/testing.
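For readers who want to see what such a conversion involves, here is a minimal sketch that shells out to the `ffmpeg` command-line tool. It is not the repository's `video2img.py`. The function names, the zero-padded `%06d.png` naming pattern, and the one-folder-per-clip output layout are assumptions made for illustration, so check them against the official script before relying on the result.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frames_dir, pattern="%06d.png"):
    """Build an ffmpeg call that writes one image per frame of video_path.

    The zero-padded PNG pattern is an assumption, not the repository's
    documented naming scheme.
    """
    return [
        "ffmpeg",
        "-i", str(video_path),            # input clip
        "-start_number", "0",             # number the first frame 0
        str(Path(frames_dir) / pattern),  # numbered image sequence output
    ]


def convert_clip(video_path, out_root):
    """Extract the frames of one clip into out_root/<clip name>/."""
    video_path = Path(video_path)
    frames_dir = Path(out_root) / video_path.stem
    frames_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(video_path, frames_dir), check=True)
    return frames_dir
```

Calling `convert_clip("some_clip.mp4", "frames")` requires `ffmpeg` on the `PATH` and would write the frames to `frames/some_clip/`.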

🧾 Folder Meaning

We provide both high-quality (clean/enhanced) data and degraded data for infrared and visible modalities:

infrared_Enhance: High-quality infrared (IR) frames (clean/enhanced version).
visible_Enhance: High-quality visible (VI) frames (clean/enhanced version).
infrared_noise: Degraded infrared (IR) frames with stripe noise (a typical IR sensor degradation).
visible_Blur: Degraded visible (VI) frames with blur (e.g., motion/defocus blur).
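The four folders can be read together to form clean/degraded training pairs. The sketch below lists the frames of one clip that exist in all four folders. It assumes a layout of `<root>/<folder>/<clip>/<frame file>` with identical frame file names across folders; both the layout and the helper name `paired_frames` are hypothetical, so adjust them to the structure you actually download.

```python
from pathlib import Path

FOLDERS = ("infrared_Enhance", "visible_Enhance", "infrared_noise", "visible_Blur")


def paired_frames(root, clip):
    """Return the frame file names present in all four folders for one clip.

    Assumes <root>/<folder>/<clip>/<frame file> with matching file names.
    """
    root = Path(root)
    names_per_folder = []
    for folder in FOLDERS:
        clip_dir = root / folder / clip
        names_per_folder.append({p.name for p in clip_dir.iterdir() if p.is_file()})
    # Keep only frames that exist in every folder, in sorted order.
    return sorted(set.intersection(*names_per_folder))
```

Taking the intersection rather than trusting a single folder guards against a partially extracted or interrupted download.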

✨ News

[2026] Our paper “VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion and Restoration” has been accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026! [Paper] [Code]
[2025] M3SVD dataset is officially released.

📖 Introduction

M3SVD (Multi-Modal Multi-Scene Video Dataset) is a large-scale infrared-visible (IR-VI) video dataset designed for:

🔥 Multi-modal video fusion
🌙 Low-light / degraded video restoration
📹 Spatio-temporal modeling research

🎥 Scenario Schematic

Visualization of representative scenarios in M3SVD. The dataset contains 220 temporally synchronized infrared-visible (IR-VI) video pairs with 153,797 aligned frames in total, captured at a resolution of 640×480 and 30 FPS.
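As a rough sense of scale, the figures above can be turned into an average clip length and a total duration. The short calculation below assumes that the 153,797 count refers to aligned frame pairs (one per time step) and that every clip runs at exactly 30 FPS; neither point is stated explicitly, so treat the results as estimates.

```python
NUM_PAIRS = 220         # IR-VI video pairs
TOTAL_FRAMES = 153_797  # aligned frames, assumed to be counted per pair
FPS = 30                # frames per second

mean_frames_per_pair = TOTAL_FRAMES / NUM_PAIRS
mean_clip_seconds = mean_frames_per_pair / FPS
total_minutes = TOTAL_FRAMES / FPS / 60

print(f"{mean_frames_per_pair:.0f} frames per pair on average")
print(f"{mean_clip_seconds:.1f} seconds per pair on average")
print(f"{total_minutes:.1f} minutes of footage in total")
```

Under these assumptions, a pair averages about 699 frames (roughly 23 seconds), and the dataset covers about 85 minutes of synchronized footage.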

🏗 Data Processing Workflow

Dataset Comparison (vs. prior works)

🎞 Video Demo

Example sequences (GIF previews):

📦 Dataset Availability

Current release: Test split
Full dataset access: Please contact linfeng0419@gmail.com

We are open to academic collaboration and research usage.

🔗 Related Work Using M3SVD

VideoFusion (CVPR 2026)

Spatio-temporal collaborative network for multi-modal video fusion and restoration. [Paper] [Code]

📝 Citation

If you use M3SVD in your research, please cite:

@inproceedings{Tang2026VideoFusion,
  title     = {VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion and Restoration},
  author    = {Tang, Linfeng and Wang, Yeda and Gong, Meiqi and Li, Zizhuo and Deng, Yuxin and Yi, Xunpeng and Li, Chunyu and Zhang, Hao and Xu, Han and Ma, Jiayi},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
