
Linfeng-Tang/M3SVD

M3SVD: Multi-Modal Multi-Scene Video Dataset

📦 Notes on Provided Formats (Images vs. Videos)

  • Images are released in the native per-frame format that is directly used by the VideoFusion project (i.e., frame sequences under each clip folder), so you can plug them into the training/testing pipeline without any extra conversion.
  • Videos are additionally provided by packing each frame sequence into a single video file (e.g., .mp4) to reduce file count and avoid storage / hosting limitations (many platforms struggle with extremely large numbers of small image files).

🔁 Convert Videos to Frame Sequences (Recommended)

If you download the Videos version and need per-frame image sequences (the format directly used in VideoFusion), please use:

This script converts each .mp4 clip into an ordered frame sequence and restores the dataset layout for training/testing.
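As an illustration of what such a conversion involves, here is a minimal sketch. It is not the repository's script. It assumes `ffmpeg` is available on `PATH`, that videos sit at `<videos_root>/<folder>/<clip>.mp4`, and that zero-padded PNG names starting at 0 are acceptable; the function names, the directory layout, and the `%05d.png` pattern are all hypothetical, so adjust them to whatever naming the training/testing pipeline expects.

```python
import subprocess
from pathlib import Path


def frame_dir_for(video: Path, videos_root: Path, frames_root: Path) -> Path:
    """Map <videos_root>/<folder>/<clip>.mp4 to <frames_root>/<folder>/<clip>/."""
    relative = video.relative_to(videos_root)
    return frames_root / relative.with_suffix("")


def build_ffmpeg_command(video: Path, out_dir: Path, pattern: str = "%05d.png") -> list[str]:
    """Build an ffmpeg call that writes every decoded frame as a numbered image."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", str(video),
        "-start_number", "0",        # assumed: frame numbering starts at 0
        str(out_dir / pattern),      # assumed: zero-padded PNG file names
    ]


def convert_all(videos_root: Path, frames_root: Path) -> None:
    """Convert every .mp4 under videos_root into a per-clip frame folder."""
    for video in sorted(videos_root.rglob("*.mp4")):
        out_dir = frame_dir_for(video, videos_root, frames_root)
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(video, out_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with the actual download and output locations.
    convert_all(Path("Videos"), Path("Frames"))
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to print and inspect the exact `ffmpeg` invocation before converting the whole dataset. Note that if the videos were encoded with a lossy codec, the extracted frames may not be bit-identical to the Images release.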

🧾 Folder Meaning

We provide both high-quality (clean/enhanced) data and degraded data for infrared and visible modalities:

  • infrared_Enhance: High-quality infrared (IR) frames (clean/enhanced version).
  • visible_Enhance: High-quality visible (VI) frames (clean/enhanced version).
  • infrared_noise: Degraded infrared (IR) frames with stripe noise (a typical IR sensor degradation).
  • visible_Blur: Degraded visible (VI) frames with blur (e.g., motion/defocus blur).
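The four folders form one degraded/clean pair per modality, which is the natural input/target pairing for restoration experiments. The sketch below shows one way to collect such pairs. It assumes each folder contains one subfolder per clip and that corresponding frames share the same file name across folders; that layout, the `list_pairs` helper, and the `"ir"`/`"vi"` keys are illustrative assumptions, not part of the dataset's tooling.

```python
from pathlib import Path

# Folder names as listed above, grouped by modality.
CLEAN = {"ir": "infrared_Enhance", "vi": "visible_Enhance"}
DEGRADED = {"ir": "infrared_noise", "vi": "visible_Blur"}


def list_pairs(root: Path, clip: str, modality: str) -> list[tuple[Path, Path]]:
    """Return (degraded, clean) frame paths for one clip of one modality.

    Assumes <root>/<folder>/<clip>/<frame> with matching frame names.
    Frames missing from either folder are skipped.
    """
    degraded_dir = root / DEGRADED[modality] / clip
    clean_dir = root / CLEAN[modality] / clip
    pairs = []
    for degraded in sorted(degraded_dir.iterdir()):
        clean = clean_dir / degraded.name
        if degraded.is_file() and clean.is_file():
            pairs.append((degraded, clean))
    return pairs
```

For example, `list_pairs(Path("M3SVD"), "clip01", "ir")` would pair each stripe-noise infrared frame of a hypothetical clip `clip01` with its clean counterpart.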

✨ News

  • [2026] Our paper “VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion and Restoration” has been accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026! [Paper] [Code]

  • [2025] The M3SVD dataset is officially released.


📖 Introduction

M3SVD (Multi-Modal Multi-Scene Video Dataset) is a large-scale infrared-visible (IR-VI) video dataset designed for:

  • 🔥 Multi-modal video fusion
  • 🌙 Low-light / degraded video restoration
  • 📹 Spatio-temporal modeling research

🎥 Scenario Schematic

Visualization of representative scenarios in M3SVD. The dataset contains 220 temporally synchronized infrared-visible (IR-VI) video pairs with 153,797 aligned frames in total, captured at a resolution of 640×480 and 30 FPS.
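To get a feel for the scale these figures imply, the short calculation below derives the average clip length and total footage. It assumes the 153,797 figure counts aligned frame pairs (one IR and one VI frame each) and that every clip runs at exactly 30 FPS; if the count instead covers both modalities separately, the durations halve.

```python
NUM_PAIRS = 220          # IR-VI video pairs
TOTAL_FRAMES = 153_797   # aligned frames (assumed to be per-modality pairs)
FPS = 30

avg_frames_per_pair = TOTAL_FRAMES / NUM_PAIRS
avg_seconds_per_pair = avg_frames_per_pair / FPS
total_minutes = TOTAL_FRAMES / FPS / 60

print(
    f"{avg_frames_per_pair:.0f} frames, {avg_seconds_per_pair:.1f} s per pair; "
    f"{total_minutes:.1f} min total"
)
# prints: 699 frames, 23.3 s per pair; 85.4 min total
```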


🏗 Data Processing Workflow

Dataset Comparison (vs. prior works)


🎞 Video Demo

Example sequences (GIF previews):

Demo GIF



📦 Dataset Availability

We are open to academic collaboration and research usage.


🔗 Related Work Using M3SVD

VideoFusion (CVPR 2026)

Spatio-temporal collaborative network for multi-modal video fusion and restoration. [Paper] [Code]


📝 Citation

If you use M3SVD in your research, please cite:

@inproceedings{Tang2026VideoFusion,
  title     = {VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion and Restoration},
  author    = {Tang, Linfeng and Wang, Yeda and Gong, Meiqi and Li, Zizhuo and Deng, Yuxin and Yi, Xunpeng and Li, Chunyu and Zhang, Hao and Xu, Han and Ma, Jiayi},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
