
Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation

🌟CVPR 2025 (Oral Presentation)🌟

Jiaxin Cai, Jingze Su, Qi Li, Wenjie Yang, Shu Wang, Tiesong Zhao, Shengfeng He, Wenxi Liu

Abstract

Multimodal semantic segmentation is a critical challenge in computer vision. Early methods suffer from high computational costs and limited transferability due to full fine-tuning of RGB-based pre-trained parameters. Recent studies, while leveraging additional modalities as supplementary prompts to RGB, still rely predominantly on RGB, which restricts the full potential of the other modalities. To address these issues, we propose a novel symmetric parameter-efficient fine-tuning framework for multimodal segmentation, featuring a modality-aware prompting and adaptation scheme, to simultaneously adapt a powerful pre-trained model to both the RGB and X modalities. Furthermore, prevalent approaches use the global cross-modality correlations of the attention mechanism for modality fusion, which inadvertently introduces noise across modalities. To mitigate this noise, we propose a dynamic sparse cross-modality fusion module that enables effective and efficient cross-modality fusion. To further strengthen these two modules, we propose a training strategy that leverages accurately predicted dual-modality results to self-teach the single-modality outcomes. In comprehensive experiments, we demonstrate that our method outperforms previous state-of-the-art approaches across six multimodal segmentation scenarios with minimal computational cost.

For more details, please check our paper.

For questions regarding the code or paper, the most direct way to reach me is via email at jiaxincai528@163.com.


Updates

  • 08/2025: Initialized the repository.
  • 08/2025: Released model weights; download from GoogleDrive.

Environment

pip install -r requirements.txt
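After installation, a quick sanity check can confirm that the core dependencies resolved. The package names below are assumptions based on a typical PyTorch segmentation stack; adjust them to match the actual entries in requirements.txt:

```python
import importlib.util

def check_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Package names are assumptions; replace with the actual entries
# from requirements.txt.
missing = check_packages(["torch", "torchvision", "numpy", "yaml"])
if missing:
    print(f"Missing packages: {missing}")
else:
    print("All core packages found.")
```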

Data preparation

Prepare six datasets:

  • NYU Depth V2, for RGB-Depth semantic segmentation.
  • SUN-RGBD, for RGB-Depth semantic segmentation.
  • MFNet, for RGB-Thermal semantic segmentation.
  • PST900, for RGB-Thermal semantic segmentation.
  • MCubeS, for multimodal material segmentation with RGB-A-D-N modalities.
  • DELIVER, for RGB-Depth-Event-LiDAR semantic segmentation.

Then, organize all datasets with the following directory structure:

data/
├── NYUDepthv2
│   ├── RGB
│   ├── HHA
│   └── Label
├── SUN-RGBD
│   ├── Depth
│   ├── labels
│   ├── RGB
│   ├── test.txt
│   └── train.txt
├── MFNet
│   ├── rgb
│   ├── ther
│   └── labels
├── PST900
│   ├── train
│   └── test
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp
│   ├── polL_dolp
│   ├── NIR_warped
│   └── SS
├── DELIVER
│   ├── img
│   ├── hha
│   ├── event
│   ├── lidar
│   └── semantic

Following CMNext, for the NYU Depth V2 dataset we use the HHA encoding generated from the depth images; for the SUN-RGBD dataset we use the standard depth format instead.
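As a convenience, the expected layout can be verified with a short script. The subdirectory names mirror the tree above; the `data` root passed to `missing_dirs` is an assumption, so adjust it to your setup:

```python
from pathlib import Path

# Expected subdirectories per dataset, mirroring the tree above.
EXPECTED = {
    "NYUDepthv2": ["RGB", "HHA", "Label"],
    "SUN-RGBD": ["Depth", "labels", "RGB"],
    "MFNet": ["rgb", "ther", "labels"],
    "PST900": ["train", "test"],
    "MCubeS": ["polL_color", "polL_aolp", "polL_dolp", "NIR_warped", "SS"],
    "DELIVER": ["img", "hha", "event", "lidar", "semantic"],
}

def missing_dirs(data_root):
    """Return (dataset, subdir) pairs that are absent under data_root."""
    root = Path(data_root)
    return [
        (ds, sub)
        for ds, subs in EXPECTED.items()
        for sub in subs
        if not (root / ds / sub).is_dir()
    ]

if __name__ == "__main__":
    for ds, sub in missing_dirs("data"):
        print(f"missing: data/{ds}/{sub}")
```

An empty result means every dataset directory from the tree above is in place.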

Model Zoo

NYU Depth V2

| Model-Modal | mIoU | Weight |
| --- | --- | --- |
| Ours-RGB-D (Swin-B) | 59.0 | GoogleDrive |
| Ours-RGB-D (Swin-L) | 59.9 | GoogleDrive |

SUN-RGBD

| Model-Modal | mIoU | Weight |
| --- | --- | --- |
| Ours-RGB-D (Swin-B) | 53.7 | GoogleDrive |
| Ours-RGB-D (Swin-L) | 55.0 | GoogleDrive |

MFNet

| Model-Modal | mIoU | Weight |
| --- | --- | --- |
| Ours-RGB-T (Swin-B) | 59.9 | GoogleDrive |
| Ours-RGB-T (Swin-L) | 59.2 | GoogleDrive |

PST900

| Model-Modal | mIoU | Weight |
| --- | --- | --- |
| Ours-RGB-T (Swin-B) | 87.6 | GoogleDrive |
| Ours-RGB-T (Swin-L) | 88.7 | GoogleDrive |

MCubeS

| Model-Modal | mIoU | Weight |
| --- | --- | --- |
| Ours-RGB-N (Swin-B) | 53.8 | GoogleDrive |
| Ours-RGB-D (Swin-B) | 54.5 | GoogleDrive |
| Ours-RGB-A (Swin-B) | 53.7 | GoogleDrive |

DELIVER

| Model-Modal | mIoU | Weight |
| --- | --- | --- |
| Ours-RGB-E (Swin-B) | 58.4 | GoogleDrive |

Training

Before training, please download the pre-trained Swin Transformer weights and update the checkpoint path accordingly:

In semseg/models/backbones/swin.py (line 1108)
# For Swin-B
checkpoint_file = '/xxxx/xxxx/swin_base_patch4_window12_384_22k_20220317-e5c09f74.pth'
# For Swin-L    
# checkpoint_file = '/xxxx/xxxx/swin_large_patch4_window12_384_22k_20220412-6580f57d.pth'

To train a model, update the corresponding configuration file in configs/ with your dataset paths, then run:

python train_mm.py --cfg configs/nyu_rgbd.yaml

Evaluation

To evaluate a model, download the corresponding weights (GoogleDrive), update the configuration file in configs/ with your dataset paths, and run:

python val_mm.py --cfg configs/nyu_rgbd.yaml

Acknowledgements

Our codebase builds on several public GitHub repositories, and we thank their authors for making their work available.

License

This repository is released under the Apache-2.0 license. For commercial use, please contact the authors.

Citations

If you find our work helpful, please consider citing:

@inproceedings{cai2025keep,
  title={Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation},
  author={Cai, Jiaxin and Su, Jingze and Li, Qi and Yang, Wenjie and Wang, Shu and Zhao, Tiesong and He, Shengfeng and Liu, Wenxi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={10587--10598},
  year={2025}
}

About

[CVPR 2025] This is the official implementation of Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation
