Towards Generative Understanding: Incremental Few-shot Semantic Segmentation with Diffusion Models

1Nanjing University of Posts and Telecommunications
2Singapore University of Technology and Design
3University of California at Riverside, CA, USA

*Indicates corresponding author

We propose iFSS-Diff, the first framework to introduce diffusion models into incremental few-shot semantic segmentation.

Abstract

Incremental Few-shot Semantic Segmentation (iFSS) aims to learn novel classes from limited samples while preserving segmentation capability for base classes, thereby addressing catastrophic forgetting. Existing methods, which rely on knowledge distillation and background learning, still suffer from feature drift and poor generalization. To overcome these challenges, we propose a novel diffusion-based generative framework for iFSS. By mapping binary masks to three-channel representations and optimizing class-specific semantic embeddings, our method enhances foreground-background distinction and prevents feature interference. A lightweight post-processing module refines segmentation by converting generated images into binary masks. Leveraging the prior knowledge of diffusion models, we unify the learning of base and novel classes, eliminating complex training strategies and improving adaptability. Experiments on the PASCAL-5i and COCO-20i datasets show that our framework achieves state-of-the-art performance with minimal data. Additionally, our framework exhibits strong generalization on cross-domain few-shot segmentation (CD-FSS) benchmarks.

iFSS-Diff framework

Framework illustration

iFSS-Diff converts binary masks into three-channel RGB masks to align with the input paradigm of latent diffusion models, using them as supervision signals to obtain optimized class-specific semantic embeddings. This approach intrinsically decouples the embeddings of base and novel classes, effectively preventing catastrophic forgetting while enhancing the model's generalization capability.
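The mask conversion described above, together with the post-processing step mentioned in the abstract, can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `binary_mask_to_rgb` and `rgb_to_binary_mask`, the foreground and background colors, and the nearest-color rule for recovering the mask are all hypothetical choices that the source does not specify.

```python
import numpy as np

# Hypothetical reference colors; the source does not state the actual mapping.
FOREGROUND_RGB = (255, 0, 0)
BACKGROUND_RGB = (0, 0, 0)


def binary_mask_to_rgb(mask, fg=FOREGROUND_RGB, bg=BACKGROUND_RGB):
    """Map an (H, W) binary mask to an (H, W, 3) uint8 image.

    Foreground pixels receive the color `fg`, background pixels `bg`,
    so the mask has the three-channel layout an image model expects.
    """
    mask = np.asarray(mask).astype(bool)
    rgb = np.empty(mask.shape + (3,), dtype=np.uint8)
    rgb[mask] = fg
    rgb[~mask] = bg
    return rgb


def rgb_to_binary_mask(image, fg=FOREGROUND_RGB, bg=BACKGROUND_RGB):
    """Recover an (H, W) binary mask from a generated (H, W, 3) image.

    Assumed rule: each pixel is assigned to whichever reference color
    is nearer in Euclidean RGB distance. This tolerates small color
    deviations in the generated image.
    """
    image = np.asarray(image, dtype=np.float32)
    d_fg = np.linalg.norm(image - np.asarray(fg, dtype=np.float32), axis=-1)
    d_bg = np.linalg.norm(image - np.asarray(bg, dtype=np.float32), axis=-1)
    return (d_fg < d_bg).astype(np.uint8)


# Round trip on a toy 2x2 mask, with a small perturbation standing in
# for the imperfections of a generated image.
mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
rgb = binary_mask_to_rgb(mask)
perturbed = np.clip(rgb.astype(np.int16) + 20, 0, 255).astype(np.uint8)
recovered = rgb_to_binary_mask(perturbed)
```

In this sketch the round trip returns the original mask as long as the generated colors stay closer to their own reference color than to the other one; a real post-processing module may use a different decision rule.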

Visual Results on PASCAL-5i

Qualitative results of incremental few-shot semantic segmentation on the PASCAL-5i dataset (1-shot), where the baseline denotes results obtained without color and optimized background embeddings. For the first image, "person" is the base class and "bus" is the novel class.

Visual Results on CD-FSS

Visualization results of cross-domain semantic segmentation in the 1-shot setting.
