Towards Generative Understanding: Incremental Few-shot Semantic Segmentation with Diffusion Models

¹Nanjing University of Posts and Telecommunications
²Singapore University of Technology and Design
³University of California at Riverside, CA, USA
^*Indicates corresponding author

Abstract

Incremental Few-shot Semantic Segmentation (iFSS) aims to learn novel classes from limited samples while preserving segmentation capability for base classes, thereby addressing catastrophic forgetting. Existing methods, which rely on knowledge distillation and background learning, still suffer from feature drift and poor generalization. To overcome these challenges, we propose a novel diffusion-based generative framework for iFSS. By mapping binary masks to three-channel representations and optimizing class-specific semantic embeddings, our method enhances foreground-background distinction and prevents feature interference. A lightweight post-processing module refines segmentation by converting generated images into binary masks. Leveraging the prior knowledge of diffusion models, we unify the learning of base and novel classes, eliminating complex training strategies and improving adaptability. Experiments on the PASCAL-5ⁱ and COCO-20ⁱ datasets show that our framework achieves state-of-the-art performance with minimal data. Additionally, our framework exhibits strong generalization on cross-domain few-shot segmentation (CD-FSS) benchmarks.
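
The mask round trip described in the abstract (binary mask to three-channel image, then generated image back to binary mask) can be illustrated with a minimal sketch. The abstract does not specify the colors or the post-processing method, so everything here is an assumption for illustration: the reference colors `FG_COLOR` and `BG_COLOR`, the helper names `mask_to_rgb` and `rgb_to_mask`, and the nearest-color rule that stands in for the post-processing module.

```python
import numpy as np

# Hypothetical reference colors; the source does not state which colors are used.
FG_COLOR = np.array([255, 255, 255], dtype=np.uint8)
BG_COLOR = np.array([0, 0, 0], dtype=np.uint8)


def mask_to_rgb(mask):
    """Map a binary (H, W) mask to an (H, W, 3) uint8 image."""
    mask = np.asarray(mask).astype(bool)
    return np.where(mask[..., None], FG_COLOR, BG_COLOR).astype(np.uint8)


def rgb_to_mask(image):
    """Label each pixel 1 if it is closer to FG_COLOR than to BG_COLOR, else 0.

    This nearest-color rule is an assumed stand-in for the post-processing
    module, not the method described in the source.
    """
    image = np.asarray(image, dtype=np.float32)
    dist_fg = np.linalg.norm(image - FG_COLOR.astype(np.float32), axis=-1)
    dist_bg = np.linalg.norm(image - BG_COLOR.astype(np.float32), axis=-1)
    return (dist_fg < dist_bg).astype(np.uint8)


mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
rgb = mask_to_rgb(mask)

# Simulate an imperfect generated image: pixel values drift away from the
# exact reference colors but stay closer to the correct one.
noisy = np.where(rgb > 127, 235, 20)
recovered = rgb_to_mask(noisy)
```

In this sketch, `recovered` matches `mask` even though the simulated output no longer contains the exact reference colors, which is the kind of tolerance a mask-recovery step needs when its input is a generated image.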

We propose iFSS-Diff, the first framework to introduce diffusion models into incremental few-shot semantic segmentation.

iFSS-Diff framework

Visual Results on PASCAL-5ⁱ

Qualitative results of incremental few-shot semantic segmentation on the PASCAL-5ⁱ dataset (1-shot). The baseline shows results obtained without color and without optimized background embeddings. In the first image, "person" is the base class and "bus" is the novel class.

Visual Results on CD-FSS

Visualization of cross-domain semantic segmentation results in the 1-shot setting.
