Side Effects of Erasing Concepts from Diffusion Models

Paper · License: MIT


📄 Paper

Side Effects of Erasing Concepts from Diffusion Models
Findings of EMNLP 2025
Authors: Shaswati Saha, Sourajit Saha, Manas Gaur, Tejas Gokhale


🔍 Abstract

Concerns about text-to-image (T2I) generative models infringing on privacy, copyright, and safety have led to the development of concept erasure techniques (CETs). The goal of an effective CET is to prohibit the generation of undesired "target" concepts specified by the user, while preserving the ability to synthesize high-quality images of other concepts. In this work, we demonstrate that concept erasure has side effects and CETs can be easily circumvented. For a comprehensive measurement of the robustness of CETs, we present the Side Effect Evaluation (SEE) benchmark that consists of hierarchical and compositional prompts describing objects and their attributes. The dataset and an automated evaluation pipeline quantify side effects of CETs across three aspects: impact on neighboring concepts, evasion of targets, and attribute leakage. Our experiments reveal that CETs can be circumvented by using superclass-subclass hierarchy, semantically similar prompts, and compositional variants of the target. We show that CETs suffer from attribute leakage and a counterintuitive phenomenon of attention concentration or dispersal. We release our benchmark and evaluation tools to aid future work on robust concept erasure.

Figure 1. We benchmark unintended side effects of CETs. Each column shows the concept to be erased, the text prompt, and the images generated before (top) and after (bottom) erasure. The tree shows the sub-graph in the hierarchy (parents and children) corresponding to the erased concept. We highlight the side effects: (1) Impact on neighboring concepts: erasing “car” does not erase the child concept “red car,” while erasing “red car” impacts the neighboring concept “red bus.” (2) Evasion of targets: erasing superclass “vehicle” can be circumvented through the subclasses (e.g., “car”) and corresponding attribute-based children (e.g., “red car”). (3) Attribute leakage: erasing “couch” leads to unintended leakage of the target attribute “blue” to the unrelated concept “potted plant”.

⚙️ Installation

git clone https://github.com/shaswati1/see.git
cd see

conda env create -f environment.yml
conda activate see

🧬 Semantic Concept Hierarchy

Component Size
Superclasses 11
Base objects 79
Attribute groups 3
Total compositional variants 5,056

Example:

🚗 vehicle → car → red car → small red wooden car → …
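The superclass → object → attribute-composed structure above can be sketched in a few lines of Python. The dictionary contents and attribute values below are illustrative assumptions, not the actual contents of semantic_hierarchy.json:

```python
from itertools import combinations, product

# Hypothetical slice of the hierarchy: a superclass maps to base objects,
# and attribute groups compose with objects to form variant concepts.
hierarchy = {"vehicle": ["car", "bus", "truck"]}
attribute_groups = {              # three assumed groups, matching the count above
    "size": ["small", "large"],
    "color": ["red", "blue"],
    "material": ["wooden", "metal"],
}

def compositional_variants(obj, groups):
    """Enumerate attribute-composed variants of a base object,
    e.g. 'red car', 'small red car', 'small red wooden car'."""
    variants = []
    names = list(groups)
    # pick any non-empty subset of attribute groups, one value from each
    for r in range(1, len(names) + 1):
        for chosen in combinations(names, r):
            for values in product(*(groups[g] for g in chosen)):
                variants.append(" ".join(values) + " " + obj)
    return variants

variants = compositional_variants("car", attribute_groups)
# 26 variants for 3 groups of 2 values each: 6 single + 12 pair + 8 triple
```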


Build semantic hierarchy

python semantic_hierarchy.py

Create evaluation prompts for a target concept

python create_eval_set.py \
  --hierarchy semantic_hierarchy.json \
  --erase_concept vehicle \
  --eval_dim neighboring_impact evasion attr_leakage
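As a rough illustration of how the three evaluation dimensions might map to prompts for one erased target (the prompt template, function name, and grouping logic here are assumptions, not create_eval_set.py's actual output format):

```python
def build_eval_prompts(siblings, children, unrelated):
    """Hypothetical sketch: derive evaluation prompts for an erased target
    along the three SEE dimensions."""
    template = "a photo of a {}"
    return {
        # neighboring_impact: nearby concepts that should survive erasure
        "neighboring_impact": [template.format(s) for s in siblings],
        # evasion: sub-/superclass and compositional variants of the target
        "evasion": [template.format(c) for c in children],
        # attr_leakage: unrelated objects probed for leaked target attributes
        "attr_leakage": [template.format(u) for u in unrelated],
    }

prompts = build_eval_prompts(
    siblings=["bus"],
    children=["red car", "small red wooden car"],
    unrelated=["potted plant"],
)
```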

Generate images using an edited diffusion model

python generate_eval_images.py \
  --sd_version <sd_version> \
  --model_path <model_path> \
  --erase_concept <target_concept> \
  --eval_set <eval_set_json> \
  --eval_dim <neighboring_impact | evasion | attr_leakage | multiple> \
  --num_samples <num_samples_per_prompt> \
  --device <device> \
  --out_dir <output_dir>

🧪 Evaluate generated images

python eval_images.py \
  --image_dir <image_dir> \
  --output_dir <output_dir> \
  --model <cet_name> \
  --target <target_concept> \
  --eval_dims <neighboring_impact | evasion | attr_leakage | multiple> \
  --verifiers <CLIP BLIP QWEN2.5VL Florence-2-base> \
  --device <device>
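The --verifiers flag accepts multiple vision-language models. One plausible way to combine their per-image judgments is a majority vote; this is a sketch under the assumption that each verifier emits a binary "concept detected" decision, and the actual aggregation in eval_images.py may differ:

```python
from collections import Counter

def majority_vote(verdicts):
    """verdicts: verifier name -> bool ('target concept detected in image').
    Returns the majority decision across verifiers; a tie counts as detected."""
    counts = Counter(verdicts.values())
    return counts[True] >= counts[False]

verdicts = {"CLIP": True, "BLIP": False, "QWEN2.5VL": True, "Florence-2-base": True}
detected = majority_vote(verdicts)  # True: three of four verifiers agree
```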


📊 Compute Accuracy

python calculate_accuracy.py \
  --predictions_dir <predictions_dir> \
  --metadata_file <metadata_jsonl> \
  --model <cet_name> \
  --target <target_concept> \
  --eval_dims <neighboring_impact | evasion | attr_leakage | multiple> \
  --verifiers <CLIP BLIP QWEN2.5VL Florence-2-base> \
  --output_file <output_json>
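Note that accuracy has a different polarity per dimension: after erasure, the target (or a leaked attribute) should be absent, while neighboring concepts should still be present. A minimal sketch of that bookkeeping, with the detection format and per-dimension polarity assumed rather than taken from calculate_accuracy.py:

```python
# Assumed polarity: for these dimensions the probed concept should be
# ABSENT after erasure (the erased target itself, or a leaked attribute).
ABSENT_IS_SUCCESS = {"evasion", "attr_leakage"}

def dimension_accuracy(detections, eval_dim):
    """detections: list of bools, True if the probed concept was detected.
    Success means 'absent' for evasion/attr_leakage and 'present' for
    neighboring_impact (neighbors should survive erasure)."""
    if eval_dim in ABSENT_IS_SUCCESS:
        successes = [not d for d in detections]
    else:
        successes = [bool(d) for d in detections]
    return sum(successes) / len(detections)

acc = dimension_accuracy([True, False, False, False], "evasion")  # 0.75
```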

 

🔗 Benchmarked CETs


🧪 Evaluation Settings

We evaluate each Concept Erasure Technique (CET) using the best-performing hyperparameters, training settings, and edit configurations recommended in their original papers or official repositories.


🙌 Citation

@inproceedings{saha2025side,
  title={Side Effects of Erasing Concepts from Diffusion Models},
  author={Saha, Shaswati and Saha, Sourajit and Gaur, Manas and Gokhale, Tejas},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2025},
  pages={14991--15007},
  year={2025}
}
