Side Effects of Erasing Concepts from Diffusion Models

📄 Paper

Side Effects of Erasing Concepts from Diffusion Models
Findings of EMNLP 2025
Authors: Shaswati Saha, Sourajit Saha, Manas Gaur, Tejas Gokhale

🔍 Abstract

Concerns about text-to-image (T2I) generative models infringing on privacy, copyright, and safety have led to the development of concept erasure techniques (CETs). The goal of an effective CET is to prohibit the generation of undesired "target" concepts specified by the user, while preserving the ability to synthesize high-quality images of other concepts. In this work, we demonstrate that concept erasure has side effects and CETs can be easily circumvented. For a comprehensive measurement of the robustness of CETs, we present the Side Effect Evaluation (SEE) benchmark that consists of hierarchical and compositional prompts describing objects and their attributes. The dataset and an automated evaluation pipeline quantify side effects of CETs across three aspects: impact on neighboring concepts, evasion of targets, and attribute leakage. Our experiments reveal that CETs can be circumvented by using superclass-subclass hierarchy, semantically similar prompts, and compositional variants of the target. We show that CETs suffer from attribute leakage and a counterintuitive phenomenon of attention concentration or dispersal. We release our benchmark and evaluation tools to aid future work on robust concept erasure.


Figure 1. We benchmark unintended side effects of CETs. Each column shows the concept to be erased, the text prompt, and the images generated before (top) and after (bottom) erasure. The tree shows the sub-graph of the hierarchy (parents and children) corresponding to the erased concept. We highlight three side effects: (1) Impact on neighboring concepts: erasing “car” does not erase the child concept “red car,” while erasing “red car” impacts the neighboring concept “red bus.” (2) Evasion of targets: erasing the superclass “vehicle” can be circumvented through its subclasses (e.g., “car”) and their attribute-based children (e.g., “red car”). (3) Attribute leakage: erasing “couch” leads to unintended leakage of the target's attribute “blue” to the unrelated concept “potted plant.”

⚙️ Installation

git clone https://github.com/shaswati1/see.git
cd see

conda env create -f environment.yml
conda activate see

🧬 Semantic Concept Hierarchy

Component	Size
Superclasses	11
Base objects	79
Attribute groups	3
Total compositional variants	5,056

Example:

🚗 vehicle → car → red car → small red wooden car → …
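To make the superclass → subclass → attribute chain concrete, here is a minimal sketch that stores a hand-written fragment of such a hierarchy as nested dictionaries and walks it. The dictionary layout and the helpers `find_path`, `subtree`, and `descendants` are hypothetical illustrations; the schema of the `semantic_hierarchy.json` file produced by the repository may differ.

```python
# Hypothetical fragment of a concept hierarchy (concept -> children).
# This is NOT the repository's JSON schema, only an illustration.
HIERARCHY = {
    "vehicle": {
        "car": {
            "red car": {
                "small red wooden car": {},
            },
        },
        "bus": {
            "red bus": {},
        },
    },
}


def find_path(tree, target, path=None):
    """Return the concepts from the root down to `target`, or None if absent."""
    path = path or []
    for concept, children in tree.items():
        current = path + [concept]
        if concept == target:
            return current
        found = find_path(children, target, current)
        if found is not None:
            return found
    return None


def subtree(tree, target):
    """Return the children dict of `target` (empty for a leaf), or None if absent."""
    for concept, children in tree.items():
        if concept == target:
            return children
        found = subtree(children, target)
        if found is not None:
            return found
    return None


def descendants(tree, target):
    """Return every concept below `target` in depth-first order."""
    out = []

    def walk(node):
        for concept, children in node.items():
            out.append(concept)
            walk(children)

    node = subtree(tree, target)
    if node:
        walk(node)
    return out


print(" → ".join(find_path(HIERARCHY, "small red wooden car")))
# vehicle → car → red car → small red wooden car
```

Every concept listed by `descendants(HIERARCHY, "vehicle")` is a way to describe a vehicle without using the word "vehicle", which is why such children are natural probes for the evasion side effect.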

Build semantic hierarchy

python semantic_hierarchy.py
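The hierarchy combines base objects with attribute groups to produce compositional variants. The sketch below shows one plausible way to enumerate such variants with `itertools.product`, choosing at most one value per group. The group names (size, color, material) and their values are assumptions inferred from the "small red wooden car" example; the sketch does not reproduce the benchmark's actual vocabulary or its 5,056-variant count.

```python
from itertools import product

# Hypothetical attribute groups and values, for illustration only.
ATTRIBUTE_GROUPS = {
    "size": ["small", "large"],
    "color": ["red", "blue"],
    "material": ["wooden", "metal"],
}


def compositional_variants(base_object, groups=ATTRIBUTE_GROUPS):
    """Yield every prompt formed by choosing at most one value per group.

    Adjectives follow the group order (size, color, material), which matches
    common English adjective order, e.g. "small red wooden car".
    """
    # None means "leave this attribute group out of the prompt".
    options = [[None] + values for values in groups.values()]
    for combo in product(*options):
        adjectives = [value for value in combo if value is not None]
        if adjectives:  # skip the bare base object itself
            yield " ".join(adjectives + [base_object])


variants = list(compositional_variants("car"))
print(len(variants))  # 3 * 3 * 3 - 1 = 26
```

With three options per group (absent or one of two values), there are 27 combinations; dropping the empty one leaves 26 variants per base object in this toy setting.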

Create evaluation prompts for a target concept

python create_eval_set.py \
  --hierarchy semantic_hierarchy.json \
  --erase_concept vehicle \
  --eval_dim neighboring_impact evasion attr_leakage
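The three evaluation dimensions map onto different parts of the hierarchy. As a rough sketch, assuming neighbors are sibling concepts and their children, evasion probes are the target's subclasses and attribute-based children, and leakage probes pair an attributed target with an unrelated object, a prompt set could be assembled as follows. The toy `CHILDREN` table, the helper names, and the leakage prompt template `"a {attribute} {target} and a {obj}"` are hypothetical and not taken from `create_eval_set.py`.

```python
# Hypothetical toy hierarchy (parent -> children); not the repository's schema.
CHILDREN = {
    "vehicle": ["car", "bus"],
    "car": ["red car"],
    "bus": ["red bus"],
    "red car": [],
    "red bus": [],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def all_descendants(concept):
    """Subclasses and attribute-based children, depth-first."""
    out = []
    for child in CHILDREN.get(concept, []):
        out.append(child)
        out.extend(all_descendants(child))
    return out


def siblings(concept):
    """Concepts sharing the same parent (empty for a root concept)."""
    parent = PARENT.get(concept)
    if parent is None:
        return []
    return [c for c in CHILDREN[parent] if c != concept]


def build_eval_set(target, unrelated_objects, attribute):
    """Assemble one prompt list per evaluation dimension (illustrative only)."""
    return {
        # Should still be generated after erasure: siblings and their children.
        "neighboring_impact": [n for s in siblings(target) for n in [s] + all_descendants(s)],
        # Should also be blocked, but may slip through: the target's descendants.
        "evasion": all_descendants(target),
        # Does the target's attribute migrate onto the unrelated object?
        "attr_leakage": [f"a {attribute} {target} and a {obj}" for obj in unrelated_objects],
    }


print(build_eval_set("vehicle", [], "red")["evasion"])
```

For the target "vehicle", the evasion list contains "car" and "red car", mirroring the circumvention example in Figure 1. Note that a sibling-only notion of neighborhood would not flag "red bus" as a neighbor of "red car", so the benchmark's own definition is presumably broader.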

Generate images using an edited diffusion model

python generate_eval_images.py \
  --sd_version <sd_version> \
  --model_path <model_path> \
  --erase_concept <target_concept> \
  --eval_set <eval_set_json> \
  --eval_dim <neighboring_impact | evasion | attr_leakage | multiple> \
  --num_samples <num_samples_per_prompt> \
  --device <device> \
  --out_dir <output_dir>
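Conceptually, this step loops over every prompt in the evaluation set and draws `--num_samples` images per prompt from the edited model. The sketch below shows that loop with deterministic per-sample seeds and a JSON Lines metadata file. It takes the image generator as a callable so it runs without model weights; with the `diffusers` library, `generate_fn` could wrap something like `pipe(prompt, generator=torch.Generator(device).manual_seed(seed)).images[0]`. The function name `run_generation`, the `{eval_dim: [prompt, ...]}` input format, the file naming, and the metadata fields are all assumptions, not the behavior of `generate_eval_images.py`.

```python
import json
from pathlib import Path


def run_generation(eval_set, generate_fn, num_samples, out_dir, seed=0):
    """Generate `num_samples` images per prompt and write metadata.jsonl.

    eval_set: {eval_dim: [prompt, ...]} (hypothetical format).
    generate_fn(prompt, seed): returns an object with .save(path), e.g. a PIL image.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = []
    for eval_dim, prompts in eval_set.items():
        for p_idx, prompt in enumerate(prompts):
            for s in range(num_samples):
                # Fixed per-sample seeds make before/after-erasure comparisons reproducible.
                sample_seed = seed + s
                image = generate_fn(prompt, sample_seed)
                name = f"{eval_dim}_{p_idx:04d}_{s:02d}.png"
                image.save(out / name)
                records.append({"file": name, "eval_dim": eval_dim,
                                "prompt": prompt, "seed": sample_seed})
    # One JSON object per line, one line per generated image.
    with open(out / "metadata.jsonl", "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return records
```

Reusing the same seeds for the original and the edited model keeps the initial noise identical, so differences between the two image sets can be attributed to the erasure rather than to sampling randomness.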

🧪 Evaluate generated images

python eval_images.py \
  --image_dir <image_dir> \
  --output_dir <output_dir> \
  --model <cet_name> \
  --target <target_concept> \
  --eval_dims <neighboring_impact | evasion | attr_leakage | multiple> \
  --verifiers <CLIP BLIP QWEN2.5VL Florence-2-base> \
  --device <device>
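Several verifiers score each generated image. For a contrastive model such as CLIP, a common approach is zero-shot classification: embed the image and each candidate label, then choose the label with the highest cosine similarity. The sketch below shows that idea on plain vectors, plus a simple majority vote for combining verifiers. Both the voting scheme and the helper names are assumptions; the README does not say how `eval_images.py` combines verifiers, and VQA-style models (BLIP, Qwen2.5-VL, Florence-2) may be queried with questions rather than embeddings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_label(image_emb, label_embs):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))


def majority_vote(predictions):
    """Combine per-verifier labels; ties go to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]


labels = {"car": [1.0, 0.0], "bus": [0.0, 1.0]}
print(zero_shot_label([0.9, 0.1], labels))  # car
print(majority_vote(["car", "bus", "car"]))  # car
```

In practice the embeddings would come from the verifier's image and text encoders; the toy two-dimensional vectors only illustrate the decision rule.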

📊 Compute Accuracy

python calculate_accuracy.py \
  --predictions_dir <predictions_dir> \
  --metadata_file <metadata_jsonl> \
  --model <cet_name> \
  --target <target_concept> \
  --eval_dims <neighboring_impact | evasion | attr_leakage | multiple> \
  --verifiers <CLIP BLIP QWEN2.5VL Florence-2-base> \
  --output_file <output_json>
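Accuracy is then aggregated per verifier and per evaluation dimension. A minimal sketch, assuming each metadata record carries a `file`, an `eval_dim`, and an `expected` label, and that predictions are stored as `{verifier: {file: label}}`, could look like the following. These field names and the `accuracy_by_dim` helper are hypothetical; `calculate_accuracy.py` may use a different schema and may define the expected outcome differently for each dimension.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def accuracy_by_dim(metadata, predictions):
    """Return {verifier: {eval_dim: accuracy}} over the images each verifier scored."""
    results = {}
    for verifier, preds in predictions.items():
        correct = defaultdict(int)
        total = defaultdict(int)
        for rec in metadata:
            if rec["file"] not in preds:
                continue  # image not scored by this verifier
            total[rec["eval_dim"]] += 1
            correct[rec["eval_dim"]] += preds[rec["file"]] == rec["expected"]
        results[verifier] = {dim: correct[dim] / total[dim] for dim in total}
    return results


metadata = [
    {"file": "a.png", "eval_dim": "evasion", "expected": "car"},
    {"file": "b.png", "eval_dim": "evasion", "expected": "car"},
    {"file": "c.png", "eval_dim": "neighboring_impact", "expected": "bus"},
]
predictions = {"CLIP": {"a.png": "car", "b.png": "none", "c.png": "bus"}}
print(accuracy_by_dim(metadata, predictions))
# {'CLIP': {'evasion': 0.5, 'neighboring_impact': 1.0}}
```

Keeping the dimensions separate matters for interpretation: a high detection rate on neighboring-impact prompts means unrelated concepts were preserved, whereas a high detection rate of the target on evasion prompts means the erasure was circumvented.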

🔗 Benchmarked CETs

🧪 Evaluation Settings

We evaluate each Concept Erasure Technique (CET) using the best-performing hyperparameters, training settings, and edit configurations recommended in their original papers or official repositories.

🙌 Citation

@inproceedings{saha2025side,
  title={Side Effects of Erasing Concepts from Diffusion Models},
  author={Saha, Shaswati and Saha, Sourajit and Gaur, Manas and Gokhale, Tejas},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2025},
  pages={14991--15007},
  year={2025}
}
