Side Effects of Erasing Concepts from Diffusion Models
Findings of EMNLP 2025
Authors: Shaswati Saha, Sourajit Saha, Manas Gaur, Tejas Gokhale
Concerns about text-to-image (T2I) generative models infringing on privacy, copyright, and safety have led to the development of concept erasure techniques (CETs). The goal of an effective CET is to prohibit the generation of undesired "target" concepts specified by the user, while preserving the ability to synthesize high-quality images of other concepts. In this work, we demonstrate that concept erasure has side effects and CETs can be easily circumvented. For a comprehensive measurement of the robustness of CETs, we present the Side Effect Evaluation (SEE) benchmark that consists of hierarchical and compositional prompts describing objects and their attributes. The dataset and an automated evaluation pipeline quantify side effects of CETs across three aspects: impact on neighboring concepts, evasion of targets, and attribute leakage. Our experiments reveal that CETs can be circumvented by using superclass-subclass hierarchy, semantically similar prompts, and compositional variants of the target. We show that CETs suffer from attribute leakage and a counterintuitive phenomenon of attention concentration or dispersal. We release our benchmark and evaluation tools to aid future work on robust concept erasure.
Figure 1. We benchmark unintended side effects of CETs. Each column shows the concept to be erased, the text prompt, and the images generated before (top) and after (bottom) erasure. The tree shows the sub-graph in the hierarchy (parents and children) corresponding to the erased concept. We highlight the side effects: (1) Impact on neighboring concepts: erasing “car” does not erase the child concept “red car,” while erasing “red car” impacts the neighboring concept “red bus.” (2) Evasion of targets: erasing superclass “vehicle” can be circumvented through the subclasses (e.g., “car”) and corresponding attribute-based children (e.g., “red car”). (3) Attribute leakage: erasing “couch” leads to unintended leakage of the target attribute “blue” to the unrelated concept “potted plant”.
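The relations in Figure 1 can be made concrete with a small sketch. The toy hierarchy, the attribute values, and the helper names `objects_of`, `evasion_probes`, and `neighbors` are hypothetical illustrations, not code from the SEE repository. Descendants of an erased concept act as evasion probes (if they still render the target, erasure was circumvented), while nearby concepts that should survive erasure act as neighbors.

```python
from itertools import product

# Hypothetical toy fragment of a superclass -> object -> "color object" hierarchy,
# modeled on Figure 1; this is NOT the SEE hierarchy.
SUPERCLASS = {"car": "vehicle", "bus": "vehicle", "couch": "furniture"}
COLORS = ["red", "blue"]


def objects_of(superclass: str) -> list[str]:
    """Base objects that belong to a superclass."""
    return sorted(obj for obj, sup in SUPERCLASS.items() if sup == superclass)


def evasion_probes(target: str) -> list[str]:
    """Descendants of the target; if they still render it, erasure was evaded."""
    if target in SUPERCLASS.values():      # superclass, e.g. "vehicle"
        objs = objects_of(target)
        probes = list(objs)                # subclasses, e.g. "car"
    elif target in SUPERCLASS:             # base object, e.g. "car"
        objs = [target]
        probes = []
    else:                                  # leaf concept such as "red car"
        return []
    # Attribute-based children, e.g. "red car".
    probes += [f"{color} {obj}" for obj, color in product(objs, COLORS)]
    return probes


def neighbors(target: str) -> list[str]:
    """Nearby concepts that should still be generated after the target is erased."""
    superclasses = sorted(set(SUPERCLASS.values()))
    if target in superclasses:
        return [s for s in superclasses if s != target]
    if target in SUPERCLASS:
        return [obj for obj in objects_of(SUPERCLASS[target]) if obj != target]
    color, obj = target.split(" ", 1)      # e.g. "red car" -> ("red", "car")
    same_color = [f"{color} {o}" for o in objects_of(SUPERCLASS[obj]) if o != obj]
    same_object = [f"{c} {obj}" for c in COLORS if c != color]
    return same_color + same_object


print(evasion_probes("vehicle"))
print(neighbors("red car"))   # ['red bus', 'blue car']
```

In this toy setup, erasing “car” leaves “red car” among its evasion probes, and erasing “red car” puts “red bus” among the neighbors that should be preserved, mirroring side effects (1) and (2) in the caption.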
```bash
git clone https://github.com/shaswati1/see.git
cd see
conda env create -f environment.yml
conda activate see
```

| Component | Size |
|---|---|
| Superclasses | 11 |
| Base objects | 79 |
| Attribute groups | 3 |
| Total compositional variants | 5,056 |
Example path through the hierarchy:
🚗 vehicle → car → red car → small red wooden car → …
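The example path suggests how compositional variants arise: a base object is combined with an optional value from each attribute group. Below is a minimal sketch, assuming three hypothetical groups (size, color, material) with two made-up values each; the actual SEE attribute groups and values are not listed here, and `compositional_variants` is an illustrative name, not a repository function.

```python
from itertools import product

# Hypothetical attribute groups and values, in prompt order (size, color, material);
# the real SEE groups and values may differ.
ATTRIBUTE_GROUPS = {
    "size": ["small", "large"],
    "color": ["red", "blue"],
    "material": ["wooden", "metal"],
}


def compositional_variants(obj: str) -> list[str]:
    """All prompts formed by taking, per group, either no value or exactly one value."""
    # None stands for "leave this attribute group out of the prompt".
    options = [[None, *values] for values in ATTRIBUTE_GROUPS.values()]
    prompts = []
    for combo in product(*options):
        words = [w for w in combo if w is not None]
        prompts.append(" ".join([*words, obj]))
    return prompts


variants = compositional_variants("car")
print(len(variants))   # 27
print(variants[:4])    # ['car', 'wooden car', 'metal car', 'red car']
```

With these toy values each object yields 3 × 3 × 3 = 27 prompts (including the bare object and “small red wooden car”). For comparison, the table's totals correspond to 5,056 / 79 = 64 compositional variants per base object on average.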
```bash
python semantic_hierarchy.py
```

```bash
python create_eval_set.py \
  --hierarchy semantic_hierarchy.json \
  --erase_concept vehicle \
  --eval_dim neighboring_impact evasion attr_leakage
```

```bash
python generate_eval_images.py \
  --sd_version <sd_version> \
  --model_path <model_path> \
  --erase_concept <target_concept> \
  --eval_set <eval_set_json> \
  --eval_dim <neighboring_impact | evasion | attr_leakage | multiple> \
  --num_samples <num_samples_per_prompt> \
  --device <device> \
  --out_dir <output_dir>
```

```bash
python eval_images.py \
  --image_dir <image_dir> \
  --output_dir <output_dir> \
  --model <cet_name> \
  --target <target_concept> \
  --eval_dims <neighboring_impact | evasion | attr_leakage | multiple> \
  --verifiers <CLIP BLIP QWEN2.5VL Florence-2-base> \
  --device <device>
```
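`eval_images.py` accepts several verifiers at once, but how their answers are combined is not described here. One plausible scheme, shown purely as a hypothetical sketch, is a per-image majority vote over yes/no answers to whether a concept appears. The file names, answers, and the helpers `majority_vote` and `detection_rate` are made up for illustration.

```python
# Hypothetical per-image yes/no answers from several verifiers to a question such as
# "Is there a car in this image?"; file names and answers are made up for illustration.
ANSWERS = {
    "img_000.png": {"CLIP": True, "BLIP": True, "QWEN2.5VL": False, "Florence-2-base": True},
    "img_001.png": {"CLIP": False, "BLIP": False, "QWEN2.5VL": False, "Florence-2-base": True},
}


def majority_vote(votes: dict[str, bool]) -> bool:
    """True only if strictly more than half of the verifiers detect the concept."""
    return 2 * sum(votes.values()) > len(votes)


def detection_rate(answers: dict[str, dict[str, bool]]) -> float:
    """Fraction of images in which a majority of verifiers detect the concept."""
    decisions = [majority_vote(votes) for votes in answers.values()]
    return sum(decisions) / len(decisions)


print({img: majority_vote(v) for img, v in ANSWERS.items()})
print(detection_rate(ANSWERS))   # 0.5
```

Under the strict-majority rule a tie counts as "not detected". On evasion prompts, a high detection rate for the erased target would indicate that the erasure was circumvented.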
```bash
python calculate_accuracy.py \
  --predictions_dir <predictions_dir> \
  --metadata_file <metadata_jsonl> \
  --model <cet_name> \
  --target <target_concept> \
  --eval_dims <neighboring_impact | evasion | attr_leakage | multiple> \
  --verifiers <CLIP BLIP QWEN2.5VL Florence-2-base> \
  --output_file <output_json>
```
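The accuracy step joins verifier predictions with per-image metadata. The sketch below assumes a hypothetical JSON Lines schema (`image`, `eval_dim`, `expected` in the metadata; `image`, `verifier`, `detected` in the predictions) that may not match the files `calculate_accuracy.py` actually reads, and computes accuracy per evaluation dimension and verifier.

```python
import json
from collections import defaultdict

# Hypothetical JSON Lines contents; the real metadata/prediction schema may differ.
# "expected" is the desired verifier outcome: False for evasion probes (the erased
# target should not appear), True for neighboring concepts (they should still appear).
METADATA_LINES = [
    '{"image": "a.png", "eval_dim": "evasion", "expected": false}',
    '{"image": "b.png", "eval_dim": "evasion", "expected": false}',
    '{"image": "c.png", "eval_dim": "neighboring_impact", "expected": true}',
]
PREDICTION_LINES = [
    '{"image": "a.png", "verifier": "CLIP", "detected": true}',
    '{"image": "b.png", "verifier": "CLIP", "detected": false}',
    '{"image": "c.png", "verifier": "CLIP", "detected": true}',
]


def load_jsonl(lines):
    """Parse JSON Lines, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def accuracy_by_dim(metadata, predictions):
    """Accuracy for each (eval_dim, verifier) pair against the expected outcome."""
    expected = {m["image"]: (m["eval_dim"], m["expected"]) for m in metadata}
    hits, totals = defaultdict(int), defaultdict(int)
    for p in predictions:
        dim, label = expected[p["image"]]
        key = (dim, p["verifier"])
        totals[key] += 1
        hits[key] += int(p["detected"] == label)
    return {key: hits[key] / totals[key] for key in totals}


scores = accuracy_by_dim(load_jsonl(METADATA_LINES), load_jsonl(PREDICTION_LINES))
print(scores)   # {('evasion', 'CLIP'): 0.5, ('neighboring_impact', 'CLIP'): 1.0}
```

Because an open file object iterates over its lines, `load_jsonl(open(path))` would work the same way on a real `.jsonl` file.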
We evaluate each Concept Erasure Technique (CET) using the best-performing hyperparameters, training settings, and edit configurations recommended in their original papers or official repositories.
```bibtex
@inproceedings{saha2025side,
  title={Side Effects of Erasing Concepts from Diffusion Models},
  author={Saha, Shaswati and Saha, Sourajit and Gaur, Manas and Gokhale, Tejas},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2025},
  pages={14991--15007},
  year={2025}
}
```