Code for AAAI 2025 Paper
"Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models"
📄 arXiv:2501.03272
📘 AAAI Proceedings
Backdoor Token Unlearning (BTU) is a novel anti-backdoor learning method designed to train clean language models from poisoned datasets.
The method identifies and neutralizes backdoor triggers by unlearning their influence in token representations, achieving robust defense with minimal performance degradation on clean tasks.
The AGNews dataset used in our experiments is not included in this repository due to its size.
Please download the dataset from the OpenBackdoor repository by THUNLP, which includes the same data splits used in our paper.
Ensure you're using Python 3.9. Then install the required dependencies:
pip install -r requirements.txt
The requirements.txt file contains all necessary libraries and specific version constraints for reproducibility.
Customize your training setup by modifying the config.json file. You can specify:
- Dataset paths (tasks and datasets)
- Model paths (pretrained checkpoints)
- Training hyperparameters, such as:
  - learning_rate
  - epochs
  - batch_size
- Unlearning parameters:
  - Threshold
Ensure all paths and settings reflect your actual environment before running the script.
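A quick sanity check of the configuration before launching a run can catch typos early. The sketch below is only illustrative: it assumes a flat config.json whose keys are named learning_rate, epochs, batch_size, and threshold. The helper names load_config and check_config are hypothetical and not part of this repository, and the actual config.json may name or nest its fields differently.

```python
import json
from pathlib import Path

# Assumed flat key names; the repository's config.json may name or nest them differently.
NUMERIC_KEYS = ("learning_rate", "threshold")
INTEGER_KEYS = ("epochs", "batch_size")


def load_config(path="config.json"):
    """Read a JSON config file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def check_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Report every expected key that is absent.
    for key in NUMERIC_KEYS + INTEGER_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    # Numeric settings must be int or float (bool is rejected explicitly).
    for key in NUMERIC_KEYS:
        value = config.get(key)
        if key in config and (isinstance(value, bool) or not isinstance(value, (int, float))):
            problems.append(f"{key} should be a number")
    # Count-like settings must be positive integers.
    for key in INTEGER_KEYS:
        value = config.get(key)
        if key in config and (isinstance(value, bool) or not isinstance(value, int) or value <= 0):
            problems.append(f"{key} should be a positive integer")
    return problems
```

Under these assumptions, calling check_config(load_config("config.json")) and printing the returned list before training would surface missing or mistyped settings.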
To start the BTU pipeline, simply run:
python BTU.py
Intermediate logs, model checkpoints, and evaluation results will be saved to the specified output directory in your configuration.
Our BTU method demonstrates:
- A drastic reduction in the backdoor attack success rate (ASR), with only a marginal loss in clean task accuracy.
For more results, refer to Table 1 in the paper.
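For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from model predictions. It is a generic illustration, not the evaluation code in this repository: the function names are hypothetical, and the ASR convention used here (counting only triggered samples whose true label differs from the attacker's target label) is an assumption that may differ in detail from the paper's protocol.

```python
def clean_accuracy(predictions, labels):
    """Fraction of clean samples whose prediction matches the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def attack_success_rate(predictions_on_triggered, true_labels, target_label):
    """Fraction of triggered, non-target samples that the model assigns to the target label."""
    if len(predictions_on_triggered) != len(true_labels):
        raise ValueError("predictions and labels must be equal in length")
    # Samples that already belong to the target class say nothing about the attack.
    hits = [
        p == target_label
        for p, y in zip(predictions_on_triggered, true_labels)
        if y != target_label
    ]
    if not hits:
        raise ValueError("no samples with a non-target true label")
    return sum(hits) / len(hits)
```

A successful defense would show a low attack_success_rate on triggered inputs while clean_accuracy on unmodified inputs stays close to that of an undefended model.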
If you use this codebase or method in your research, please cite the following work:
@inproceedings{jiang2025backdoor,
title={Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models},
author={Jiang, Peihai and Lyu, Xixiang and Li, Yige and Ma, Jing},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={23},
pages={24285--24293},
year={2025}
}
- This project builds upon the OpenBackdoor framework by THUNLP.
- This project also builds upon the sos repository by LancoPKU (https://github.com/lancopku/sos).
For questions or collaborations, please reach out to the authors via the contact information provided in the paper.