
🧠 Backdoor Token Unlearning (BTU)

Code for the AAAI 2025 paper
"Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models"
📄 arXiv:2501.03272
📘 AAAI Proceedings


📝 Overview

Backdoor Token Unlearning (BTU) is a novel anti-backdoor learning method designed to train clean language models from poisoned datasets.
The method identifies and neutralizes backdoor triggers by unlearning their influence in token representations, achieving robust defense with minimal performance degradation on clean tasks.
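
Purely as intuition for the token-level unlearning idea — a minimal sketch, not the paper's actual algorithm; the model name, the embedding-only fine-tuning step, and the thresholding rule below are all assumptions — one could fine-tune only the word-embedding layer on the poisoned data, flag the tokens whose embeddings drift the most, and reset those rows to their pretrained values:

```python
import torch
from transformers import AutoModelForSequenceClassification

# Illustrative sketch only: detect suspicious tokens by embedding drift,
# then "unlearn" them by restoring their pretrained embedding vectors.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4  # 4 classes, as in AGNews
)
pretrained_emb = model.get_input_embeddings().weight.detach().clone()

# ... fine-tune ONLY the embedding layer on the (possibly poisoned) training
# set here, keeping all other parameters frozen ...

drift = (model.get_input_embeddings().weight.detach() - pretrained_emb).norm(dim=1)
threshold = drift.mean() + 3 * drift.std()      # hypothetical threshold rule
suspect_ids = torch.nonzero(drift > threshold).squeeze(-1)

with torch.no_grad():                           # reset suspected trigger tokens
    model.get_input_embeddings().weight[suspect_ids] = pretrained_emb[suspect_ids]
```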


📂 Dataset

The AGNews dataset used in our experiments is not included in this repository due to its size.
Please download the dataset from the OpenBackdoor repository by THUNLP, which includes the same data splits used in our paper.
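
If the downloaded split arrives as a tab-separated file with one `text<TAB>label` pair per line (an assumption — check the actual files; the path below is hypothetical), it can be read with the standard library:

```python
import csv

def load_split(path):
    """Hypothetical loader: one (text, label) pair per line, tab-separated.
    Adjust the delimiter and column order to match the downloaded files."""
    examples = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            text, label = row[0], int(row[1])
            examples.append((text, label))
    return examples

train = load_split("data/agnews/train.tsv")  # hypothetical path
```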


⚙️ Installation

Ensure you're using Python 3.9. Then install the required dependencies:

```bash
pip install -r requirements.txt
```

The requirements.txt file contains all necessary libraries and specific version constraints for reproducibility.


⚙️ Configuration

Customize your training setup by modifying the config.json file. You can specify:

  • Dataset paths (tasks and datasets)
  • Model paths (pretrained checkpoints)
  • Training hyperparameters, such as:
    • learning_rate
    • epochs
    • batch_size
  • Unlearning parameters:
    • threshold

Ensure all paths and settings reflect your actual environment before running the script; an illustrative example follows.
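
Key names in this example are inferred from the list above and may not match the shipped schema exactly; treat it as a hypothetical illustration and consult the repository's config.json for the authoritative keys:

```json
{
  "task": "agnews",
  "dataset_path": "data/agnews",
  "model_path": "bert-base-uncased",
  "learning_rate": 2e-5,
  "epochs": 3,
  "batch_size": 32,
  "threshold": 0.5,
  "output_dir": "outputs/"
}
```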


🚀 Usage

To start the BTU pipeline, simply run:

```bash
python BTU.py
```

Intermediate logs, model checkpoints, and evaluation results are saved to the output directory specified in your configuration.


📈 Results Summary

Our BTU method drastically reduces the backdoor attack success rate (ASR) while incurring only a marginal loss in clean-task accuracy.

For more results, refer to Table 1 in the paper.
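
For reference, these are the standard metric definitions (not code from this repository): clean accuracy is measured on the untouched test set, while ASR is the fraction of trigger-inserted test samples, originally not of the target class, that the model assigns to the attacker's target label. A minimal sketch:

```python
def attack_success_rate(preds, target_label):
    """ASR over trigger-inserted samples whose true label != target_label."""
    return sum(p == target_label for p in preds) / len(preds)

def clean_accuracy(preds, labels):
    """Accuracy on the clean (trigger-free) test set."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```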


📖 Citation

If you use this codebase or method in your research, please cite the following work:

```bibtex
@inproceedings{jiang2025backdoor,
  title={Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models},
  author={Jiang, Peihai and Lyu, Xixiang and Li, Yige and Ma, Jing},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={23},
  pages={24285--24293},
  year={2025}
}
```


📬 Contact

For questions or collaborations, please reach out to the authors via the contact information provided in the paper.

