Xiaoyu Wu, Yifei Pang, Terrance Liu, Zhiwei Steven Wu
NeurIPS 2025 (arXiv 2505.24379)
This repository provides the implementation of our algorithm for extracting unlearned data from large language models (LLMs) using guidance-based methods. The code is built primarily upon the TOFU repository and includes data from MUSE.
The core implementation can be found in MUSE/evaluate_util.py, particularly the contrasting_generation function.
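For intuition, the sketch below shows one common form of guidance-based contrastive decoding: at each step, the next-token logits of the pre-unlearning model are pushed away from those of the unlearned model, amplifying tokens the pre-unlearning model prefers. The function name, guidance rule, and checkpoint paths here are illustrative assumptions, not our exact method; the actual logic lives in the `contrasting_generation` function in MUSE/evaluate_util.py.

```python
# Illustrative sketch only -- names, the guidance rule, and checkpoints are
# assumptions; see contrasting_generation in MUSE/evaluate_util.py for the
# actual implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def contrastive_generate(model_a, model_b, tokenizer, prompt,
                         guidance=2.0, max_new_tokens=64):
    """Greedy decoding from a guided distribution:
    guided = logits_a + guidance * (logits_a - logits_b),
    which amplifies tokens model_a prefers over model_b."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits_a = model_a(ids).logits[:, -1, :]  # pre-unlearning model
        logits_b = model_b(ids).logits[:, -1, :]  # unlearned model
        guided = logits_a + guidance * (logits_a - logits_b)
        next_id = guided.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# Example usage (hypothetical checkpoint paths):
# tok = AutoTokenizer.from_pretrained("path/to/pre_unlearning")
# m_a = AutoModelForCausalLM.from_pretrained("path/to/pre_unlearning")
# m_b = AutoModelForCausalLM.from_pretrained("path/to/unlearned")
# print(contrastive_generate(m_a, m_b, tok, "Some forget-set prefix"))
```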
We follow most of the dependencies used in TOFU. To set up the environment:
```bash
conda create -n tofu python=3.10
conda activate tofu
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
Next, fine-tune the models from the MUSE directory:

```bash
cd MUSE
bash finetune_phi_all_iter_v2.sh
```

This step takes approximately 12 hours on 2×A100 GPUs.
Then run the evaluation:

```bash
bash eval_idea_10_v2.sh
```

This will measure the memorization of the forget set and save the results to the corresponding checkpoint directory.
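As a rough illustration of what a memorization score can look like, the sketch below computes a ROUGE-L-style overlap between a model's continuation and the true forget-set text; this is an assumed example metric, not necessarily the one computed by eval_idea_10_v2.sh.

```python
# Illustrative sketch of a memorization score, assuming a ROUGE-L-style
# overlap between model continuations and the true forget-set text.
from rouge_score import rouge_scorer

def memorization_score(generated: str, reference: str) -> float:
    # ROUGE-L recall: fraction of the reference continuation that the
    # model reproduces (1.0 = verbatim memorization).
    scorer = rouge_scorer.RougeScorer(["rougeL"])
    return scorer.score(reference, generated)["rougeL"].recall
```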
Finally, read out the results:

```bash
python read_final_res.py
```

This script outputs a comparison between the pre- and post-unlearning models, along with the performance of our extraction method.
If you find our work useful, please cite our paper:
```bibtex
@article{wu2025breaking,
  title={Breaking the Gold Standard: Extracting Forgotten Data under Exact Unlearning in Large Language Models},
  author={Wu, Xiaoyu and Pang, Yifei and Liu, Terrance and Wu, Zhiwei Steven},
  journal={arXiv preprint arXiv:2505.24379},
  year={2025}
}
```