zhenlong-liu/Provable_Training_Data_Identification

Provable Training Data Identification for Large Language Models

This repository contains the official implementation of the paper Provable Training Data Identification for Large Language Models (ICML 2026).

Quick Start

1. Installation

conda env create -f environment.yaml

2. Evaluation

To evaluate our method, run:

./run_eval.sh

Citation

If you find this work useful in your research, please consider citing:


@inproceedings{liu2026provable,
  title={Provable Training Data Identification for Large Language Models},
  author={Liu, Zhenlong and Zeng, Hao and Huang, Weiran and Wei, Hongxin},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://arxiv.org/abs/2510.09717},
}

Acknowledgements

Our code is inspired by Min-K% Prob. We thank the authors for releasing their code.
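
For readers unfamiliar with Min-K% Prob, the idea is to score a text by averaging the log-probabilities of its least likely tokens, on the premise that text seen during training tends to contain fewer very unlikely tokens. The following is a minimal sketch of that baseline idea only, not the method implemented in this repository. The function name `min_k_prob_score`, the default `k`, and the toy log-probabilities are illustrative assumptions; in practice the per-token log-probabilities would come from the language model under test.

```python
def min_k_prob_score(token_log_probs, k=0.2):
    """Average the lowest k fraction of per-token log-probabilities.

    token_log_probs: per-token log-probabilities of one text (hypothetical input).
    k: fraction of tokens to keep, taken from the low end (illustrative default).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Keep at least one token so short texts still receive a score.
    n_select = max(1, int(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n_select]
    return sum(lowest) / n_select


# Toy, made-up log-probabilities for two four-token texts.
seen_like = [-0.1, -0.3, -0.2, -0.5]    # no very unlikely tokens
unseen_like = [-0.1, -4.0, -0.2, -6.0]  # two very unlikely tokens

score_seen = min_k_prob_score(seen_like, k=0.5)
score_unseen = min_k_prob_score(unseen_like, k=0.5)
print(score_seen > score_unseen)
```

A higher score is read as weaker evidence of surprise, so a text is flagged as likely training data when its score exceeds a chosen threshold. How that threshold is set is outside the scope of this sketch.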
