Automated Detection of Pre-training Text in Black-box LLMs

🔍VeilProbe

This is the code for the paper.

We propose the VeilProbe framework to automatically detect LLMs' pre-training texts in a black-box setting without human intervention.

It consists of three modules: (a) The text-to-suffix sampling module generates text-to-suffix pairs; (b) The membership feature inference module infers membership features with a sequence-to-sequence model based on the above pairs; (c) The prototype-based classifier is trained based on the features from ground-truth samples, and then the pre-training and non-training prototypes are constructed. The first two modules prepare the membership feature inference stage for the texts to be detected, and the third one trains a membership classifier for detection.

💻 Reproduce our work

To reproduce results, you'll need to first run the following command to generate the prefix-suffix pairs and perturbed prefix-suffix pairs.

python sampling_close_sourced_LLM.py #obtaining generated pairs
python mappingmodel.py #obtaining mapping model

To obtain the perturbed version, run
python keyperturb_agg.py
under the key_perturb directory.

The trained mapping model and the perturbed mapping model can be obtained by mappingmodel.py above mentioned

Finally, run the following command.

python run_protonet_classifier.py

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
data		data
figure		figure
key_perturb		key_perturb
perturbed data		perturbed data
results		results
LICENSE		LICENSE
README.md		README.md
generated data.zip		generated data.zip
mappingmodel.py		mappingmodel.py
protonetIB.py		protonetIB.py
run_protonet_classifier.py		run_protonet_classifier.py
sampling_close_sourced_LLM.py		sampling_close_sourced_LLM.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automated Detection of Pre-training Text in Black-box LLMs

🔍VeilProbe

💻 Reproduce our work

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Automated Detection of Pre-training Text in Black-box LLMs

🔍VeilProbe

💻 Reproduce our work

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages