Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models

The first systematic study of data contamination detection in RL post-training scenarios.

📚 Overview

📖 Introduction
✨ Getting Started
🔧 Usage
🎯 Main Results
🎈 Citation

📖 Introduction

Data contamination poses a significant threat to the reliable evaluation of Large Language Models (LLMs). It arises when benchmark samples inadvertently appear in training sets, compromising the validity of reported performance. While detection methods have been developed for the pre-training and Supervised Fine-Tuning (SFT) stages, a critical research gap remains for the increasingly important phase of Reinforcement Learning (RL) post-training.

As RL post-training becomes pivotal for advancing LLM reasoning, the absence of specialized contamination detection methods for this paradigm is a critical vulnerability. To address this, we conduct the first systematic study of data contamination detection in RL post-training scenarios and propose Self-Critique.

Key Highlights:

First RL Contamination Study: Systematic investigation of data contamination detection in RL post-training scenarios
Self-Critique Method: Novel detection approach that probes for policy collapse through entropy analysis
RL-MIA Benchmark: Comprehensive benchmark for evaluating contamination detection methods in RL settings
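
The highlights above say only that Self-Critique probes for policy collapse through entropy analysis; they do not spell out the algorithm. As a rough illustration of the quantity involved, and not the authors' implementation, the sketch below computes the mean per-token Shannon entropy of a response from its next-token distributions. The function names and the toy distributions are hypothetical. The intuition it assumes is that a policy collapsed onto a narrow set of outputs yields low entropy.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_sequence_entropy(step_probs: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over all generation steps of one response."""
    if not step_probs:
        raise ValueError("need at least one generation step")
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)


# Toy distributions over a 4-token vocabulary (illustrative numbers only).
peaked = [[0.97, 0.01, 0.01, 0.01]] * 3  # near-deterministic policy
spread = [[0.25, 0.25, 0.25, 0.25]] * 3  # maximally uncertain policy

# The peaked policy has lower average entropy than the spread one.
print(mean_sequence_entropy(peaked) < mean_sequence_entropy(spread))
```

In practice the per-step distributions would come from the model's softmaxed logits; how Self-Critique turns such entropies into a contamination score is described in the paper and in detectors/.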

✨ Getting Started

Installation

You can install the dependencies by running the following commands:

# Create conda environment
conda create -n self_critique python=3.10
conda activate self_critique

cd RL_Contaminate/verl
pip install -r requirements.txt
pip install -e .

cd verl
pip install -e .

Repo Structure

This repository includes:

detectors/: Implementation of various contamination detection methods including our proposed Self-Critique
benchmarks/: RL-MIA benchmark datasets for different tasks (GSM8k, AIME, SAT, etc.)
RL_Contaminate/: RL training framework and scripts for generating contaminated models
eval_scripts/: Evaluation scripts for testing detection methods across different models and scenarios

🔧 Usage

First, we need to train models to simulate RL-stage data contamination, for example:

cd RL_Contaminate
bash ./train_scripts/aime_aime25/train_grpo_RLMIA_qwen_math.sh

Then we can use the corresponding test scripts to evaluate the effectiveness of different detection methods:

cd ..
bash ./eval_scripts/Qwen2.5-Math-7B/run_full_workflow_qwen_math_aime.sh
bash ./eval_scripts/Qwen2.5-Math-7B/run_full_workflow_qwen_math_aime25.sh

The same process applies to other models and datasets.
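
To make "effectiveness of a detection method" concrete, the sketch below computes AUROC from per-sample detector scores and contamination labels. AUROC is a common metric for membership-inference-style detectors, but this README does not state which metric the evaluation scripts report, so treat the choice as an assumption. The function name and the example scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random contaminated sample (label 1) gets a higher
    score than a random clean sample (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores: higher means "more likely contaminated".
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
print(auroc(scores, labels))
```

A value of 0.5 corresponds to random guessing. The pairwise loop is quadratic in the number of samples, which is fine for a small benchmark but would be replaced by a rank-based computation for large ones.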

For the dual-stage contamination experiments (contamination in both pre-training and RL), follow the same workflow. First train the model:

cd RL_Contaminate
bash ./train_scripts/gsm8k/train_ppo_qwen_0.5b_gsm8k.sh

Then run the corresponding evaluation script:

cd ..
bash ./eval_scripts/Lower_Pretraining/run_Qwen2.5-0.5B-Instruct.sh
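
The train-then-evaluate pattern used in this section (run the training script from inside RL_Contaminate, then run the evaluation scripts from the repository root) can be wrapped in a small driver. The following is a minimal sketch, not part of the repository: the helper names build_steps and run_steps are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

Step = Tuple[Path, List[str]]  # (working directory, command line)


def build_steps(repo_root: Path, train_script: str, eval_scripts: List[str]) -> List[Step]:
    """Training runs inside RL_Contaminate; evaluation runs from the repo root."""
    steps: List[Step] = [(repo_root / "RL_Contaminate", ["bash", train_script])]
    steps += [(repo_root, ["bash", script]) for script in eval_scripts]
    return steps


def run_steps(steps: List[Step], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the workflow if a script exits with an error.
            subprocess.run(argv, cwd=cwd, check=True)


steps = build_steps(
    Path("."),
    "./train_scripts/gsm8k/train_ppo_qwen_0.5b_gsm8k.sh",
    ["./eval_scripts/Lower_Pretraining/run_Qwen2.5-0.5B-Instruct.sh"],
)
run_steps(steps, dry_run=True)
```

Passing dry_run=False executes the scripts in order. Training is long-running, so checking the printed commands first is the safer default.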

🎯 Main Results

Our Self-Critique method significantly outperforms existing detection methods; full results are reported in the paper.

🎈 Citation

If you find our data or code useful, please cite our paper:

@misc{rl-data-contamination,
      title={Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models}, 
      author={Yongding Tao and Tian Wang and Yihong Dong and Huanyu Liu and Kechi Zhang and Xiaolong Hu and Ge Li},
      year={2025},
      eprint={2510.09259},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.09259}, 
}
