The first systematic study of data contamination detection in RL post-training scenarios.
- 📖 Introduction
- ✨ Getting Started
- 🔧 Usage
- 🎯 Main Results
- 🎈 Citation
Data contamination poses a significant threat to the reliable evaluation of Large Language Models (LLMs). It arises when benchmark samples inadvertently appear in training sets, compromising the validity of reported performance. While detection methods have been developed for the pre-training and Supervised Fine-Tuning (SFT) stages, a critical research gap exists for the increasingly important phase of Reinforcement Learning (RL) post-training.
As RL post-training becomes pivotal for advancing LLM reasoning, the absence of specialized contamination detection methods for this paradigm is a critical vulnerability. To address this, we conduct the first systematic study of data contamination detection in RL post-training scenarios and propose Self-Critique.
- First RL Contamination Study: Systematic investigation of data contamination detection in RL post-training scenarios
- Self-Critique Method: Novel detection approach that probes for policy collapse through entropy analysis
- RL-MIA Benchmark: Comprehensive benchmark for evaluating contamination detection methods in RL settings
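The entropy analysis mentioned above can be illustrated with a small, self-contained sketch. It is not the Self-Critique implementation from `detectors/`; it only computes the underlying quantity, the Shannon entropy of next-token distributions, on hand-written toy probabilities. The function names and the toy distributions are hypothetical, and the reading that a collapsed policy shows up as unusually low entropy is an assumption made for illustration.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability tokens contribute nothing and would break log().
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_sequence_entropy(step_probs: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over all decoding steps of one response."""
    if not step_probs:
        raise ValueError("need at least one decoding step")
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)


# Toy data (hypothetical): three decoding steps over a 4-token vocabulary.
peaked = [[0.97, 0.01, 0.01, 0.01]] * 3   # near-deterministic policy
spread = [[0.25, 0.25, 0.25, 0.25]] * 3   # maximally uncertain policy

# Assumption for illustration: lower mean entropy hints at a collapsed policy.
print(mean_sequence_entropy(peaked) < mean_sequence_entropy(spread))
```

In practice the per-step distributions would come from the model's softmax output during generation; how Self-Critique turns such measurements into a contamination score is described in the paper, not here.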
You can install the dependencies by running the following commands:
# Create conda environment
conda create -n self_critique python=3.10
conda activate self_critique
cd RL_Contaminate/verl
pip install -r requirements.txt
pip install -e .
cd verl
pip install -e .
This repository includes:
- detectors/: Implementation of various contamination detection methods, including our proposed Self-Critique
- benchmarks/: RL-MIA benchmark datasets for different tasks (GSM8k, AIME, SAT, etc.)
- RL_Contaminate/: RL training framework and scripts for generating contaminated models
- eval_scripts/: Evaluation scripts for testing detection methods across different models and scenarios
First, we need to train models to simulate RL-stage data contamination, for example:
cd RL_Contaminate
bash ./train_scripts/aime_aime25/train_grpo_RLMIA_qwen_math.sh
Then we can use the corresponding test scripts to evaluate the effectiveness of different detection methods:
cd ..
bash ./eval_scripts/Qwen2.5-Math-7B/run_full_workflow_qwen_math_aime.sh
bash ./eval_scripts/Qwen2.5-Math-7B/run_full_workflow_qwen_math_aime25.sh
The same process applies to other models and datasets.
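To make "effectiveness of a detection method" concrete, here is a minimal sketch of one common way to score a contamination detector: the area under the ROC curve (AUROC) over per-sample detector scores. This README does not state which metric the evaluation scripts report, so AUROC is an assumption here, and the function name, scores, and labels below are hypothetical toy values rather than output from the repository.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison.

    scores: detector output per sample, higher meaning "more likely contaminated".
    labels: 1 for samples seen during RL training, 0 for unseen samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count how often a contaminated sample outranks a clean one (ties count half).
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): a detector that separates the two groups well.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, which makes scores comparable across detectors with different output scales.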
For the dual-stage contamination experiments (pre-training and RL), follow the same workflow:
cd RL_Contaminate
bash ./train_scripts/gsm8k/train_ppo_qwen_0.5b_gsm8k.sh
Then run the corresponding evaluation script:
cd ..
bash ./eval_scripts/Lower_Pretraining/run_Qwen2.5-0.5B-Instruct.sh
Our Self-Critique method significantly outperforms existing detection methods; see the paper cited below for the full results.
If you find our data or code useful, please cite our paper:
@misc{rl-data-contamination,
title={Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models},
author={Yongding Tao and Tian Wang and Yihong Dong and Huanyu Liu and Kechi Zhang and Xiaolong Hu and Ge Li},
year={2025},
eprint={2510.09259},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.09259},
}

