LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models
The poster of the paper is available in the assets directory.
uv venv LNE-Blocking --python 3.11.13
source LNE-Blocking/bin/activate
uv pip install -r requirements.txt
Refer to https://github.com/YihongDong/CDD-TED4LLMs?tab=readme-ov-file#contaminated-models to download the simulated contaminated LoRA weights. The file structure will be as follows:
pretrain_outputs
├── CodeGen6B
├── CodeLlama7B
├── CodeLlama7B_50k
├── CodeLlama7B_lr1e_3
├── CodeLlama7B_lr4e_5
├── gpt-3.5
├── GroundTruth_Probs_CodeGen6B
├── GroundTruth_Probs_CodeLlama7B
├── GroundTruth_Probs_Variants_CodeGen6B
├── GroundTruth_Probs_Variants_CodeLlama7B
├── Llama2
├── Outputs_CodeGen6B
├── Outputs_CodeLlama7B
├── Outputs_CodeLlama7B_lr1e_3
├── Outputs_CodeLlama7B_lr4e_5
├── Outputs_Llama2
├── Outputs_Variants_CodeGen6B
├── Outputs_Variants_CodeLlama7B
├── Variants_CodeGen6B
└── Variants_CodeLlama7B
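Before moving on, it can help to confirm that the download produced the directories listed above. The following is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper name, and the directory names passed to it are simply copied from the tree above.

```shell
# Hypothetical helper (not part of this repository): report which of the
# expected subdirectories are missing under a given root directory.
# Usage: check_layout ROOT NAME [NAME...]
check_layout() {
    local root="$1"
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ ! -d "$root/$name" ]; then
            echo "missing: $root/$name"
            missing=1
        fi
    done
    # Return 0 only when every expected directory exists.
    return "$missing"
}
```

For example, `check_layout pretrain_outputs CodeGen6B CodeLlama7B Llama2 && echo "layout OK"` prints one `missing:` line per absent directory and only prints `layout OK` when all of them exist.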
Then run
mkdir saves
ln -s pretrain_outputs/CodeLlama7B saves/
bash batch_eval.sh
python merge_script/makeyaml_codellama1k.py
python merge_script/batch_merge.py
# NOTE: configure the experiment (exp) directory in the script before running
bash batch_infer.sh
# NOTE: configure the experiment (exp) directory in the script before running
bash batch_eval.sh
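One thing to watch in the `ln -s` step: a relative symlink target is resolved against the directory that contains the link, not against the directory the command was run from, so a link created inside `saves/` with a relative target may end up dangling. The sketch below works around this by resolving the source to an absolute path first. `link_model` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper (not part of this repository): link a model directory
# into a destination directory using an absolute symlink target.
# Usage: link_model SRC_DIR DEST_DIR
link_model() {
    local src="$1"
    local dest_dir="$2"
    local abs_src
    # Resolve the source to an absolute path; fail if it does not exist.
    abs_src="$(cd "$src" && pwd)" || return 1
    mkdir -p "$dest_dir"
    # -s: symbolic link, -f: replace an existing link, -n: do not follow it.
    ln -sfn "$abs_src" "$dest_dir/$(basename "$abs_src")"
}
```

Running `link_model pretrain_outputs/CodeLlama7B saves` from the repository root would then create `saves/CodeLlama7B` pointing at the absolute path of the downloaded weights.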
@misc{hou2025lneblocking,
title={LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models},
author={Ruijie Hou and Yueyang Jiao and Hanxu Hu and Yingming Li and Wai Lam and Huajian Zhang and Hongyuan Lu},
year={2025},
eprint={2509.15218},
archivePrefix={arXiv},
primaryClass={cs.CL}
}