This dataset accompanies the paper "Learning to Rewrite: Generalized LLM-Generated Text Detection" (arXiv:2408.04237).
The repository is organized into folders, each corresponding to one of 21 domains. Within each folder, you will find the following files:
human.json: Contains human-generated text.
{MODEL}.json: Contains machine-generated text from the specified model.
For information on the data sources, please refer to Appendix A.1 of the paper.
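The folder layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes each file is valid JSON and makes no assumption about the structure inside it (list, dict, or otherwise), and the function names `load_domain` and `load_dataset` are hypothetical helpers introduced here for illustration.

```python
import json
from pathlib import Path


def load_domain(domain_dir):
    """Load every .json file in one domain folder, keyed by file name without extension.

    The key "human" holds the human-written text; every other key is
    assumed to be a model name taken from {MODEL}.json.
    """
    domain_dir = Path(domain_dir)
    data = {}
    for path in sorted(domain_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data[path.stem] = json.load(f)
    return data


def load_dataset(root):
    """Map each domain folder name to the files loaded from it.

    Only subfolders that contain a human.json file are treated as domains,
    so unrelated directories under the root are skipped.
    """
    root = Path(root)
    return {
        d.name: load_domain(d)
        for d in sorted(root.iterdir())
        if d.is_dir() and (d / "human.json").exists()
    }
```

Calling `load_dataset` on the local path of the downloaded repository returns a dictionary such that `result[domain]["human"]` is the parsed content of that domain's human.json, with the remaining keys corresponding to the model files in the same folder.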
If you find this dataset useful, please cite:
@article{hao2024learning,
title={Learning to Rewrite: Generalized LLM-Generated Text Detection},
author={Hao, Wei and Li, Ran and Zhao, Weiliang and Yang, Junfeng and Mao, Chengzhi},
journal={arXiv preprint arXiv:2408.04237},
year={2024}
}