This repository hosts the code for our paper, Frankentext: Stitching random text fragments into long-form narratives.
🧟‍♂️ Frankentexts are a new type of long-form narrative produced by LLMs under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from human writing. To produce a Frankentext, we instruct the model to draft a story by selecting and combining human-written passages, and then to iteratively revise the draft while maintaining a user-specified copy ratio.
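The copy-ratio constraint can be made concrete with a small check. This is a minimal sketch, not the repository's pipeline code. It assumes, hypothetically, that a draft token counts as "copied" when it lies inside a run of at least `min_ngram` consecutive tokens that also appears verbatim in some human-written fragment. The names `tokenize`, `copy_ratio`, and `min_ngram` are illustrative, and the measurement used in the paper may differ.

```python
import re
from collections.abc import Iterable


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercased words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def copy_ratio(draft: str, fragments: Iterable[str], min_ngram: int = 5) -> float:
    """Return the fraction of draft tokens that sit inside a run of at least
    `min_ngram` tokens copied verbatim from some human-written fragment.
    (Assumed definition for illustration only.)"""
    draft_tokens = tokenize(draft)
    if not draft_tokens:
        return 0.0

    # Index every window of `min_ngram` consecutive tokens in the human fragments.
    human_ngrams: set[tuple[str, ...]] = set()
    for fragment in fragments:
        tokens = tokenize(fragment)
        for i in range(len(tokens) - min_ngram + 1):
            human_ngrams.add(tuple(tokens[i : i + min_ngram]))

    # Mark every draft token covered by a window that also occurs in a fragment.
    # Overlapping windows let longer copied spans be covered end to end.
    covered = [False] * len(draft_tokens)
    for i in range(len(draft_tokens) - min_ngram + 1):
        if tuple(draft_tokens[i : i + min_ngram]) in human_ngrams:
            covered[i : i + min_ngram] = [True] * min_ngram
    return sum(covered) / len(draft_tokens)


if __name__ == "__main__":
    fragments = [
        "The rain had not stopped for three days and the river was rising fast."
    ]
    draft = "The rain had not stopped for three days. She waited."
    ratio = copy_ratio(draft, fragments)
    print(round(ratio, 3))  # prints 0.667 (8 of 12 tokens copied)
    # A hypothetical 90% target, as in the example above, is not yet met.
    print(ratio >= 0.9)  # prints False
```

Requiring a minimum match length keeps common short phrases such as "of the" from counting as copied; the threshold of 5 tokens is an arbitrary choice for illustration. In an iterative revise loop, a draft scoring below the user-specified ratio would be sent back for another revision.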
- [2025-05-23]: Dataset and prompts for Frankentext are now available! Pipeline code is coming soon...
(Coming soon...)

    .
    ├── README.md
    ├── assets
    ├── data
    │   ├── outputs
    │   └── inputs
    ├── prompts
    └── scripts
        ├── pipeline
        └── eval
- `data`:
  - `inputs` contains the input data for each experiment.
  - `outputs` contains the outputs of each experiment reported in the paper, with one subfolder per experiment.
- `scripts` contains code to generate and automatically evaluate Frankentexts:
  - `eval` contains code to compute evaluation metrics.
  - `pipeline` contains code to construct Frankentexts with the models tested in the paper.
- `prompts` contains all prompts used in the paper:
  - `llm_judge` contains prompts for obtaining coherence and relevance judgments from an LLM judge.
  - `pipeline` contains prompts to generate and edit Frankentexts.
@misc{pham2025frankentextstitchingrandomtext,
title={Frankentext: Stitching random text fragments into long-form narratives},
author={Chau Minh Pham and Jenna Russell and Dzung Pham and Mohit Iyyer},
year={2025},
eprint={2505.18128},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.18128},
}
