Pinned
LLMs reveal secrets when they’re asked to write stories.
We told LLMs not to reveal the secret words we gave them, then asked them to write stories. The secret x word never appears literally. But another model can identify it from the story up to 79% of the time.









