- Our paper is accepted by ICML 2025! Please check out the paper [here](https://arxiv.org/abs/2503.01926).
Install with `pip install -e .`

The code for the searching algorithm is modified from https://github.com/llm-attacks/llm-attacks.
# search the unnatural language version of the first row in `vermouthdky/Unnatural_LIMA`
cd llm-attacks/experiments/launch_scripts
bash run_unnatural_lima.sh 0 1 # specify data offset [0, 1000)

unnatural_language is licensed under the terms of the MIT license. See LICENSE for more details.
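To search many rows, one way is to sweep the data offset; below is a minimal sketch (not part of the repo) that loops the launch script above over several offsets, keeping the second argument at 1 as in the example command:

```python
# Minimal sketch (not part of the repo): sweep the launch script over several
# data offsets. Assumes it is run from llm-attacks/experiments/launch_scripts
# and keeps the second argument at 1, as in the example command above.
import subprocess

for offset in range(0, 8):  # any offsets in [0, 1000)
    subprocess.run(
        ["bash", "run_unnatural_lima.sh", str(offset), "1"],
        check=True,
    )
```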
## Unnatural Datasets

| Dataset | Description | 🤗 Download |
|---|---|---|
| Unnatural SynContextQA | Synthetic Datasets For Unnatural Language Question Answering | Link |
| Unnatural SimGSM8K | A subset of GSM8K For Unnatural Language Question Answering | Link |
| Unnatural LIMA | An Unnatural Version of LIMA for Instruction Tuning | Link |
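As a quick sanity check, the datasets can be loaded with the Hugging Face `datasets` library. A minimal sketch using the `vermouthdky/Unnatural_LIMA` repo id referenced in the search example above; the other two datasets should load the same way from their linked repo ids:

```python
# Minimal sketch: load Unnatural LIMA from the Hugging Face Hub and inspect
# the row that data offset 0 in the search script refers to. The repo id is
# the one referenced above; split and column names are whatever the Hub reports.
from datasets import load_dataset

ds = load_dataset("vermouthdky/Unnatural_LIMA")
print(ds)                         # available splits and columns
first_split = next(iter(ds.values()))
print(first_split[0])             # first row, i.e. data offset 0
```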
## Unnatural Models

| Model | SFT Dataset | 🤗 Download |
|---|---|---|
| Gemma-2-9B | Unnatural LIMA | Link |
| Gemma-2-9B | Natural LIMA | Link |
| Llama-3-8B | Unnatural LIMA | Link |
| Llama-3-8B | Natural LIMA | Link |
| Llama-3-70B | Unnatural LIMA | Link |
| Llama-3-70B | Natural LIMA | Link |
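The SFT checkpoints are standard causal LMs and can be loaded with `transformers`. A minimal sketch with a placeholder repo id (`REPO_ID_FROM_TABLE` is hypothetical; substitute the id behind the corresponding download link above):

```python
# Minimal sketch: load one of the SFT checkpoints above with transformers.
# "REPO_ID_FROM_TABLE" is a placeholder, not a real repo id; replace it with
# the id behind the corresponding download link.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "REPO_ID_FROM_TABLE"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
```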
## Unnatural Language Question Answering
An example from Unnatural SimGSM8K is shown below (left); the evaluation results are shown in the figure on the right.
## Unnatural Instruction Tuning
We build an unnatural version of LIMA and tune models of various sizes with standard SFT. The evaluation results on AlpacaEval 2.0 LC and MixEval are shown below. The tuned model weights are listed under Unnatural Models above.
If you find our repo useful, please consider citing:
@misc{duan2025unnaturallanguagesbugsfeatures,
title={Unnatural Languages Are Not Bugs but Features for LLMs},
author={Keyu Duan and Yiran Zhao and Zhili Feng and Jinjie Ni and Tianyu Pang and Qian Liu and Tianle Cai and Longxu Dou and Kenji Kawaguchi and Anirudh Goyal and J. Zico Kolter and Michael Qizhe Shieh},
year={2025},
eprint={2503.01926},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.01926},
}



