Unnatural Languages Are Not Bugs but Features for LLMs

🔥 Updates

Our paper has been accepted to ICML 2025! Please check out the paper here.

Installation

pip install -e .

Searching Unnatural Language via Reconstructive Objective

The code for the search algorithm is adapted from https://github.com/llm-attacks/llm-attacks

# search for the unnatural language version of the first row in `vermouthdky/Unnatural_LIMA`
cd llm-attacks/experiments/launch_scripts
bash run_unnatural_lima.sh 0 1 # specify data offset [0, 1000)
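
To cover the whole offset range rather than a single row, the launch commands can be generated in batches. The following is a minimal sketch, not part of the repository: it assumes the two positional arguments of `run_unnatural_lima.sh` are the data offset and the number of rows to process (consistent with `0 1` selecting the first row, but not confirmed by the script itself), and the helper name `shard_commands` is hypothetical.

```python
# Sketch: build one launch command per contiguous shard of the offset range.
# ASSUMPTION: the script is called as `bash run_unnatural_lima.sh <offset> <n_rows>`.
# `shard_commands` is a hypothetical helper, not part of this repository.


def shard_commands(total=1000, rows_per_job=100, script="run_unnatural_lima.sh"):
    """Return a list of shell commands that together cover offsets [0, total)."""
    if rows_per_job <= 0:
        raise ValueError("rows_per_job must be positive")
    commands = []
    for offset in range(0, total, rows_per_job):
        # The last shard may be shorter than rows_per_job.
        n_rows = min(rows_per_job, total - offset)
        commands.append(f"bash {script} {offset} {n_rows}")
    return commands


if __name__ == "__main__":
    # Print the commands; run them from llm-attacks/experiments/launch_scripts.
    for command in shard_commands():
        print(command)
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; the output can be pasted into a job scheduler or a shell loop once the argument meaning has been checked against the script.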

unnatural_language is licensed under the terms of the MIT license. See LICENSE for more details.

Datasets and Models

Unnatural Datasets

Dataset	Description	🤗 Download
Unnatural SynContextQA	Synthetic Datasets For Unnatural Language Question Answering	Link
Unnatural SimGSM8K	A subset of GSM8K For Unnatural Language Question Answering	Link
Unnatural LIMA	An Unnatural Version of LIMA for Instruction Tuning	Link

Unnatural Models

Model	SFT Dataset	🤗 Download
Gemma-2-9B	Unnatural LIMA	Link
Gemma-2-9B	Natural LIMA	Link
Llama-3-8B	Unnatural LIMA	Link
Llama-3-8B	Natural LIMA	Link
Llama-3-70B	Unnatural LIMA	Link
Llama-3-70B	Natural LIMA	Link

Results

Unnatural Language Question Answering

An example from Unnatural SimGSM8K QA is shown on the left. The evaluation results are shown in the figure on the right.

Unnatural Instruction Tuning

We build an unnatural version of LIMA and fine-tune models of various sizes using standard supervised fine-tuning (SFT). The evaluation results on AlpacaEval 2.0 LC and MixEval are shown below. The tuned model weights are listed under Unnatural Models.

Citation

If you find our repo useful, please consider citing:

@misc{duan2025unnaturallanguagesbugsfeatures,
      title={Unnatural Languages Are Not Bugs but Features for LLMs}, 
      author={Keyu Duan and Yiran Zhao and Zhili Feng and Jinjie Ni and Tianyu Pang and Qian Liu and Tianle Cai and Longxu Dou and Kenji Kawaguchi and Anirudh Goyal and J. Zico Kolter and Michael Qizhe Shieh},
      year={2025},
      eprint={2503.01926},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.01926}, 
}

