llvm-autofix is an agentic harness for LLVM. Its current focus is the automatic repair of LLVM bugs and the systematic evaluation of agents' ability to resolve LLVM issues. Longer term, it aims to become an off‑the‑shelf agentic harness for all LLVM tasks that benefit from an agent. It includes:
- llvm tools: A collection of agent-friendly LLVM tool wrappers.
- llvm skills: A collection of LLVM domain knowledge built into agent skills.
- llvm-bench (live): A continuously updated benchmark of recent LLVM issues, currently focused on middle-end bugs.
- llvm-autofix-mini: A minimal proof-of-concept agent targeted at fixing LLVM middle-end issues.
- 2026-03-20: We released llvm-autofix, an agentic harness for real-world compilers.
Agents are increasingly being applied to real-world software engineering tasks, but their performance on complex, real-world codebases remains underexplored. With llvm-autofix, our evaluation of frontier models, including GPT‑5, Gemini 2.5 Pro, DeepSeek V3.2, and Qwen 3 Max, highlights several findings:
- Although these models perform well on the general SWE-bench Verified, they struggle on llvm-bench live (~60% vs. ~38%).
- llvm-autofix-mini outperforms mini-SWE-agent, achieving ~52%.
- As benchmark splits become more challenging (easy → medium → difficult), the performance of frontier models degrades significantly.
- After code review by LLVM developers, the true end-to-end bug-fixing capability of frontier models remains below 22%, even when using llvm-autofix-mini.
The simplest way is to use Docker, after editing environments and filling in the API keys:
docker build -t llvm-autofix-base:latest -f .devcontainer/Dockerfile .
docker build -t llvm-autofix:latest -f Dockerfile --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) .
docker run --rm -it -v $(pwd):/llvm-autofix --cap-add=SYS_PTRACE --security-opt seccomp=unconfined llvm-autofix:latest
# tmux # Optional: spawn a tmux session if you want to see GDB's output.
source ./scripts/upenv.sh

Or follow BUILD.md to install the required dependencies and bring up the environment locally.
Launch our agent on a specific issue with:
python -m autofix.mini --issue <issue_id> --model <model_name>

Or launch mini-SWE-agent on a specific issue:
python -m autofix.mswe --issue <issue_id> --model <model_name>

Or launch Claude Code (please install it manually first) on a specific issue:
python -m autofix.xcli --xcli claudecode --issue <issue_id> --model <model_name> --stats <stats_path>

NOTICE: The above commands currently work only with issues from our benchmark. Running them on other issues might also work, but you need to create a ./bench/full/<issue_id>.json file following the format of our collected issues. We are working on a fully automated agent pipeline for such cases, and we are also working on harnessing other coding agents.
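The per-issue commands above can also be scripted for batch runs. Below is a minimal sketch of such a driver; the issue IDs and model name are placeholders (they are not real benchmark entries), and it simply assembles the same `python -m autofix.mini` invocation shown above:

```python
import subprocess

# Hypothetical batch driver: build the `python -m autofix.mini` command line
# for each issue in a list. Issue IDs and the model name are placeholders.
def build_cmd(issue_id: str, model: str) -> list[str]:
    return ["python", "-m", "autofix.mini",
            "--issue", issue_id, "--model", model]

for issue in ["111111", "222222"]:  # placeholder issue IDs
    cmd = build_cmd(issue, "gpt-5")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the agent
```

Note that each listed issue still needs a matching ./bench/full/<issue_id>.json file, as described above.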
Benchmark the agent on our benchmarks with:
./bench/benchmark.sh <agent_name> -B <bench_name> -o <output_dir>

Please read the guidelines in CONTRIBUTORS.md.
If you find this work helpful, please consider citing it:
@misc{llvm-autofix,
title={Agentic Harness for Real-World Compilers},
author={Yingwei Zheng and Cong Li and Shaohua Li and Yuqun Zhang and Zhendong Su},
year={2026},
eprint={2603.20075},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2603.20075},
}

Artifacts for the arXiv paper are available on the experiment branch.