π€ HF Dataset β’ π€ SWE-Lego-Qwen3-8B/32B β’ π§βπ» Code β’ π Paper
We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-of-the-art performance in software engineering (SWE) issue resolving. SWE-Lego comprises three core building blocks:
- the SWE-Lego dataset, a collection of 32k highquality task instances and 18k validated trajectories, combining real and synthetic data to complement each other in both quality and quantity;
- a refined SFT procedure with error masking and a difficulty-based curriculum, which demonstrably improves action quality and overall performance;
- a well-trained verifier for improving test-time scaling (TTS).
Our fine-tuned policy models are trained exclusively with SFT from Qwen3-8B and Qwen3-32B. Their effectiveness is demonstrated on SWE-Bench-Verified:
- SWE-Lego-Qwen3-8B: 42.2% Pass@1, 49.6% TTS@16
- SWE-Lego-Qwen3-32B: 52.6% Pass@1, 58.8% TTS@16
For verifier-based TTS, we also release SWE-Lego-Verifier-8B and SWE-Lego-Verifier-30B-A3B, together with the verifier dataset SWE-Lego/SWE_Lego_real_data_Verifier.
Weβve open-sourced everythingβour datasets, models, code, and training scriptsβfor everyone to progress on scaling and improving software engineering agents.
git clone https://github.com/SWE-Lego/SWE-Lego.gitconda create -n vllm python=3.12 -y
conda activate vllm
pip install vllmYou can refer to the Development Guide from Openhands.
cd SWE-Lego/OpenHands-0.53.0
conda create -n openhands python=3.12 -y
conda activate openhands
conda install -c conda-forge nodejs=24.4.1
conda install -c conda-forge poetry=2.1.4
pip install python-dateutil==2.9.0.post0
poetry run pip install datasets
make buildcd SWE-Lego/SWE-bench-4.0.4
conda create -n swebench python=3.12 -y
conda activate swebench
pip install -e .cd SWE-Lego/LLaMA-Factory-0.9.4.dev0
conda create -n lf python=3.12 -y
conda activate lf
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics,deepspeed,liger-kernel]" --no-build-isolation
# install flash-attn
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
pip install flash_attn-2.8.3+cu12torch2.8cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
pip install wandbWe take the SWE-Lego-Qwen3-32B for an example.
bash scripts/swe_lego_qwen3_32b/serve_vllm.shbash scripts/swe_lego_qwen3_32b/infer.shbash scripts/swe_lego_qwen3_32b/eval.shSave the downloaded trajectories to LLaMA-Factory-0.9.4.dev0/data
import json
from datasets import load_dataset
datasets = [
{
"name": "SWE-Lego/SWE-Lego-Real-Data",
"filename": "swe_lego_real_data_resolved_trajectories.json"
},
{
"name": "SWE-Lego/SWE-Lego-Synthetic-Data",
"filename": "swe_lego_synthetic_data_resolved_trajectories.json"
}
]
for config in datasets:
ds = load_dataset(config["name"], split="resolved")
processed_ds = ds.select_columns(["instance_id", "messages"])
data_list = processed_ds.to_list()
with open(config["filename"], "w", encoding="utf-8") as f:
json.dump(data_list, f, ensure_ascii=False, indent=4)
print(f"Saved {len(data_list)} records to {config['filename']}")bash scripts/swe_lego_qwen3_8b/sft.sh
bash scripts/swe_lego_qwen3_32b/sft.shIn our TTS setting, we find generative verifier selection works better than regressive alternatives, and our verifier-enabled setup reaches 49.6% TTS@16 (SWE-Lego-Qwen3-8B) and 58.8% TTS@16 (SWE-Lego-Qwen3-32B).
Verifier training data is released at SWE-Lego/SWE_Lego_real_data_Verifier.
Each verifier sample is built from three parts: trajectory + patch + judgement.
At inference time, only trajectory + patch are used as input; judgement is predicted by the verifier.
Use this script to convert raw rollout trajectories into verifier inference format:
python LLaMA-Factory-0.9.4.dev0/tts/convert_trajectories_to_verifier.py \
--input /path/to/raw_trajectories.jsonl \
--output /path/to/verifier_input.jsonlThe converter reads instance-level trajectories (e.g., run_1, run_2, ... with funccalloff_messages, patch, and resolved) and writes verifier-ready JSONL with system + user messages.
Verifier SFT configs:
LLaMA-Factory-0.9.4.dev0/examples/train_full/swe_lego_verifier_qwen3_8b.yamlLLaMA-Factory-0.9.4.dev0/examples/train_full/swe_lego_verifier_qwen3_30b_a3b.yaml
Training launch scripts:
scripts/swe_lego_verifier_qwen3_8b/sft.shscripts/swe_lego_verifier_qwen3_30b_a3b/sft.sh
Run from repo root:
bash scripts/swe_lego_verifier_qwen3_8b/sft.sh
bash scripts/swe_lego_verifier_qwen3_30b_a3b/sft.shInference launch scripts:
scripts/swe_lego_verifier_qwen3_8b/infer.shscripts/swe_lego_verifier_qwen3_30b_a3b/infer.sh
Example:
bash scripts/swe_lego_verifier_qwen3_8b/infer.sh /path/to/verifier_input.jsonl
bash scripts/swe_lego_verifier_qwen3_30b_a3b/infer.sh /path/to/verifier_input.jsonlThe inference output contains per-instance run scores (including predicted_score), which can be used for ranking rollouts in parallel TTS.
This project acknowledges the valuable contributions of the following open-source repositories:
- SWE-bench: https://github.com/princeton-nlp/SWE-bench - A benchmark for evaluating AI agents' software engineering capabilities, which informed the evaluation design of this project.
- OpenHands: https://github.com/All-Hands-AI/OpenHands - Provides core logic for AI agent interaction and runtime management, referenced in this project's execution modules.
- LLaMA Factory: https://github.com/hiyouga/LLaMA-Factory - Offers LLM fine-tuning and prompt processing utilities, used to enhance this project's language model interaction capabilities.
Please cite our paper if you find the repo helpful in your work:
@misc{swelego,
title={SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving},
author={Chaofan Tao and Jierun Chen and Yuxin Jiang and Kaiqi Kou and Shaowei Wang and Ruoyu Wang and Xiaohui Li and Sidi Yang and Yiming Du and Jianbo Dai and Zhiming Mao and Xinyu Wang and Lifeng Shang and Haoli Bai},
year={2026},
eprint={2601.01426},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2601.01426},
}