SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

🤗 HF Dataset • 🤗 SWE-Lego-Qwen3-8B/32B • 🧑‍💻 Code • 📖 Paper

We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-of-the-art performance in software engineering (SWE) issue resolving. SWE-Lego comprises three core building blocks:

the SWE-Lego dataset, a collection of 32k highquality task instances and 18k validated trajectories, combining real and synthetic data to complement each other in both quality and quantity;
a refined SFT procedure with error masking and a difficulty-based curriculum, which demonstrably improves action quality and overall performance;
a well-trained verifier for improving test-time scaling (TTS).

Our fine-tuned policy models are trained exclusively with SFT from Qwen3-8B and Qwen3-32B. Their effectiveness is demonstrated on SWE-Bench-Verified:

SWE-Lego-Qwen3-8B: 42.2% Pass@1, 49.6% TTS@16
SWE-Lego-Qwen3-32B: 52.6% Pass@1, 58.8% TTS@16

For verifier-based TTS, we also release SWE-Lego-Verifier-8B and SWE-Lego-Verifier-30B-A3B, together with the verifier dataset SWE-Lego/SWE_Lego_real_data_Verifier.

We’ve open-sourced everything—our datasets, models, code, and training scripts—for everyone to progress on scaling and improving software engineering agents.

Reproduction Guide 🎯

1. 📦 Installation

git clone https://github.com/SWE-Lego/SWE-Lego.git

1.1 Installing vllm environment

conda create -n vllm python=3.12 -y
conda activate vllm
pip install vllm

1.2 Installing openhands environment

You can refer to the Development Guide from Openhands.

cd SWE-Lego/OpenHands-0.53.0
conda create -n openhands python=3.12 -y
conda activate openhands
conda install -c conda-forge nodejs=24.4.1 
conda install -c conda-forge poetry=2.1.4
pip install python-dateutil==2.9.0.post0
poetry run pip install datasets
make build

1.3 Installing swebench environment

cd SWE-Lego/SWE-bench-4.0.4
conda create -n swebench python=3.12 -y
conda activate swebench
pip install -e .

1.4 Installing llamafactory environment

cd SWE-Lego/LLaMA-Factory-0.9.4.dev0
conda create -n lf python=3.12 -y
conda activate lf

pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics,deepspeed,liger-kernel]" --no-build-isolation

# install flash-attn
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
pip install flash_attn-2.8.3+cu12torch2.8cxx11abiFALSE-cp312-cp312-linux_x86_64.whl

pip install wandb

🤖 2. Inference and Evaluation of SWE-Lego-Qwen3-8B/32B

We take the SWE-Lego-Qwen3-32B for an example.

2.1 Serving the model via vllm

bash scripts/swe_lego_qwen3_32b/serve_vllm.sh

2.2 Running inference via openhands

bash scripts/swe_lego_qwen3_32b/infer.sh

2.3 Running evaluation via swebench

bash scripts/swe_lego_qwen3_32b/eval.sh

🔥 3. Training of SWE-Lego-Qwen3-8B/32B

3.1 Downloading trajectories for SFT from Hugging Face

Save the downloaded trajectories to LLaMA-Factory-0.9.4.dev0/data

import json
from datasets import load_dataset

datasets = [
    {
        "name": "SWE-Lego/SWE-Lego-Real-Data",
        "filename": "swe_lego_real_data_resolved_trajectories.json"
    },
    {
        "name": "SWE-Lego/SWE-Lego-Synthetic-Data", 
        "filename": "swe_lego_synthetic_data_resolved_trajectories.json"
    }
]

for config in datasets:
    ds = load_dataset(config["name"], split="resolved")
    processed_ds = ds.select_columns(["instance_id", "messages"])
    data_list = processed_ds.to_list()
    
    with open(config["filename"], "w", encoding="utf-8") as f:
        json.dump(data_list, f, ensure_ascii=False, indent=4)
    print(f"Saved {len(data_list)} records to {config['filename']}")

3.2 Running SFT via llamafactory

bash scripts/swe_lego_qwen3_8b/sft.sh
bash scripts/swe_lego_qwen3_32b/sft.sh

🧪 4. Verifier Training, Inference

In our TTS setting, we find generative verifier selection works better than regressive alternatives, and our verifier-enabled setup reaches 49.6% TTS@16 (SWE-Lego-Qwen3-8B) and 58.8% TTS@16 (SWE-Lego-Qwen3-32B).

4.1 Verifier data and conversion

Verifier training data is released at SWE-Lego/SWE_Lego_real_data_Verifier. Each verifier sample is built from three parts: trajectory + patch + judgement. At inference time, only trajectory + patch are used as input; judgement is predicted by the verifier.

Use this script to convert raw rollout trajectories into verifier inference format:

python LLaMA-Factory-0.9.4.dev0/tts/convert_trajectories_to_verifier.py \
  --input /path/to/raw_trajectories.jsonl \
  --output /path/to/verifier_input.jsonl

The converter reads instance-level trajectories (e.g., run_1, run_2, ... with funccalloff_messages, patch, and resolved) and writes verifier-ready JSONL with system + user messages.

4.2 Train verifier models

Verifier SFT configs:

LLaMA-Factory-0.9.4.dev0/examples/train_full/swe_lego_verifier_qwen3_8b.yaml
LLaMA-Factory-0.9.4.dev0/examples/train_full/swe_lego_verifier_qwen3_30b_a3b.yaml

Training launch scripts:

scripts/swe_lego_verifier_qwen3_8b/sft.sh
scripts/swe_lego_verifier_qwen3_30b_a3b/sft.sh

Run from repo root:

bash scripts/swe_lego_verifier_qwen3_8b/sft.sh
bash scripts/swe_lego_verifier_qwen3_30b_a3b/sft.sh

4.3 Run verifier inference

Inference launch scripts:

scripts/swe_lego_verifier_qwen3_8b/infer.sh
scripts/swe_lego_verifier_qwen3_30b_a3b/infer.sh

Example:

bash scripts/swe_lego_verifier_qwen3_8b/infer.sh /path/to/verifier_input.jsonl
bash scripts/swe_lego_verifier_qwen3_30b_a3b/infer.sh /path/to/verifier_input.jsonl

The inference output contains per-instance run scores (including predicted_score), which can be used for ranking rollouts in parallel TTS.

Acknowledgements

This project acknowledges the valuable contributions of the following open-source repositories:

SWE-bench: https://github.com/princeton-nlp/SWE-bench - A benchmark for evaluating AI agents' software engineering capabilities, which informed the evaluation design of this project.
OpenHands: https://github.com/All-Hands-AI/OpenHands - Provides core logic for AI agent interaction and runtime management, referenced in this project's execution modules.
LLaMA Factory: https://github.com/hiyouga/LLaMA-Factory - Offers LLM fine-tuning and prompt processing utilities, used to enhance this project's language model interaction capabilities.

Citation 📝

Please cite our paper if you find the repo helpful in your work:

@misc{swelego,
      title={SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving}, 
      author={Chaofan Tao and Jierun Chen and Yuxin Jiang and Kaiqi Kou and Shaowei Wang and Ruoyu Wang and Xiaohui Li and Sidi Yang and Yiming Du and Jianbo Dai and Zhiming Mao and Xinyu Wang and Lifeng Shang and Haoli Bai},
      year={2026},
      eprint={2601.01426},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2601.01426}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
LLaMA-Factory-0.9.4.dev0		LLaMA-Factory-0.9.4.dev0
OpenHands-0.53.0		OpenHands-0.53.0
SWE-bench-4.0.4		SWE-bench-4.0.4
assets		assets
scripts		scripts
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Reproduction Guide 🎯

1. 📦 Installation

1.1 Installing vllm environment

1.2 Installing openhands environment

1.3 Installing swebench environment

1.4 Installing llamafactory environment

🤖 2. Inference and Evaluation of SWE-Lego-Qwen3-8B/32B

2.1 Serving the model via vllm

2.2 Running inference via openhands

2.3 Running evaluation via swebench

🔥 3. Training of SWE-Lego-Qwen3-8B/32B

3.1 Downloading trajectories for SFT from Hugging Face

3.2 Running SFT via llamafactory

🧪 4. Verifier Training, Inference

4.1 Verifier data and conversion

4.2 Train verifier models

4.3 Run verifier inference

Acknowledgements

Citation 📝

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Reproduction Guide 🎯

1. 📦 Installation

1.1 Installing vllm environment

1.2 Installing openhands environment

1.3 Installing swebench environment

1.4 Installing llamafactory environment

🤖 2. Inference and Evaluation of SWE-Lego-Qwen3-8B/32B

2.1 Serving the model via vllm

2.2 Running inference via openhands

2.3 Running evaluation via swebench

🔥 3. Training of SWE-Lego-Qwen3-8B/32B

3.1 Downloading trajectories for SFT from Hugging Face

3.2 Running SFT via llamafactory

🧪 4. Verifier Training, Inference

4.1 Verifier data and conversion

4.2 Train verifier models

4.3 Run verifier inference

Acknowledgements

Citation 📝

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages