- 22/07/2025: 🏆 Our paper receives Best Paper Honorable Mention at ICML 2025 DataWorld Workshop!
- 01/06/2025: 🎉 We release our paper, models and codebase.
SynthRL is a scalable and guaranteed pipeline for automatic data scaling in reasoning-oriented RL training. It addresses a critical challenge in RLVR (Reinforcement Learning with Verifiable Reward): how to scale the training data with correctness and distribution guarantees to achieve better performance. SynthRL achieves this through a three-stage process (a rough code sketch follows the list below):
- Seed Data Selection: Identifying appropriate seed questions based on Monte Carlo rollout pass rates
- Targeted Synthesis: Generating more challenging variants while preserving original answers
- Guaranteed Verification: Ensuring near-perfect correctness and difficulty enhancement
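In code, the loop looks roughly like the minimal sketch below. This is not the repo's implementation: `rollout_pass_rate` and `synthesize_variant` are hypothetical stand-ins for the policy rollouts and the synthesizer model, and the thresholds are illustrative rather than the paper's settings.

```python
# Minimal sketch of the SynthRL loop -- not the repo's actual implementation.
# `rollout_pass_rate(question, answer, n)` should return the fraction of n policy
# rollouts that reach `answer`; `synthesize_variant(question, answer)` should return
# a harder rewrite of `question` intended to keep the same answer. Both are
# hypothetical stand-ins, and the thresholds below are illustrative only.

def synthesize_verified_data(seed_pool, rollout_pass_rate, synthesize_variant,
                             n_rollouts=8, max_pass_rate=0.75):
    synthesized = []
    for question, answer in seed_pool:
        # 1) Seed selection: estimate seed difficulty from Monte Carlo rollouts.
        seed_rate = rollout_pass_rate(question, answer, n_rollouts)
        if seed_rate == 0.0 or seed_rate > max_pass_rate:
            continue  # skip seeds that are unsolvable or already too easy

        # 2) Targeted synthesis: generate a more challenging variant that is
        #    intended to preserve the original answer.
        variant = synthesize_variant(question, answer)

        # 3) Guaranteed verification: keep the variant only if the policy can
        #    still reach the original answer (correctness) and does so less
        #    often than on the seed (increased difficulty).
        variant_rate = rollout_pass_rate(variant, answer, n_rollouts)
        if 0.0 < variant_rate < seed_rate:
            synthesized.append((variant, answer))
    return synthesized
```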
🎯 Key Benefits:
- Scalable data generation -- automatically synthesizes more challenging, verified questions
- Strong performance -- consistent gains across five out-of-domain visual math reasoning benchmarks
- Enhanced reasoning depth -- greatest improvements on the most challenging evaluation samples
🔧 Scale your RLVR training data with guaranteed quality and enhanced difficulty!
To set up the environment:
```bash
conda create -n synthrl python=3.10 -y && conda activate synthrl
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 transformers==4.49.0 numpy==1.26.4
pip install google-generativeai
pip install -e .
```
First, please download the processed K12-Freeform-8K dataset from here and put it under ./sampled_data.
Please export your Google API key for the synthesizer model. For Qwen model evaluation, please use vLLM or any OpenAI-compatible API that supports Qwen models. Then run the synthesis script:
```bash
bash ./scripts/run_evolve_verifiable.sh
```
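If you want to sanity-check the exported key before a full run, here is a minimal sketch using the google-generativeai package; the model name and prompt are placeholders, and the actual synthesizer settings are configured inside the script.

```python
# Minimal sketch: verify that the exported GOOGLE_API_KEY works with the
# google-generativeai client. The model name and prompt below are placeholders,
# not the settings used by run_evolve_verifiable.sh.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
synthesizer = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

response = synthesizer.generate_content(
    "Rewrite this question to be harder while keeping the same answer: 3 + 4 = ?"
)
print(response.text)
```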
After synthesis is complete, convert the format for RL training:
```bash
python convert_format_for_r1.py --input_path [YOUR_SYNTHESIZED_DATA_NAME]
```
Then launch RL training:
```bash
bash scripts/run_qwen2_5_vl_7b_synthrl.sh [INFO] [TAG] [SAVE_FREQ] [TOTAL_EPISODES]
```
For example:
```bash
bash scripts/run_qwen2_5_vl_7b_synthrl.sh QWEN2.5-SynthRL A-MMK12-8K 16 8
```
To evaluate the trained checkpoints, run:
```bash
bash ./scripts/run_eval_vlm_all.sh [INFO] [TAG]
```
For example:
```bash
bash ./scripts/run_eval_vlm_all.sh QWEN2.5-SynthRL/qwen2_5_vl_7b_A-MMK12-8K
```
This command will run all available checkpoints under this directory using tensor parallelism = 2 and data parallelism = 4.
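For reference, a minimal text-only sketch of loading a single checkpoint with vLLM at tensor parallelism 2; the checkpoint path is a placeholder, and the actual evaluation script additionally handles image inputs and shards the benchmarks across four data-parallel workers.

```python
# Minimal sketch: load one checkpoint with vLLM using tensor parallelism = 2.
# Data parallelism = 4 corresponds to running four such workers, each on a
# disjoint shard of the evaluation data.
from vllm import LLM, SamplingParams

# Placeholder checkpoint path -- point this at an actual saved checkpoint.
checkpoint = "checkpoints/QWEN2.5-SynthRL/qwen2_5_vl_7b_A-MMK12-8K/global_step_16"

llm = LLM(model=checkpoint, tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=2048)

outputs = llm.generate(["Solve: what is 12 * 7?"], params)
print(outputs[0].outputs[0].text)
```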
For evaluation data, please download it from here, unzip data.zip, and place it under the ./evaluation directory. We also provide all raw difficulty battle records and difficulty Elo ratings for the community to support further research.
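As an illustration of how difficulty Elo ratings can be recomputed from pairwise battle records, here is a minimal sketch; the record format shown is hypothetical, so check the released files for the actual schema.

```python
# Minimal sketch: recompute difficulty Elo ratings from pairwise "battle" records.
# Each record is assumed (hypothetically) to name a winner and a loser question id;
# the released files may use a different schema.
from collections import defaultdict

K = 32          # update step size
BASE = 1000.0   # initial rating for unseen questions

def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_ratings(battles):
    """battles: iterable of (winner_id, loser_id) pairs, in chronological order."""
    ratings = defaultdict(lambda: BASE)
    for winner, loser in battles:
        e_w = expected(ratings[winner], ratings[loser])
        ratings[winner] += K * (1.0 - e_w)
        ratings[loser] -= K * (1.0 - e_w)
    return dict(ratings)

# Example: question "q2" is judged harder than "q1" in two of three battles.
print(elo_ratings([("q2", "q1"), ("q1", "q2"), ("q2", "q1")]))
```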
If you find our work useful for your research, please consider citing:
```bibtex
@article{wu2025synthrl,
  title={SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis},
  author={Wu, Zijian and Ni, Jinjie and Liu, Xiangyan and Liu, Zichen and Yan, Hang and Shieh, Michael Qizhe},
  journal={arXiv preprint arXiv:2506.02096},
  year={2025}
}
```
- The training code is built on EasyR1, and the evaluation suite employs vLLM for acceleration.
- The base models are from Qwen2.5-VL-7B-Instruct.
- The original training dataset is from MMK12.
- The evaluation datasets are from MathVerse, MathVision, MathVista, WeMath, and DynaMath.

