Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

🔥 Overview

We introduce P2R, a two-stage visual reasoning framework that explicitly decouples perception from reasoning.

We train P2R with PRA-GRPO, a role-aware alternating RL strategy that converts final-answer correctness into stage-specific supervision.

P2R consistently outperforms its VLM baselines on both high-resolution fine-grained benchmarks and general multimodal reasoning tasks.

🎉 News

[2026/07/02] We release our paper.
[2026/07/01] We release our code, models, and dataset.

📖 Usage

Environment Installation

git clone git@github.com:ZJU-REAL/Perceive-to-Reason.git
cd Perceive-to-Reason

conda create -n perceive-to-reason python=3.10 -y
conda activate perceive-to-reason

bash install.sh

Dataset Installation

Training Data

Download the P2R-10k training dataset and place it under your data directory.

Evaluation Data

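Download the evaluation datasets you plan to use (the evaluation script supports V-Star, HR-Bench, MME-RealWorld-lite, and MME-RealWorld) and place them under your data directory.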

Training

PRA-GRPO alternates between two stages. Each stage trains one role while the other role stays frozen and is served as an inference service. Both stages also require a verifier service, which computes the reward for open-ended QA.

Note: Before training, configure the service IP addresses and ports in the training scripts (REASONER_HOST, PERCEIVER_HOST, VERIFIER_HOST and their corresponding _PORT variables). Ensure each service uses a different port to avoid conflicts.
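The distinct-port requirement in the note above can be checked mechanically before any service is started. The following is a minimal sketch and is not part of the repository: the `*_PORT` variable names are inferred from the naming pattern described in the note, the port numbers are placeholder values, and `check_distinct_ports` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions); use the ports you set in the training scripts.
REASONER_PORT=8001
PERCEIVER_PORT=8002
VERIFIER_PORT=8003

# check_distinct_ports PORT [PORT...]
# Hypothetical helper: returns 0 only if no port number is repeated.
check_distinct_ports() {
    local duplicates
    # sort groups equal values together; uniq -d prints only repeated ones.
    duplicates=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$duplicates" ]; then
        echo "Port(s) used more than once: $duplicates" >&2
        return 1
    fi
    return 0
}

check_distinct_ports "$REASONER_PORT" "$PERCEIVER_PORT" "$VERIFIER_PORT" \
    && echo "Ports are distinct."
```

If two services share a port, the helper prints the repeated value to stderr and returns a non-zero status, so it can gate the launch commands with `&&`.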

Stage 1: Train Perceiver

Start the frozen reasoner and verifier services:

bash scripts/start_reasoner_server.sh
bash scripts/start_verifier_server.sh

Then launch perceiver training:

bash example/qwen3_vl_4b_p2r/run_pra_grpo_perceiver.sh

Stage 2: Train Reasoner

Start the perceiver service, which serves the trained perceiver, and the verifier service:

bash scripts/start_perceiver_server.sh
bash scripts/start_verifier_server.sh

Then launch reasoner training:

bash example/qwen3_vl_4b_p2r/run_pra_grpo_reasoner.sh
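Both stages depend on the frozen services being reachable when training starts. As a rough sketch, assuming the services listen on plain TCP ports, a small polling helper can hold back the training launch until a port accepts connections. `wait_for_port` is a hypothetical helper, not a script in this repository, and it relies on bash's `/dev/tcp` redirection. An open port does not guarantee that the model behind it has finished loading, so treat this as a first check only.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: polls about once per second until a TCP connection
# to HOST:PORT succeeds. Returns 0 when the port is reachable and 1 if the
# timeout (default 60 seconds) elapses first.
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-60}" elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The subshell opens the socket and closes it again when it exits.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "Timed out waiting for ${host}:${port}" >&2
    return 1
}
```

For Stage 1, for example, `wait_for_port "$REASONER_HOST" "$REASONER_PORT" 600` could precede the perceiver training command, assuming those variables are set in the calling shell to the same values as in the training script.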

Evaluation

Edit evaluation/run_eval_batch.sh to specify your model, mode, and task:

MODEL_NAMES=("your_model")
EVAL_MODE="p2r"          # "default" | "thinking" | "p2r"
TASKS=("V-Star")         # "V-Star" | "HR-Bench" | "MME-RealWorld-lite" | "MME-RealWorld"

Then run:

cd evaluation
bash run_eval_batch.sh
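A typo in `EVAL_MODE` or `TASKS` may only surface after a run has started, so it can help to validate the values first. The sketch below checks them against the options listed in the comments above. `validate_eval_config` is a hypothetical helper and is not part of `run_eval_batch.sh`. That `TASKS` accepts several entries at once is an assumption based on its array syntax.

```shell
#!/usr/bin/env bash
# validate_eval_config MODE TASK [TASK...]
# Hypothetical helper: returns 0 if MODE and every TASK match the options
# documented for run_eval_batch.sh, and 1 otherwise.
validate_eval_config() {
    local mode="$1"
    shift
    case "$mode" in
        default|thinking|p2r) ;;
        *) echo "Unknown EVAL_MODE: $mode" >&2; return 1 ;;
    esac
    if [ "$#" -eq 0 ]; then
        echo "TASKS is empty" >&2
        return 1
    fi
    local task
    for task in "$@"; do
        case "$task" in
            V-Star|HR-Bench|MME-RealWorld-lite|MME-RealWorld) ;;
            *) echo "Unknown task: $task" >&2; return 1 ;;
        esac
    done
    return 0
}

# Example values; selecting two tasks at once is an assumption.
EVAL_MODE="p2r"
TASKS=("V-Star" "HR-Bench")
validate_eval_config "$EVAL_MODE" "${TASKS[@]}" && echo "Evaluation config looks valid."
```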

🙏 Acknowledgement

This project builds on veRL. Training data is sourced from DeepEyes, Mini-o3, and Zooming-without-Zooming. We thank the authors of those projects.

⭐️ Citation

If you find Perceive-to-Reason useful, please consider citing our work:

@misc{li2026perceivetoreasondecouplingperceptionreasoning,
      title={Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning}, 
      author={Hongxing Li and Xiufeng Huang and Dingming Li and Wenjing Jiang and Zixuan Wang and Haolei Xu and Hanrong Zhang and Haiwen Hong and Longtao Huang and Hui Xue and Weiming Lu and Jun Xiao and Yueting Zhuang and Yongliang Shen},
      year={2026},
      eprint={2607.01191},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2607.01191}, 
}
