ZJU-REAL/Perceive-to-Reason

Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

🔥 Overview

We introduce P2R, a two-stage visual reasoning framework that explicitly decouples perception from reasoning.

We train P2R with PRA-GRPO, a role-aware alternating RL strategy that converts final-answer correctness into stage-specific supervision.

P2R consistently outperforms its VLM baselines on both high-resolution fine-grained benchmarks and general multimodal reasoning tasks.

🎉 News

📖 Usage

Environment Installation

git clone git@github.com:ZJU-REAL/Perceive-to-Reason.git
cd Perceive-to-Reason

conda create -n perceive-to-reason python=3.10 -y
conda activate perceive-to-reason

bash install.sh

Dataset Installation

Training Data

Download the training dataset from P2R-10k and place it under your data directory.

Evaluation Data

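Download the evaluation datasets and place them under your data directory. The tasks named in evaluation/run_eval_batch.sh are V-Star, HR-Bench, MME-RealWorld-lite, and MME-RealWorld.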

Training

PRA-GRPO alternates between two training stages. In each stage, the role that is not being trained is kept frozen and served as an inference service. Both stages also require a verifier service, which provides the reward for open-ended QA.

Note: Before training, configure the service IP addresses and ports in the training scripts (REASONER_HOST, PERCEIVER_HOST, VERIFIER_HOST and their corresponding _PORT variables). Ensure each service uses a different port to avoid conflicts.
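As a quick sanity check for the note above, the following sketch verifies that the three ports differ before any service is started. The variable names follow the note; the host and port values and the helper check_ports_unique are illustrative assumptions and are not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical example values; replace them with your own hosts and ports.
REASONER_HOST="127.0.0.1";  REASONER_PORT=8001
PERCEIVER_HOST="127.0.0.1"; PERCEIVER_PORT=8002
VERIFIER_HOST="127.0.0.1";  VERIFIER_PORT=8003

# check_ports_unique PORT...: succeed only if no port number is repeated.
# (Illustrative helper, not provided by the repository.)
check_ports_unique() {
  local dupes
  # uniq -d prints only the values that occur more than once.
  dupes=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "Duplicate port(s): $dupes" >&2
    return 1
  fi
}

check_ports_unique "$REASONER_PORT" "$PERCEIVER_PORT" "$VERIFIER_PORT" \
  && echo "Ports are distinct."
```

Running a check like this before launching the servers surfaces a copy-paste mistake early, rather than as a bind error once a second service starts.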

Stage 1: Train Perceiver

Start the frozen reasoner and verifier services:

bash scripts/start_reasoner_server.sh
bash scripts/start_verifier_server.sh

Then launch perceiver training:

bash example/qwen3_vl_4b_p2r/run_pra_grpo_perceiver.sh

Stage 2: Train Reasoner

Start the trained perceiver and verifier services:

bash scripts/start_perceiver_server.sh
bash scripts/start_verifier_server.sh

Then launch reasoner training:

bash example/qwen3_vl_4b_p2r/run_pra_grpo_reasoner.sh
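The two stages above can be summarized as a small lookup that prints which services each stage needs and which launcher trains it. This is only a sketch: the script paths are copied from the steps above, while the function names services_for_stage and training_script_for_stage are hypothetical helpers that do not exist in the repository. The sketch prints paths and does not start anything.

```shell
#!/usr/bin/env bash
# services_for_stage STAGE: print the service scripts that must be running
# before training that stage (paths taken from the steps above).
services_for_stage() {
  case "$1" in
    perceiver)  # Stage 1: the frozen reasoner is served
      echo "scripts/start_reasoner_server.sh"
      echo "scripts/start_verifier_server.sh"
      ;;
    reasoner)   # Stage 2: the trained perceiver is served
      echo "scripts/start_perceiver_server.sh"
      echo "scripts/start_verifier_server.sh"
      ;;
    *)
      echo "Unknown stage: $1 (expected 'perceiver' or 'reasoner')" >&2
      return 1
      ;;
  esac
}

# training_script_for_stage STAGE: print the training launcher for that stage.
training_script_for_stage() {
  case "$1" in
    perceiver|reasoner)
      echo "example/qwen3_vl_4b_p2r/run_pra_grpo_$1.sh"
      ;;
    *)
      return 1
      ;;
  esac
}

# Print a summary of both stages in training order.
for stage in perceiver reasoner; do
  echo "== Train $stage =="
  services_for_stage "$stage" | sed 's/^/  service:  /'
  echo "  training: $(training_script_for_stage "$stage")"
done
```

The verifier appears in both stages, so it can reasonably be left running across the switch, provided its host and port settings stay the same.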

Evaluation

Edit evaluation/run_eval_batch.sh to specify your model, mode, and task:

MODEL_NAMES=("your_model")
EVAL_MODE="p2r"          # "default" | "thinking" | "p2r"
TASKS=("V-Star")         # "V-Star" | "HR-Bench" | "MME-RealWorld-lite" | "MME-RealWorld"

Then run:

cd evaluation
bash run_eval_batch.sh
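A typo in EVAL_MODE or TASKS is easy to miss, so a small pre-flight check can help. The sketch below assumes that the values listed in the comments of the settings above are the complete set of accepted options. The helpers is_one_of and validate_eval_config are hypothetical and are not part of run_eval_batch.sh.

```shell
#!/usr/bin/env bash
# Example settings, mirroring the variables in evaluation/run_eval_batch.sh.
EVAL_MODE="p2r"
TASKS=("V-Star")

# is_one_of VALUE ALLOWED...: succeed if VALUE equals one of the allowed strings.
is_one_of() {
  local value="$1" allowed
  shift
  for allowed in "$@"; do
    [ "$value" = "$allowed" ] && return 0
  done
  return 1
}

# validate_eval_config: check EVAL_MODE and every entry of TASKS against the
# options listed in the README (assumed here to be exhaustive).
validate_eval_config() {
  local task
  if ! is_one_of "$EVAL_MODE" default thinking p2r; then
    echo "Unsupported EVAL_MODE: $EVAL_MODE" >&2
    return 1
  fi
  for task in "${TASKS[@]}"; do
    if ! is_one_of "$task" V-Star HR-Bench MME-RealWorld-lite MME-RealWorld; then
      echo "Unsupported task: $task" >&2
      return 1
    fi
  done
}

validate_eval_config && echo "Evaluation config looks valid."
```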

🙏 Acknowledgement

This project builds on veRL. Training data is sourced from DeepEyes, Mini-o3, and Zooming-without-Zooming. We thank the authors of those projects.

⭐️ Citation

If you find Perceive-to-Reason useful, please consider citing our work:

@misc{li2026perceivetoreasondecouplingperceptionreasoning,
      title={Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning}, 
      author={Hongxing Li and Xiufeng Huang and Dingming Li and Wenjing Jiang and Zixuan Wang and Haolei Xu and Hanrong Zhang and Haiwen Hong and Longtao Huang and Hui Xue and Weiming Lu and Jun Xiao and Yueting Zhuang and Yongliang Shen},
      year={2026},
      eprint={2607.01191},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2607.01191}, 
}
