Deep Researcher is an advanced autonomous agent architecture designed to generate detailed, PhD-level research reports on complex topics. Moving beyond the "parallel scaling" approach of previous agents, this system utilizes Sequential Research Plan Refinement and Candidates Crossover to achieve superior depth and coherence.
Powered by gemini-2.5-pro, Deep Researcher achieves an overall score of 46.21 on the DeepResearch Bench, outperforming existing solutions like Claude Research, Perplexity Research, and Grok Deeper Search.
DeepResearch Bench leaderboard: View
Run Deep Researcher on a research task as follows:
cd src
python main.py "What are Deep Research Agents"
To enable the Candidates Crossover algorithm, which is disabled by default, pass the -ecc flag:
cd src
python main.py "What are Deep Research Agents" -ecc
- Sequential Plan Refinement: Unlike agents that execute fixed sub-tasks in parallel, this architecture maintains a "Global Research Context" and uses reflection to dynamically update its research plan at runtime based on progress.
- Candidates Crossover Algorithm: A search optimization technique where multiple model candidates (with varied temperature and top_k) investigate queries in parallel. Their findings are synthesized to reduce hallucinations and improve fact density.
- Global Research Context: A centralized memory system that stores all search trajectories and artifacts, allowing the agent to reason across the entire research history and avoid "siloed knowledge" problems.
- One-Shot Report Generation: Generates the final report in a single pass using the fully matured Global Context, ensuring narrative consistency and high factual density.
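As a rough illustration of the Global Research Context idea, the sketch below models it as an append-only store of query/answer steps that can be flattened into prompt text. The class names, fields, and methods (`SearchStep`, `GlobalResearchContext`, `commit`, `asked_queries`, `render`) are assumptions made for this example and are not taken from the project's source.

```python
from dataclasses import dataclass, field


@dataclass
class SearchStep:
    """One query/answer pair plus the sources that supported it."""
    query: str
    answer: str
    sources: list[str] = field(default_factory=list)


@dataclass
class GlobalResearchContext:
    """Central memory shared by every agent (hypothetical layout)."""
    topic: str
    plan: list[str] = field(default_factory=list)
    trajectory: list[SearchStep] = field(default_factory=list)

    def commit(self, step: SearchStep) -> None:
        # Append-only: earlier findings stay visible to later steps.
        self.trajectory.append(step)

    def asked_queries(self) -> set[str]:
        # Lets a search agent avoid repeating a query it already ran.
        return {step.query for step in self.trajectory}

    def render(self) -> str:
        # Flatten the plan and the full history into one prompt-ready string.
        lines = [f"Topic: {self.topic}", "Plan:"]
        lines += [f"  {i}. {item}" for i, item in enumerate(self.plan, 1)]
        lines.append("Findings:")
        lines += [f"  Q: {s.query}\n  A: {s.answer}" for s in self.trajectory]
        return "\n".join(lines)
```

Because every agent reads the same rendered history, a later query can build on any earlier finding rather than only on its own sub-task, which is the "siloed knowledge" problem the feature list refers to.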
The Deep Researcher operates through a cyclic, sequential process that allows for continuous reasoning and course correction.
The 7-Step Workflow
- Research Plan Curation: The Planning Agent creates an initial step-by-step plan for the topic.
- Generate Search Query: The Search Agent reads the current plan and global context to formulate the next query.
- Answer Search Query (Candidates Crossover): The agent uses the Web Search tool (Tavily) and the Crossover algorithm to gather high-quality information.
- Research Plan Reflection: The Planning Agent reviews the Global Context to decide if the plan needs modification.
- Research Plan Update: If gaps are found during reflection, the plan is updated dynamically (e.g., adding new sub-topics).
- Analyze Research Progress: An LLM-as-a-judge evaluates if the research meets the sufficiency threshold (>90%).
- One-Shot Report Generation: Once the loop terminates, the Report Writer synthesizes the final document.
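The seven steps above can be sketched as a single control loop. This is a minimal sketch, not the project's implementation: each agent is passed in as a plain callable, all function and parameter names are hypothetical, and `max_iterations` is a safety cap added for the example. Only the 0.9 threshold comes from the workflow description (>90%).

```python
from typing import Callable

# The shared context here is just a list of (query, answer) pairs.
Context = list[tuple[str, str]]


def run_research(
    topic: str,
    curate_plan: Callable[[str], list[str]],
    next_query: Callable[[list[str], Context], str],
    answer_query: Callable[[str], str],
    reflect: Callable[[list[str], Context], list[str]],
    judge_progress: Callable[[list[str], Context], float],
    write_report: Callable[[str, list[str], Context], str],
    threshold: float = 0.9,    # sufficiency threshold from the workflow (>90%)
    max_iterations: int = 20,  # assumption: cap so the loop always terminates
) -> str:
    plan = curate_plan(topic)                           # 1. plan curation
    context: Context = []
    for _ in range(max_iterations):
        query = next_query(plan, context)               # 2. generate search query
        context.append((query, answer_query(query)))    # 3. answer search query
        missing = reflect(plan, context)                # 4. plan reflection
        if missing:
            plan = plan + missing                       # 5. plan update
        if judge_progress(plan, context) > threshold:   # 6. analyze progress
            break
    return write_report(topic, plan, context)           # 7. one-shot report
```

The report writer is called exactly once, after the loop ends, so it sees the complete plan and every finding at the same time.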
To ensure high-quality data retrieval during Step 3, we implement the Candidates Crossover algorithm, inspired by the Self-Evolution mechanism in Google's TTD-DR paper.
- Initialization: We initialize $n=3$ candidate agents.
- Diversity: Each candidate operates with different temperature and top_k configurations to explore varied search spaces.
- Execution: Candidates query the web using Tavily, discard results with relevance below 30%, and generate independent answers.
- Synthesis: A crossover mechanism merges these answers into a single, comprehensive response, which is then committed to the Global Context.
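The four stages above might look roughly like the sketch below. It is an illustration under stated assumptions: the search, generation, and merge steps are injected as callables instead of calling Tavily or Gemini directly, each search result is assumed to be a dict with a numeric `score` between 0 and 1, and the specific temperature and top_k values are invented for the example (the description only says they differ per candidate).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

MIN_RELEVANCE = 0.30  # results scoring below 30% relevance are discarded


@dataclass(frozen=True)
class CandidateConfig:
    temperature: float
    top_k: int


# Hypothetical settings for the n=3 candidates.
CANDIDATES = [
    CandidateConfig(temperature=0.2, top_k=20),
    CandidateConfig(temperature=0.7, top_k=40),
    CandidateConfig(temperature=1.0, top_k=64),
]


def filter_results(results: list[dict]) -> list[dict]:
    # Keep only results at or above the relevance cut-off.
    return [r for r in results if r.get("score", 0.0) >= MIN_RELEVANCE]


def crossover(
    query: str,
    search: Callable[[str], list[dict]],
    generate: Callable[[str, list[dict], CandidateConfig], str],
    merge: Callable[[str, list[str]], str],
) -> str:
    def run_candidate(cfg: CandidateConfig) -> str:
        evidence = filter_results(search(query))
        return generate(query, evidence, cfg)  # independent answer per candidate

    # Candidates run in parallel; pool.map returns answers in candidate order.
    with ThreadPoolExecutor(max_workers=len(CANDIDATES)) as pool:
        answers = list(pool.map(run_candidate, CANDIDATES))
    return merge(query, answers)  # synthesis into one response
```

In a real run, `merge` would itself be an LLM call that reconciles the candidate answers, and its output is what gets committed to the Global Context.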
The model was evaluated using the DeepResearch Bench, comprising 100 PhD-level tasks across 22 fields, using the RACE (Reference-based Adaptive Criteria-driven Evaluation) framework.
Our Deep Researcher outperforms parallel architectures (like Static-DRA) as well as Claude Research, Perplexity Research, and Grok Deeper Search, while trailing Tavily Research and Gemini 2.5 Pro DeepResearch.
| Agent | Overall Score | Comprehensiveness | Insight | Instruction Following | Readability |
|---|---|---|---|---|---|
| Tavily Research | 52.44 | 52.84 | 53.59 | 51.92 | 49.21 |
| Gemini 2.5 Pro DeepResearch | 49.71 | 49.51 | 49.45 | 50.12 | 50.00 |
| Deep Researcher (Ours) | 46.21 | 43.44 | 45.48 | 48.99 | 48.21 |
| Claude Research | 45.00 | 45.34 | 42.79 | 47.58 | 44.66 |
| Perplexity Research | 40.46 | 39.10 | 35.65 | 46.11 | 43.08 |
| Grok Deeper Search | 38.22 | 36.08 | 30.89 | 46.59 | 42.17 |
We ran a complete evaluation with DeepResearch Bench's RACE framework on all 100 PhD-level research tasks, which span 22 distinct fields and 2 languages (English and Chinese).
- [17 Jan 2026] Our Deep Researcher achieved an overall score of 46.21 on DeepResearch Bench, placing it among the top 10 deep research agents worldwide (see the leaderboard). It is ahead of Claude Research and currently 0.24 points below OpenAI Deep Research.
- Full evaluation scores are available here: View
Overall Evaluation Cost: USD 401.25 (INR 36,222.78)
- Dec 2025: USD 157.60 (INR 14,186.28)
- Jan 2026: USD 243.65 (INR 21,994.78)
Note: INR figures use the USD-to-INR exchange rates from around December 2025 and January 2026.
We recommend using a virtual environment to manage external library dependencies.
- Set up a virtual environment with the following commands:
pip install virtualenv
virtualenv <your_environment_name>
- Activate your virtual environment:
source <your_environment_name>/bin/activate
- Deactivate your virtual environment:
deactivate
pip install virtualenv
pip install python-dotenv
pip install langchain-google-genai
pip install langchain-tavily
@misc{prateek2026deepresearchersequentialplan,
title={Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)},
author={Saurav Prateek},
year={2026},
eprint={2601.20843},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.20843},
}
