Skip to content

docs: search agent dev note#350

Merged
dhruvnathawani merged 22 commits into
mainfrom
dhruv/devnotes/search-agent
Mar 12, 2026
Merged

docs: search agent dev note#350
dhruvnathawani merged 22 commits into
mainfrom
dhruv/devnotes/search-agent

Conversation

@dhruvnathawani

@dhruvnathawani dhruvnathawani commented Feb 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Add a dev note documenting the search agent SFT data pipeline used to generate tool-use trajectories for training Nemotron Super's web browsing capabilities.

What's in the post

  1. Motivation: Why training search agents requires trajectory data capturing the full thought-action-observation loop, not just QA pairs. BrowseComp benchmark context.
  2. Pipeline walkthrough: Wikidata KG random walks (50k seeds) -> two-stage riddle generation (draft + BrowseComp-style obfuscation) via Data Designer LLM columns -> search trajectory rollouts with live Tavily web search via MCP -> post-processing to SFT-ready JSONL
  3. ASCII pipeline diagram showing the 4-stage flow (seed data -> riddle generation -> trajectory rollouts -> SFT dataset)
  4. Example Wikidata paths (NVIDIA -> Jensen Huang -> Oregon State -> Benton County -> Thomas Hart Benton) with draft-to-obfuscated question transformation
  5. Seed filtering heuristics: anti-meta filters, hop range constraints (4-8), generic entity removal
  6. Full trajectory example in OpenAI-messages format showing system prompt, tool calls, tool responses, and final answer
  7. Production yield analysis: 50k seeds -> 37k valid seeds (74%) -> 24k valid questions (65%) -> 7k valid trajectories (29%), 14% end-to-end yield
  8. Correctness challenge: multi-answer validity, stale Wikidata ground truth (U.S. Steel/Nippon Steel example), 27.5% raw accuracy before filtering
  9. Data Designer MCP integration walkthrough: LocalStdioMCPProvider, ToolConfig (allowlists, turn budgets, timeouts), tool_alias + with_trace=TraceType.ALL_MESSAGES for full conversation capture
  10. Collapsible full source script with Tavily MCP server, 3-column DAG (draft -> obfuscated -> agent trajectory), and trace capture
  11. Next steps: scale to 25k questions, push difficulty higher, explore fresher knowledge bases, search RL environment

Files changed

  1. docs/devnotes/posts/search-agent.md (updated with code dropdown)
  2. docs/devnotes/.authors.yml (pulled latest from main with dnathawani)
  3. mkdocs.yml (added Search Agent to Dev Notes nav)

@dhruvnathawani dhruvnathawani changed the title search agent dev notes docs: search agent dev note Feb 26, 2026
@dhruvnathawani dhruvnathawani marked this pull request as ready for review February 26, 2026 20:30
@dhruvnathawani dhruvnathawani requested a review from a team as a code owner February 26, 2026 20:30
@greptile-apps

greptile-apps Bot commented Feb 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a new dev note documenting the search agent SFT data pipeline used to generate tool-use trajectories for training Nemotron Super's web browsing capabilities, along with two supporting images and a nav entry in mkdocs.yml.

The post is thorough and well-written, covering motivation, the 4-stage pipeline (Wikidata KG walks → riddle generation → trajectory rollouts → SFT post-processing), yield analysis, correctness challenges, and a Data Designer MCP integration walkthrough. Previously flagged issues from earlier review rounds have been largely resolved — the wikidata-graph-walk.png image and search_agent.py asset file are both now present, the trajectory example acknowledges the simplified OpenAI-spec format explicitly, and the yield numbers are self-consistent.

Key changes:

  • docs/devnotes/posts/search-agent.md — new 587-line dev note with code snippets, pipeline diagrams, benchmark results, and a collapsible full-source recipe
  • docs/devnotes/posts/images/browsecomp-benchmark-results.jpg and wikidata-graph-walk.png — supporting images referenced from the post
  • mkdocs.yml — "Search Agent" added to the Dev Notes nav with correct most-recent-first ordering

Confidence Score: 5/5

  • This is a documentation-only PR with no runtime logic changes; it is safe to merge.
  • All changes are documentation, images, and a nav entry. Previously flagged blockers (missing image, missing script file) have been resolved. The only remaining issue is a cosmetic double-slash in one URL, which does not prevent the build from succeeding.
  • No files require special attention beyond the minor URL typo in docs/devnotes/posts/search-agent.md line 585.

Important Files Changed

Filename Overview
docs/devnotes/posts/search-agent.md New dev note documenting the search agent SFT pipeline; comprehensive and well-structured. Previously flagged issues (missing image, missing script file, OpenAI-spec clarification, yield numbers) have been resolved. One minor typo remains: a double slash in the GTC 2026 URL.
mkdocs.yml Adds "Search Agent" to the Dev Notes nav and reorders RQA Dataset below Design Principles, with an explanatory comment about most-recent-first ordering. Change is minimal and correct.
docs/devnotes/posts/images/browsecomp-benchmark-results.jpg New benchmark results image referenced from search-agent.md; no issues.
docs/devnotes/posts/images/wikidata-graph-walk.png New Wikidata graph walk illustration referenced from search-agent.md; previously flagged as missing, now included. No issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["🌐 Wikidata Knowledge Graph\n(100M+ entities)"] -->|SPARQL random walks\n4–8 hops, anti-meta filters| B["Stage 1: Seed Data\n50,000 JSONL records\nseed_entity → final_answer_entity"]

    B -->|74% pass filter| C["Stage 2a: Draft Riddle\nLLM — chain clues,\nhide intermediate nodes"]
    C -->|65% valid| D["Stage 2b: BrowseComp Obfuscation\nLLM — concise, no breadcrumbs,\n1–2 sentences"]

    D -->|24,000 valid questions| E["Stage 3: Trajectory Rollouts\nMiniMax-M2 + Tavily MCP\nThought → Tool Call → Observation loop\n~12 tool calls/sample"]

    E -->|29% correct & complete| F["Stage 4: Post-Processing\nNormalize tool outputs\nDrop truncated\nSelect best rollout\nWrite OpenAI-messages JSONL"]

    F --> G["~7,000 SFT Records\n14% end-to-end yield\nNemotron Super 0% → 31.28% BrowseComp"]

    style A fill:#1a1a2e,color:#eee
    style B fill:#16213e,color:#eee
    style C fill:#16213e,color:#eee
    style D fill:#16213e,color:#eee
    style E fill:#0f3460,color:#eee
    style F fill:#0f3460,color:#eee
    style G fill:#533483,color:#eee
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/devnotes/posts/search-agent.md
Line: 585

Comment:
**Double slash in GTC 2026 URL**

The URL contains a double slash after the domain (`nvidia.com//gtc/`), which is a typo. Most servers will redirect it, but it looks unprofessional in published documentation and may break for some link validators or users on strict proxies.

```suggestion
6. [GTC 2026 Workshop: Building Search Agents with NeMo Data Designer](https://www.nvidia.com/gtc/session-catalog/sessions/gtc26-dlit81572/)
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: ecec9a7

Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md
Comment thread docs/devnotes/posts/search-agent.md
Comment thread docs/devnotes/posts/search-agent.md
- Update date to 2026-03-11 (mvansegbroeck)
- Rewrite intro: keep agentic shift opener, replace middle paragraphs
  with BrowseComp-first framing per mvansegbroeck suggestion
- Fix trajectory example: "hour-angle coordinate system" is incorrect,
  changed to "first equatorial coordinate system" (mvansegbroeck)
- Move stale Wikidata ground truth discussion earlier into Step 1 as
  motivation, with forward link to Correctness Challenge (mvansegbroeck)
- Move safety controls from Key Takeaway #7 into MCP Integration
  section where it fits better (mvansegbroeck)
- Add GTC 2026 workshop link to Key Resources (mvansegbroeck)
- Replace outdated Try For Yourself code (LocalStdioMCPProvider, Qwen
  model, no Stage 4) with minimal inline snippet + full recipe include
  using hosted Tavily MCP endpoint (same pattern as text-to-sql)
Comment thread docs/devnotes/posts/search-agent.md
Comment thread docs/devnotes/posts/search-agent.md
PR feedback:
- Update date to 2026-03-12
- Rewrite intro: keep agentic shift opener, replace middle paragraphs
  with BrowseComp-first framing
- Fix trajectory example: "hour-angle coordinate system" ->
  "first equatorial coordinate system"
- Move stale Wikidata ground truth into Step 1 as early motivation
- Move safety controls from Key Takeaway #7 into MCP Integration section
- Add GTC 2026 workshop link to Key Resources
New content:
- Add Wikidata graph walk diagram (wikidata-graph-walk.png) to Step 1
- Add BrowseComp Benchmark Results section with bar chart (JPG):
  Nemotron Super 0% -> 31.28% (SFT + RL), vs GPT-OSS-120B 33.89%
- Replace outdated Try For Yourself code with minimal inline snippet +
  full recipe include via pymdownx.snippets
Comment thread docs/devnotes/posts/search-agent.md
Comment thread docs/devnotes/posts/search-agent.md
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md
@dhruvnathawani

Copy link
Copy Markdown
Contributor Author

greptile is just wrong in this case with it's suggestions

mvansegbroeck
mvansegbroeck previously approved these changes Mar 12, 2026
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md Outdated
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
Comment thread mkdocs.yml Outdated
johnnygreco
johnnygreco previously approved these changes Mar 12, 2026

@johnnygreco johnnygreco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, @dhruvnathawani !

Comment thread docs/devnotes/posts/search-agent.md Outdated
Comment thread docs/devnotes/posts/search-agent.md
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
johnnygreco
johnnygreco previously approved these changes Mar 12, 2026

@johnnygreco johnnygreco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@dhruvnathawani dhruvnathawani merged commit eac63a1 into main Mar 12, 2026
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants