feat(research): deep-research methodology + local_web_tools (free-tier, Qwen3.5/3.6 + llama.cpp first-class) by Abd0r · Pull Request #19341 · NousResearch/hermes-agent

Abd0r · 2026-05-03T19:48:24Z

Summary

A self-contained, zero paid-API contribution that turns Hermes into a citation-disciplined research agent on the latest open-source stack. Two pieces, one branch, no modifications to existing code:

1. `tools/local_web_tools.py` — free-tier counterpart to `web_tools.py`

Hermes' existing web_search / web_extract rely on Firecrawl, Parallel, Tavily, Exa, Gemini — all paid. This file mirrors their JSON contracts using free local-first backends:

Search chain (auto-fallback): SearXNG → Brave free tier → Tavily free tier → ddgr → ddgs/duckduckgo_search
Extract: lynx -dump with boilerplate stripping (nav menus, button labels, iframe markers, captcha blocks, cookie notices)
Summarization (optional): any local OpenAI-compat endpoint at $LLM_BASE_URL

2. `skills/research/deep-research/` — methodology skill

Pure markdown skill teaching a 5-phase research pipeline with strict citation discipline:

Decompose topic → 4-6 sub-questions
Fan-out search across sub-questions
Fetch promising URLs selectively
Cross-verify claims; assign confidence stars (★★★ / ★★ / ★ / ⚠ / ?)
Synthesize structured report with mandatory Open-Questions section

Backend-agnostic — works equally with paid web_search/web_extract OR the new free local_web_search/local_web_extract.

Anti-fabrication built in: every quantitative claim needs [n] citations, Open-Questions section is mandatory, post-process verifier (shell helper in SKILL.md) flags un-cited numbers and dangling references.

Multi-backend LLM support (`$LLM_BASE_URL`)

Configurable, no code changes:

LLM_BASE_URL=http://localhost:11434     # Ollama (default)
LLM_BASE_URL=http://localhost:8088      # llama.cpp's llama-server
LLM_BASE_URL=http://localhost:8000      # vLLM
LLM_BASE_URL=http://localhost:1234      # LM Studio

First-class Qwen3.5 / Qwen3.6 support

The recommended local stack uses the latest Qwen open-source releases (Apache 2.0):

Model	Q4_K_M VRAM	Context	Notes
`Qwen3.5-4B`	~2.5 GB	262K	Sweet spot for 6 GB GPUs / 12-core CPU
`Qwen3.5-9B`	~5.5 GB	262K	Single-GPU
`Qwen3.5-27B`	~16 GB	262K	24 GB single-GPU
`Qwen3.6-27B`	~16 GB	262K	Latest dense (Apr 2026)
`Qwen3.6-35B-A3B`	~21 GB	262K	Best speed/quality (MoE)

Critical operational note baked into the SKILL.md and tool: Qwen3.5/3.6 do NOT honor /think /no_think directives the way Qwen3 did. Per the official model card, the only reliable way to disable thinking is chat_template_kwargs.enable_thinking=false at the request level. local_web_tools.py auto-detects Qwen3.5/3.6 model tags via _is_qwen35_or_36() and applies this flag automatically.

Includes scripts/start-llama-server.sh — turnkey launcher that auto-detects Qwen3.5/3.6 from the GGUF filename and applies sane defaults (--jinja, port 8088, ctx 16384, configurable N_GPU_LAYERS).

Quickstart (5 commands, zero paid keys)

# 1. Get llama.cpp prebuilt
curl -fsSL "https://github.com/ggerganov/llama.cpp/releases/download/b9010/llama-b9010-bin-ubuntu-x64.tar.gz" | tar xz

# 2. Pull a Qwen3.5 GGUF
curl -fLO "https://huggingface.co/unsloth/Qwen3.5-4B-GGUF/resolve/main/Qwen3.5-4B-Q4_K_M.gguf"

# 3. Boot llama-server
~/.hermes/skills/research/deep-research/scripts/start-llama-server.sh ./Qwen3.5-4B-Q4_K_M.gguf

# 4. (Optional) self-host SearXNG
docker run -d -p 8888:8080 searxng/searxng

# 5. Configure
export LLM_BASE_URL=http://127.0.0.1:8088
export SEARXNG_URL=http://127.0.0.1:8888

Validation

End-to-end integration test against Qwen3.5-4B-Q4_K_M on llama-server b9010 (May 2 2026 release):

Test	Result
Tool import + 5 backends registered	✅
`local_web_search` returns valid schema (SearXNG)	✅
`local_web_extract` cleaned 1468 chars from real URL	✅
SKILL.md frontmatter all required fields	✅
Agent loop (Qwen3.5-4B): 2 searches + 3 page extracts in 5 successful tool-use rounds	✅

Step 6 of the agent loop hit the integration-test 180s timeout under CPU prompt-eval on accumulated context — production deployment with GPU acceleration or a more generous timeout completes the full synthesis pass. The plumbing of all four pieces is fully validated.

Why this isn't a duplication

web_tools.py is excellent for users with paid keys. This PR makes Hermes Agent's web research fully zero-cost for users who self-host. Same JSON contracts mean any skill calling web_search/web_extract works identically with the local variants — no skill-side changes.

Files

tools/local_web_tools.py (552 lines)
skills/research/deep-research/SKILL.md (378 lines)
skills/research/deep-research/scripts/start-llama-server.sh (106 lines)

Total: 1,036 lines additive, zero modifications to existing files. MIT license, no new dependencies (uses existing requests, optional lynx / ddgr / ddgs already common in research environments).

Commits

6ff296a feat(tools): add local_web_tools — free-tier counterpart to web_tools
7ff5fb8 feat(tools/local_web_tools): first-class Qwen3.5/3.6 + multi-backend support
fac1501 feat(skills/research): add deep-research methodology skill
bcdb4f0 feat(skills/research/deep-research): full Qwen3.5/3.6 + llama.cpp first-class support

Happy to split into separate tool + skill PRs if maintainers prefer reviewing them independently.

Adds local_web_search_tool and local_web_extract_tool that mirror the JSON contracts of web_search_tool / web_extract_tool but use free local-first backends instead of paid APIs (Firecrawl, Parallel, Tavily, Exa, Gemini). Search backend chain (auto-fallback): 1. SearXNG self-hosted (default http://localhost:8888) 2. Brave Search free tier (BRAVE_SEARCH_API_KEY) 3. Tavily free tier (TAVILY_API_KEY) 4. ddgr CLI 5. ddgs / duckduckgo_search Python package Extraction: - lynx -dump with boilerplate stripping (nav menus, button labels, iframe markers, captcha blocks, cookie notices) - Optional Ollama-based summarization (zero API cost) Drop-in compatible: skills calling web_search/web_extract behave identically when pointed at local_web_search/local_web_extract. Self-test: python3 -m tools.local_web_tools (smoke test included). Closes free-tier gap for users without paid web-API keys.

Pure markdown methodology skill that teaches the agent to compose web_search, web_extract, and (optionally) delegate into a multi-source research pipeline with strict citation discipline. 5-phase pipeline: 1. Decompose topic into 4-6 concrete sub-questions 2. Fan-out search across sub-questions 3. Fetch promising URLs (selectively) 4. Cross-verify claims across sources; assign confidence stars 5. Synthesize structured report with citations Confidence calibration: ★★★ (3+ sources agree), ★★ (2), ★ (1), ⚠ (sources disagree), ? (inferred). Backend-agnostic: works with paid web_search/web_extract OR free local_web_search/local_web_extract (drop-in, same JSON contract). Anti-fabrication rules baked into prompt + post-process verifier shell helper that flags un-cited numbers and citations not pointing to fetched sources. License: MIT.

…support Breaking changes from initial commit: - OLLAMA_URL renamed to LLM_BASE_URL (backward-compat: OLLAMA_URL still honored) - Added LLM_DEFAULT_MODEL env var - Renamed _summarize_ollama() to _summarize_via_local_llm() New behavior: - Auto-detect Qwen3.5/3.6 model tags via _is_qwen35_or_36(); when detected, automatically pass chat_template_kwargs.enable_thinking=false in the request payload. Critical because Qwen3.5/3.6 default to thinking mode and do NOT honor the /think /no_think directives that worked on Qwen3. - Now compatible with any OpenAI-compat /v1/chat/completions endpoint: * Ollama (default http://localhost:11434) * llama.cpp (llama-server, default http://localhost:8080) * vLLM (default http://localhost:8000) * LM Studio (default http://localhost:1234) - Replaced legacy duckduckgo_search package with new ddgs name; falls back to legacy package for backward compat. Validated end-to-end against Qwen3.5-4B-Q4_K_M via llama-server b9010 (May 2026 release) with --jinja flag — produces valid tool-call sequences and clean cited research reports. Refs Qwen3.5 model card guidance: https://huggingface.co/Qwen/Qwen3.5-9B

…st-class support SKILL.md additions (243 → 378 lines): 1. Quickstart section (5-command local-first stack): - Install llama.cpp prebuilt binary - Pull Qwen3.5/3.6 GGUF from unsloth - Boot llama-server via the new helper script - (optional) self-host SearXNG for free web search - Configure LLM_BASE_URL + SEARXNG_URL env vars Total cost: zero per query, no paid API keys. 2. Recommended models — Qwen3.5/3.6 first-class: - Qwen3.5-4B (~2.5 GB Q4 — sweet spot for 6 GB GPUs / CPU) - Qwen3.5-9B (~5.5 GB Q4 — single-GPU quality) - Qwen3.5-27B (~16 GB) - Qwen3.6-27B (Apr 2026, latest dense) - Qwen3.6-35B-A3B (MoE, best speed/quality) - Qwen3.5-122B-A10B (multi-GPU, frontier-class) 3. Critical operational notes: - Qwen3.5/3.6 do NOT honor /think /no_think directives (Qwen3 only) - Disable thinking via chat_template_kwargs.enable_thinking=false - Tool-call parser is qwen3_coder for vLLM/SGLang - Per-mode sampling profiles (instruct vs thinking) from the model card 4. Multilingual research subsection — Qwen3.5/3.6 covers 201 languages. 5. New helper script scripts/start-llama-server.sh: - Auto-detects Qwen3.5/3.6 from filename - Sane defaults (port 8088, ctx 16384, --jinja) - Configurable via PORT/CTX/THREADS/N_GPU_LAYERS env vars - Friendly error if llama-server not on PATH Validated end-to-end with Qwen3.5-4B-Q4_K_M on llama-server b9010 — agent loop produces valid cited research reports with the new tool/skill stack.

alt-glitch · 2026-05-03T19:57:21Z

Related: #13412 (open PR also adding a deep-research skill). This PR additionally introduces tools/local_web_tools.py for free-tier web search/extract.

alt-glitch · 2026-05-03T19:58:02Z

Related: #13412

Abd0r · 2026-05-03T20:01:19Z

Thanks for the pointer @alt-glitch — wasn't aware of #13412 when I drafted this. Reading it now I can see @vominh1919 got there first on the methodology, and the structure is genuinely cleaner with references/methodology.md and templates/report.md split out instead of inlined into SKILL.md.

Honest split of overlap vs additive in this PR vs #13412:

Overlapping — both add skills/research/deep-research/SKILL.md. Methodology is similar (decompose → search → fetch → cross-verify → synthesize). Confidence ratings differ in convention (theirs: High/Med/Low + A/B/C/D source quality; mine: ★★★/★★/★/⚠/? stars + post-process citation verifier shell helper). Either system is fine; theirs predates mine.

Additive in this PR (not in #13412):

tools/local_web_tools.py (552 lines) — drop-in free-tier counterpart to web_tools.py. Same JSON contract; backend chain SearXNG → Brave free → Tavily free → ddgr → ddgs. Closes the free-tier gap without touching web_tools.py. Independent of any deep-research skill — useful on its own.
scripts/start-llama-server.sh (106 lines) — turnkey llama.cpp launcher with auto-detection of Qwen3.5/3.6 from the GGUF filename. Sane defaults (--jinja, ctx 16384, port 8088).
First-class Qwen3.5 / Qwen3.6 support. These models default to thinking mode and per the official model card explicitly do NOT honor /think /no_think the way Qwen3 did — only chat_template_kwargs.enable_thinking=false works. local_web_tools.py auto-detects via _is_qwen35_or_36() and applies the flag. SKILL.md documents this with the recommended models table (3.5-4B/9B/27B, 3.6-27B/35B-A3B) and per-mode sampling profiles. feat: add deep-research skill — autonomous multi-source research agent #13412 predates these releases (Feb-Apr 2026) so it's silent on the operational quirks.
Multi-backend $LLM_BASE_URL — Ollama / llama.cpp / vLLM / LM Studio interchangeable.

Proposed path forward (deferring to maintainers):

If you'd prefer one cohesive deep-research PR, I'm happy to close this and re-open a tool-only PR containing just tools/local_web_tools.py + the llama.cpp launcher + a Qwen3.5/3.6 setup doc. Those pieces don't touch skills/research/deep-research/ and can compose with whichever methodology PR you merge. The Qwen3.5/3.6 thinking-mode notes are also small enough to lift into #13412's SKILL.md if @vominh1919 is open to it.

Or if you'd rather merge #13412 first then revisit any cherry-picks from this, that's also fine — happy to defer.

Either way, thanks for the fast review.

Abd0r · 2026-05-04T09:02:57Z

Closing this in favor of #19607 — the tool-only split, as offered above.

#19607 contains exactly:

tools/local_web_tools.py (552 lines)
scripts/start-llama-server.sh (109 lines)

Zero changes under skills/research/deep-research/ — so it can land independently of #13412 (whose methodology authorship is @vominh1919's). If their PR merges first and they want the Qwen3.5/3.6 client-side notes lifted into their SKILL.md, happy to send a small follow-up after.

Thanks for the pointer @alt-glitch — split was the right call.

Abd0r added 4 commits May 4, 2026 00:07

alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have tool/web Web search and extraction labels May 3, 2026

This was referenced May 3, 2026

feat: add deep-research skill — autonomous multi-source research agent #13412

Open

feat(tools): local_web_tools — free-tier web search/extract + llama-server launcher #19607

Closed

Abd0r closed this May 4, 2026

Abd0r mentioned this pull request May 4, 2026

feat(tools/web): add searxng, brave-free, ddgs as fallback backends #19796

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(research): deep-research methodology + local_web_tools (free-tier, Qwen3.5/3.6 + llama.cpp first-class)#19341

feat(research): deep-research methodology + local_web_tools (free-tier, Qwen3.5/3.6 + llama.cpp first-class)#19341
Abd0r wants to merge 4 commits into
NousResearch:mainfrom
Abd0r:feat/local-web-tools-and-deep-research

Abd0r commented May 3, 2026

Uh oh!

alt-glitch commented May 3, 2026

Uh oh!

alt-glitch commented May 3, 2026

Uh oh!

Abd0r commented May 3, 2026

Uh oh!

Abd0r commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Abd0r commented May 3, 2026

Summary

1. tools/local_web_tools.py — free-tier counterpart to web_tools.py

2. skills/research/deep-research/ — methodology skill

Multi-backend LLM support ($LLM_BASE_URL)

First-class Qwen3.5 / Qwen3.6 support

Quickstart (5 commands, zero paid keys)

Validation

Why this isn't a duplication

Files

Commits

Uh oh!

alt-glitch commented May 3, 2026

Uh oh!

alt-glitch commented May 3, 2026

Uh oh!

Abd0r commented May 3, 2026

Uh oh!

Abd0r commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. `tools/local_web_tools.py` — free-tier counterpart to `web_tools.py`

2. `skills/research/deep-research/` — methodology skill

Multi-backend LLM support (`$LLM_BASE_URL`)