feat(research): deep-research methodology + local_web_tools (free-tier, Qwen3.5/3.6 + llama.cpp first-class)#19341
Conversation
Adds local_web_search_tool and local_web_extract_tool that mirror the JSON
contracts of web_search_tool / web_extract_tool but use free local-first
backends instead of paid APIs (Firecrawl, Parallel, Tavily, Exa, Gemini).
Search backend chain (auto-fallback):
1. SearXNG self-hosted (default http://localhost:8888)
2. Brave Search free tier (BRAVE_SEARCH_API_KEY)
3. Tavily free tier (TAVILY_API_KEY)
4. ddgr CLI
5. ddgs / duckduckgo_search Python package
Extraction:
- lynx -dump with boilerplate stripping (nav menus, button labels,
iframe markers, captcha blocks, cookie notices)
- Optional Ollama-based summarization (zero API cost)
Drop-in compatible: skills calling web_search/web_extract behave identically
when pointed at local_web_search/local_web_extract.
Self-test: python3 -m tools.local_web_tools (smoke test included).
Closes free-tier gap for users without paid web-API keys.
Pure markdown methodology skill that teaches the agent to compose web_search, web_extract, and (optionally) delegate into a multi-source research pipeline with strict citation discipline. 5-phase pipeline: 1. Decompose topic into 4-6 concrete sub-questions 2. Fan-out search across sub-questions 3. Fetch promising URLs (selectively) 4. Cross-verify claims across sources; assign confidence stars 5. Synthesize structured report with citations Confidence calibration: ★★★ (3+ sources agree), ★★ (2), ★ (1), ⚠ (sources disagree), ? (inferred). Backend-agnostic: works with paid web_search/web_extract OR free local_web_search/local_web_extract (drop-in, same JSON contract). Anti-fabrication rules baked into prompt + post-process verifier shell helper that flags un-cited numbers and citations not pointing to fetched sources. License: MIT.
…support
Breaking changes from initial commit:
- OLLAMA_URL renamed to LLM_BASE_URL (backward-compat: OLLAMA_URL still honored)
- Added LLM_DEFAULT_MODEL env var
- Renamed _summarize_ollama() to _summarize_via_local_llm()
New behavior:
- Auto-detect Qwen3.5/3.6 model tags via _is_qwen35_or_36(); when detected,
automatically pass chat_template_kwargs.enable_thinking=false in the request
payload. Critical because Qwen3.5/3.6 default to thinking mode and do NOT
honor the /think /no_think directives that worked on Qwen3.
- Now compatible with any OpenAI-compat /v1/chat/completions endpoint:
* Ollama (default http://localhost:11434)
* llama.cpp (llama-server, default http://localhost:8080)
* vLLM (default http://localhost:8000)
* LM Studio (default http://localhost:1234)
- Replaced legacy duckduckgo_search package with new ddgs name; falls back
to legacy package for backward compat.
Validated end-to-end against Qwen3.5-4B-Q4_K_M via llama-server b9010
(May 2026 release) with --jinja flag — produces valid tool-call sequences and
clean cited research reports.
Refs Qwen3.5 model card guidance:
https://huggingface.co/Qwen/Qwen3.5-9B
…st-class support
SKILL.md additions (243 → 378 lines):
1. Quickstart section (5-command local-first stack):
- Install llama.cpp prebuilt binary
- Pull Qwen3.5/3.6 GGUF from unsloth
- Boot llama-server via the new helper script
- (optional) self-host SearXNG for free web search
- Configure LLM_BASE_URL + SEARXNG_URL env vars
Total cost: zero per query, no paid API keys.
2. Recommended models — Qwen3.5/3.6 first-class:
- Qwen3.5-4B (~2.5 GB Q4 — sweet spot for 6 GB GPUs / CPU)
- Qwen3.5-9B (~5.5 GB Q4 — single-GPU quality)
- Qwen3.5-27B (~16 GB)
- Qwen3.6-27B (Apr 2026, latest dense)
- Qwen3.6-35B-A3B (MoE, best speed/quality)
- Qwen3.5-122B-A10B (multi-GPU, frontier-class)
3. Critical operational notes:
- Qwen3.5/3.6 do NOT honor /think /no_think directives (Qwen3 only)
- Disable thinking via chat_template_kwargs.enable_thinking=false
- Tool-call parser is qwen3_coder for vLLM/SGLang
- Per-mode sampling profiles (instruct vs thinking) from the model card
4. Multilingual research subsection — Qwen3.5/3.6 covers 201 languages.
5. New helper script scripts/start-llama-server.sh:
- Auto-detects Qwen3.5/3.6 from filename
- Sane defaults (port 8088, ctx 16384, --jinja)
- Configurable via PORT/CTX/THREADS/N_GPU_LAYERS env vars
- Friendly error if llama-server not on PATH
Validated end-to-end with Qwen3.5-4B-Q4_K_M on llama-server b9010 — agent loop
produces valid cited research reports with the new tool/skill stack.
|
Related: #13412 (open PR also adding a deep-research skill). This PR additionally introduces tools/local_web_tools.py for free-tier web search/extract. |
|
Related: #13412 |
|
Thanks for the pointer @alt-glitch — wasn't aware of #13412 when I drafted this. Reading it now I can see @vominh1919 got there first on the methodology, and the structure is genuinely cleaner with Honest split of overlap vs additive in this PR vs #13412: Overlapping — both add Additive in this PR (not in #13412):
Proposed path forward (deferring to maintainers): If you'd prefer one cohesive deep-research PR, I'm happy to close this and re-open a tool-only PR containing just Or if you'd rather merge #13412 first then revisit any cherry-picks from this, that's also fine — happy to defer. Either way, thanks for the fast review. |
|
Closing this in favor of #19607 — the tool-only split, as offered above. #19607 contains exactly:
Zero changes under Thanks for the pointer @alt-glitch — split was the right call. |
Summary
A self-contained, zero paid-API contribution that turns Hermes into a citation-disciplined research agent on the latest open-source stack. Two pieces, one branch, no modifications to existing code:
1.
tools/local_web_tools.py— free-tier counterpart toweb_tools.pyHermes' existing
web_search/web_extractrely on Firecrawl, Parallel, Tavily, Exa, Gemini — all paid. This file mirrors their JSON contracts using free local-first backends:$LLM_BASE_URL2.
skills/research/deep-research/— methodology skillPure markdown skill teaching a 5-phase research pipeline with strict citation discipline:
Backend-agnostic — works equally with paid
web_search/web_extractOR the new freelocal_web_search/local_web_extract.Anti-fabrication built in: every quantitative claim needs
[n]citations, Open-Questions section is mandatory, post-process verifier (shell helper in SKILL.md) flags un-cited numbers and dangling references.Multi-backend LLM support (
$LLM_BASE_URL)Configurable, no code changes:
First-class Qwen3.5 / Qwen3.6 support
The recommended local stack uses the latest Qwen open-source releases (Apache 2.0):
Qwen3.5-4BQwen3.5-9BQwen3.5-27BQwen3.6-27BQwen3.6-35B-A3BCritical operational note baked into the SKILL.md and tool: Qwen3.5/3.6 do NOT honor
/think/no_thinkdirectives the way Qwen3 did. Per the official model card, the only reliable way to disable thinking ischat_template_kwargs.enable_thinking=falseat the request level.local_web_tools.pyauto-detects Qwen3.5/3.6 model tags via_is_qwen35_or_36()and applies this flag automatically.Includes
scripts/start-llama-server.sh— turnkey launcher that auto-detects Qwen3.5/3.6 from the GGUF filename and applies sane defaults (--jinja, port 8088, ctx 16384, configurableN_GPU_LAYERS).Quickstart (5 commands, zero paid keys)
Validation
End-to-end integration test against Qwen3.5-4B-Q4_K_M on llama-server b9010 (May 2 2026 release):
local_web_searchreturns valid schema (SearXNG)local_web_extractcleaned 1468 chars from real URLStep 6 of the agent loop hit the integration-test 180s timeout under CPU prompt-eval on accumulated context — production deployment with GPU acceleration or a more generous timeout completes the full synthesis pass. The plumbing of all four pieces is fully validated.
Why this isn't a duplication
web_tools.pyis excellent for users with paid keys. This PR makes Hermes Agent's web research fully zero-cost for users who self-host. Same JSON contracts mean any skill callingweb_search/web_extractworks identically with the local variants — no skill-side changes.Files
tools/local_web_tools.py(552 lines)skills/research/deep-research/SKILL.md(378 lines)skills/research/deep-research/scripts/start-llama-server.sh(106 lines)Total: 1,036 lines additive, zero modifications to existing files. MIT license, no new dependencies (uses existing
requests, optionallynx/ddgr/ddgsalready common in research environments).Commits
6ff296afeat(tools): add local_web_tools — free-tier counterpart to web_tools7ff5fb8feat(tools/local_web_tools): first-class Qwen3.5/3.6 + multi-backend supportfac1501feat(skills/research): add deep-research methodology skillbcdb4f0feat(skills/research/deep-research): full Qwen3.5/3.6 + llama.cpp first-class supportHappy to split into separate tool + skill PRs if maintainers prefer reviewing them independently.