feat(tools): local_web_tools — free-tier web search/extract + llama-server launcher by Abd0r · Pull Request #19607 · NousResearch/hermes-agent

Abd0r · 2026-05-04T09:02:30Z

Summary

This is the tool-only split of #19341 (which I'm closing in favor of this). #19341 also added a skills/research/deep-research/SKILL.md that overlapped with @vominh1919's existing #13412; that overlap is removed here so this PR can land independently of any deep-research methodology decision.

What's left in this PR is purely additive infrastructure — a free-tier counterpart to web_tools.py plus a turnkey llama.cpp launcher. Useful on its own; composes with whichever deep-research skill (or other research tool) ends up shipping.

What this PR adds

tools/local_web_tools.py (552 lines) — drop-in free-tier counterpart to web_tools.py:

Same JSON contract — local_web_search / local_web_extract are interchangeable with web_search / web_extract.
Backend chain: SearXNG → Brave Search free tier → Tavily free tier → ddgr → ddgs. Fails over cleanly.
First-class Qwen3.5 / Qwen3.6 support: auto-detects these models via _is_qwen35_or_36() and sends chat_template_kwargs={"enable_thinking": false}. Per the official Qwen3.5 model card, these models do not honor /think and /no_think the way Qwen3 did; only the chat-template flag works.
Multi-backend LLM: $LLM_BASE_URL works interchangeably with Ollama, llama.cpp's llama-server, vLLM, and LM Studio.
Self-test: python3 -m tools.local_web_tools (smoke).
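The backend chain above can be pictured as a plain ordered fallback. The sketch below assumes each backend is a callable `(query, limit) -> list of result dicts` that raises on failure; `search_with_fallback` and the result-dict shape are hypothetical illustrations, not the actual `local_web_search` JSON contract.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical backend signature: (query, limit) -> list of result dicts.
# A backend raises on failure (network error, missing key, missing binary).
Backend = Callable[[str, int], List[Dict[str, str]]]


def search_with_fallback(
    query: str,
    backends: Sequence[Tuple[str, Backend]],
    limit: int = 5,
) -> Dict[str, Any]:
    """Try backends in priority order and return the first non-empty result set."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            results = backend(query, limit)
        except Exception as exc:  # one dead backend must not abort the whole chain
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return {"success": True, "backend": name, "results": results[:limit]}
        errors.append(f"{name}: no results")
    # Every backend failed or came back empty: report why, per backend.
    return {"success": False, "error": "; ".join(errors)}


if __name__ == "__main__":
    def searxng(query: str, limit: int) -> List[Dict[str, str]]:
        raise ConnectionError("connection refused")  # e.g. no local SearXNG running

    def brave_free(query: str, limit: int) -> List[Dict[str, str]]:
        return []  # e.g. an empty response

    def ddgs(query: str, limit: int) -> List[Dict[str, str]]:
        return [{"title": "Example", "url": "https://example.com", "description": "stub"}]

    chain = [("searxng", searxng), ("brave-free", brave_free), ("ddgs", ddgs)]
    print(search_with_fallback("llama.cpp", chain))
```

Ordering is the only policy here: a backend that raises and a backend that returns nothing are both skipped, and the error string records why each one was passed over.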

scripts/start-llama-server.sh (109 lines) — turnkey llama.cpp launcher:

Auto-detects Qwen3.5/3.6 from GGUF filename.
Sane defaults (--jinja, ctx 16384, port 8088).
nproc fallback for non-Linux (macOS).
Friendly errors when GGUF missing / llama-server not on PATH (with install hints for Linux/macOS/pip).
Standalone — no skill or tool dependency.

Why split

#19341 bundled this with a deep-research skill. @alt-glitch correctly pointed out the skill overlapped with @vominh1919's #13412 (open since 2026-04-21). Splitting lets the tool + launcher land on their own merits, and lets #13412's methodology PR proceed without coordination overhead. If @vominh1919 wants to lift the Qwen3.5/3.6 client-side notes into their SKILL.md after this lands, I'm happy to send a small follow-up; otherwise the operational quirks live cleanly in tools/local_web_tools.py itself.

Related issues

This PR closes three open feature requests by shipping their requested backends, and partially addresses two more.

Closes (auto-close on merge):

Closes #10644: Brave Search as a native web search backend. local_web_tools.py includes the Brave free tier in its fallback chain.
Closes #9959: SearXNG as a search engine. local_web_tools.py has SearXNG as the first backend in its chain.
Closes #5941: SearXNG as a default web search provider (alongside Firecrawl, Tavily, etc.). Same scope as #9959.

Addresses (does not auto-close):

#10284: configurable custom JSON search backend for web_search. local_web_tools.py ships a multi-backend fallback chain (SearXNG → Brave free → Tavily free → ddgr → ddgs), all sharing the same JSON contract as web_search. It doesn't add a user-configurable arbitrary JSON endpoint, but the multi-backend infrastructure is there.
#523: Local Model Setup Skill (Ollama / llama.cpp / vLLM). scripts/start-llama-server.sh provides the llama.cpp piece with auto-detected Qwen3.5/3.6 setup, sane defaults, and friendly errors. Ollama and vLLM remain out of scope here.

If maintainers prefer a different close/keep-open call on any of these, happy to adjust.

Two implementation options — maintainers' choice

This PR currently ships Option A, which is the lower-risk drop-in. Option B is functionally equivalent but a cleaner long-term design. Happy to refactor on request.

Option A — parallel module (this PR as-is)

New file tools/local_web_tools.py (552 lines), new tools local_web_search / local_web_extract.
Zero changes to existing tools/web_tools.py.
Diff: +661 / -0, 2 new files.
Pros: zero risk to paid web_search/web_extract users; trivial to audit; trivial to revert.
Cons: two tools where one would do; doesn't fully match the integrate-into-canonical asks in #5941, #9959, and #10644.

Option B — integrate into existing `web_tools.py` (also open: #19796)

Add searxng / brave-free / ddgs / lynx as new candidates in the existing _get_backend() priority chain.
No new tool surface — web_search / web_extract gain free-tier fallbacks transparently.
Also adds the three search providers (searxng, brave-free, ddgs) to the hermes setup / hermes tools interactive selector via hermes_cli/tools_config.py.
Diff: +295 / -1 across 3 files (1 new launcher + 2 modified).
Pros: one tool with more backends; closes #5941 / #9959 / #10644 "as written".
Cons: touches a 2,153-line hot file; review surface is the modified file.

Reviewers can pick whichever of #19607 / #19796 is cleaner to merge; the other will be closed as superseded.

Files changed

tools/local_web_tools.py — new (552)
scripts/start-llama-server.sh — new (109, executable)

No existing files modified.

Test plan

Validated on two platforms with the same llama.cpp build (b9010, May 2026 release):

Ubuntu 24.04 (x86_64, RTX 4050 Laptop GPU)

Self-test: python3 -m tools.local_web_tools
Smoke against SearXNG public instance + Brave free key + ddgs fallback
llama-server boot via scripts/start-llama-server.sh ~/models/Qwen3.5-4B-Q4_K_M.gguf — serves on http://127.0.0.1:8088/v1/chat/completions
End-to-end agent loop with Hermes pointing at LLM_BASE_URL=http://127.0.0.1:8088 produces valid cited research reports

macOS 26 Tahoe (Apple Silicon M2)

brew install llama.cpp → llama-server resolves on PATH
scripts/start-llama-server.sh boots Qwen3.5-4B-Q4_K_M cleanly on Metal backend (3.7 GB GPU memory) — server listening within 19s
/v1/models and /props both respond; chat template loads with thinking=1 (auto-detected from Qwen3.5 GGUF)
nproc fallback path exercised (macOS lacks nproc; THREADS defaults to 8 per script)
python3 -m tools.local_web_tools smoke on macOS — ddg-python backend (via pip-installed ddgs) returned 3 real DuckDuckGo results; lynx extraction path exercised (brew install lynx).
End-to-end Hermes agent loop with custom llamacpp_local provider (api: http://127.0.0.1:8089/v1, transport: openai_chat):
- Single-turn chat completion roundtrip via hermes chat -q '...' --provider llamacpp_local -m Qwen3.5-4B-Q4_K_M.gguf -Q produces valid completion (session opens, response returned, session closes cleanly)
- Hermes's built-in llama.cpp auto-detection works: /v1/models emits owned_by: llamacpp; /props reports correct n_ctx
Direct curl POST to /v1/chat/completions with tool_choice: "required" + chat_template_kwargs: {enable_thinking: false} → Qwen3.5-4B-Q4_K_M emits proper structured tool_calls JSON (verified the model handles tool calling correctly when the chat-template flag is passed)
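For reference, the request body used in that curl check can also be assembled from Python. This is a minimal standard-library sketch; `build_chat_request`, `extract_tool_calls`, and the `web_search` tool schema are hypothetical names, and the response parsing assumes the OpenAI chat-completions shape (`choices[0].message.tool_calls`, with `arguments` as a JSON string).

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Hypothetical tool schema in the OpenAI function-calling format.
WEB_SEARCH_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return result titles and URLs.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def build_chat_request(
    base_url: str,
    model: str,
    messages: List[Dict[str, str]],
    tools: Optional[List[Dict[str, Any]]] = None,
    disable_thinking: bool = True,
) -> urllib.request.Request:
    """Build a POST to /v1/chat/completions; base_url may or may not end in /v1."""
    root = base_url.rstrip("/")
    if root.endswith("/v1"):
        root = root[: -len("/v1")]
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = "required"  # force the model to emit a tool call
    if disable_thinking:
        # The chat-template flag Qwen3.5/3.6 honor; /no_think in the prompt is not enough.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return urllib.request.Request(
        root + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tool_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return [{"name", "arguments"}] from an OpenAI-style chat completion."""
    message = response["choices"][0]["message"]
    return [
        {"name": call["function"]["name"], "arguments": json.loads(call["function"]["arguments"])}
        for call in message.get("tool_calls") or []
    ]
```

With a llama-server running, `urllib.request.urlopen(req, timeout=120)` sends the request and `extract_tool_calls(json.load(resp))` reads the structured calls back.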

CI: will fix anything pytest tests/ flags.

License

MIT (auto per CONTRIBUTING.md).

Adds local_web_search_tool and local_web_extract_tool that mirror the JSON contracts of web_search_tool / web_extract_tool but use free local-first backends instead of paid APIs (Firecrawl, Parallel, Tavily, Exa, Gemini).

Search backend chain (auto-fallback):
1. SearXNG self-hosted (default http://localhost:8888)
2. Brave Search free tier (BRAVE_SEARCH_API_KEY)
3. Tavily free tier (TAVILY_API_KEY)
4. ddgr CLI
5. ddgs / duckduckgo_search Python package

Extraction:
- lynx -dump with boilerplate stripping (nav menus, button labels, iframe markers, captcha blocks, cookie notices)
- Optional Ollama-based summarization (zero API cost)

Drop-in compatible: skills calling web_search/web_extract behave identically when pointed at local_web_search/local_web_extract.

Self-test: python3 -m tools.local_web_tools (smoke test included).

Closes free-tier gap for users without paid web-API keys.
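A minimal sketch of the lynx extraction step. The line patterns are assumptions about what typical `lynx -dump` noise looks like (for example `IFRAME:` and `(BUTTON)` prefixes), not the module's actual filter list; `strip_boilerplate` and `lynx_extract` are hypothetical names.

```python
import re
import shutil
import subprocess
from typing import List

# Assumed patterns for common lynx -dump noise; the module's real list is not shown here.
_BOILERPLATE_PATTERNS = [
    re.compile(r"^\s*\[?skip to (main )?content\]?\s*$", re.IGNORECASE),  # nav links
    re.compile(r"^\s*\[?(menu|main menu|toggle navigation)\]?\s*$", re.IGNORECASE),
    re.compile(r"^\s*\(BUTTON\)"),  # button labels
    re.compile(r"^\s*IFRAME:"),  # iframe markers
    re.compile(r"captcha|are you a robot|verify you are human", re.IGNORECASE),
    re.compile(r"we use cookies|accept (all )?cookies|cookie (policy|settings)", re.IGNORECASE),
]


def strip_boilerplate(text: str) -> str:
    """Drop lines matching a boilerplate pattern and collapse runs of blank lines."""
    kept: List[str] = []
    for line in text.splitlines():
        if any(p.search(line) for p in _BOILERPLATE_PATTERNS):
            continue
        if not line.strip() and kept and not kept[-1].strip():
            continue  # already ended on a blank line
        kept.append(line.rstrip())
    return "\n".join(kept).strip()


def lynx_extract(url: str, timeout: float = 30.0) -> str:
    """Render a page to text with `lynx -dump -nolist`, then strip boilerplate."""
    if not url.startswith(("http://", "https://")):
        # Refuse anything lynx might parse as an option or a local file.
        raise ValueError("only http(s) URLs are passed to lynx")
    if shutil.which("lynx") is None:
        raise RuntimeError("lynx not on PATH")
    out = subprocess.run(
        ["lynx", "-dump", "-nolist", url],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return strip_boilerplate(out.stdout)
```

Filtering is line-based on purpose: lynx output is already plain text, so dropping whole lines is cheap, and an over-broad pattern (the captcha one, say) costs at most the matching line.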

…support

Breaking changes from initial commit:
- OLLAMA_URL renamed to LLM_BASE_URL (backward-compat: OLLAMA_URL still honored)
- Added LLM_DEFAULT_MODEL env var
- Renamed _summarize_ollama() to _summarize_via_local_llm()

New behavior:
- Auto-detect Qwen3.5/3.6 model tags via _is_qwen35_or_36(); when detected, automatically pass chat_template_kwargs.enable_thinking=false in the request payload. Critical because Qwen3.5/3.6 default to thinking mode and do NOT honor the /think /no_think directives that worked on Qwen3.
- Now compatible with any OpenAI-compat /v1/chat/completions endpoint:
  * Ollama (default http://localhost:11434)
  * llama.cpp (llama-server, default http://localhost:8080)
  * vLLM (default http://localhost:8000)
  * LM Studio (default http://localhost:1234)
- Replaced legacy duckduckgo_search package with new ddgs name; falls back to legacy package for backward compat.

Validated end-to-end against Qwen3.5-4B-Q4_K_M via llama-server b9010 (May 2026 release) with --jinja flag; produces valid tool-call sequences and clean cited research reports.

Refs Qwen3.5 model card guidance: https://huggingface.co/Qwen/Qwen3.5-9B
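The detection and endpoint rules in this commit can be pinned down with a small sketch. `is_qwen35_or_36`, `thinking_overrides`, and `resolve_llm_base_url` are hypothetical stand-ins for the module's `_is_qwen35_or_36()` and its config handling; the regex and the Ollama default URL are assumptions.

```python
import os
import re
from typing import Any, Dict, Mapping, Optional

# Assumed default: Ollama's standard port, the first endpoint listed in the commit message.
DEFAULT_LLM_BASE_URL = "http://localhost:11434"

# Matches "qwen3.5"/"qwen3.6" (also "_" separators) but not "qwen3", "qwen2.5" or "qwen3.50".
_QWEN35_OR_36 = re.compile(r"qwen[-_]?3[._][56](?!\d)", re.IGNORECASE)


def is_qwen35_or_36(model: str) -> bool:
    """Detect Qwen3.5/3.6 from an Ollama tag or a GGUF filename."""
    return bool(_QWEN35_OR_36.search(model))


def thinking_overrides(model: str) -> Dict[str, Any]:
    """Extra request fields: only Qwen3.5/3.6 get the chat-template flag."""
    if is_qwen35_or_36(model):
        return {"chat_template_kwargs": {"enable_thinking": False}}
    return {}


def resolve_llm_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """LLM_BASE_URL wins; the legacy OLLAMA_URL is still honored as a fallback."""
    env = os.environ if env is None else env
    url = env.get("LLM_BASE_URL") or env.get("OLLAMA_URL") or DEFAULT_LLM_BASE_URL
    return url.rstrip("/")
```

A caller would merge the overrides into its payload, e.g. `{"model": model, "messages": messages, **thinking_overrides(model)}`, so non-Qwen models never see the flag.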

Companion script to tools/local_web_tools.py. Boots llama.cpp's llama-server with the correct flags for OpenAI-compatible local inference, with first-class Qwen3.5/3.6 detection from the GGUF filename.

Defaults:
- port 8088, ctx 16384, threads = nproc
- --jinja (required for Qwen3.5/3.6 chat-template + tool calling)
- --n-gpu-layers 0 (CPU; override via N_GPU_LAYERS=-1 for all-on-GPU)

Detects Qwen3.5/3.6 from filename and prints the required client-side flag (chat_template_kwargs.enable_thinking=false) per the official model card, since Qwen3.5+ default to thinking mode and ignore /think /no_think.

Useful out of the box for any Hermes tool that talks to an OpenAI-compatible endpoint (point LLM_BASE_URL at http://127.0.0.1:8088).

Friendly errors when:
- GGUF path missing or invalid
- llama-server not on PATH (with install hints for Linux/macOS/pip)

Standalone: no skill or tool dependency. MIT.
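The launcher itself is a bash script; the Python sketch below only mirrors how its defaults turn into a llama-server command line. `THREADS`, `HOST`, and `PORT` appear elsewhere in this thread and `N_GPU_LAYERS` comes from this commit, but `CTX_SIZE` is an assumed name, and `llama_server_argv`, `qwen_hint`, and `launch` are hypothetical.

```python
import os
import re
import shutil
import subprocess
from pathlib import Path
from typing import List, Mapping, Optional

_QWEN35_OR_36 = re.compile(r"qwen[-_]?3[._][56](?!\d)", re.IGNORECASE)


def default_threads() -> int:
    # Python analogue of the script's nproc fallback; os.cpu_count() can return None.
    return os.cpu_count() or 8


def llama_server_argv(gguf: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the llama-server command line from the defaults in the commit message."""
    env = os.environ if env is None else env
    model = Path(gguf).expanduser()
    if model.suffix.lower() != ".gguf" or not model.is_file():
        raise FileNotFoundError(f"GGUF path missing or invalid: {model}")
    return [
        "llama-server",
        "--model", str(model),
        "--jinja",  # chat template + tool calling for Qwen3.5/3.6
        "--ctx-size", env.get("CTX_SIZE", "16384"),
        "--host", env.get("HOST", "127.0.0.1"),
        "--port", env.get("PORT", "8088"),
        "--threads", env.get("THREADS", str(default_threads())),
        # 0 keeps everything on CPU; per the commit message, -1 puts all layers on GPU.
        "--n-gpu-layers", env.get("N_GPU_LAYERS", "0"),
    ]


def qwen_hint(gguf: str) -> Optional[str]:
    """Client-side reminder when the filename looks like Qwen3.5/3.6."""
    if _QWEN35_OR_36.search(Path(gguf).name):
        return 'Qwen3.5/3.6 detected: send chat_template_kwargs={"enable_thinking": false} from the client.'
    return None


def launch(gguf: str) -> None:
    if shutil.which("llama-server") is None:
        raise SystemExit("llama-server not on PATH (e.g. `brew install llama.cpp` on macOS).")
    argv = llama_server_argv(gguf)
    hint = qwen_hint(gguf)
    if hint:
        print(hint)
    subprocess.run(argv, check=True)  # blocks while the server runs
```

Reading every default from the environment keeps the defaults in one place while still allowing per-run overrides such as `N_GPU_LAYERS=-1`.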

llama.cpp's `llama-server` already speaks OpenAI chat-completions, so users could already point Hermes at it via `--provider custom`. But "custom" means they have to set OPENAI_BASE_URL by hand, the model picker doesn't list it, and the dashboard has no way to surface the running server. This PR makes llama.cpp a discoverable, zero-config backend.

What ships
==========

* `plugins/model-providers/llama-cpp/`: new ProviderProfile with default base_url `http://127.0.0.1:8088/v1`, aliases `llamacpp` / `llama.cpp` / `llama_cpp` / `llama-server`, and an offline-tolerant fetch_models override (returns None instead of raising when the local server is down).
* `hermes_cli/auth.py`: adds llama-cpp to PROVIDER_REGISTRY (modeled on the lmstudio entry: api_key auth_type with optional LLAMA_CPP_API_KEY + LLAMA_CPP_BASE_URL env vars). Removes the old llama.cpp/llamacpp/llama-cpp hardcoded aliases that pointed at `custom`, so the plugin's aliases win.
* `hermes_cli/models.py`: adds the same alias mappings to _PROVIDER_ALIASES so `--provider llama.cpp` resolves correctly through the CLI parser path.
* `hermes_cli/model_switch.py`: adds a probe-and-surface block in list_authenticated_providers, mirroring the existing lmstudio pattern. Three surfacing modes:
  1. Live probe: `${LLAMA_CPP_BASE_URL}/models` with a 300 ms cold-discovery timeout. If `llama-server` responds, the row appears with the loaded model. This is what makes the dashboard "magically" pick up a running server with no config.
  2. Hint mode: LLAMA_CPP_API_KEY or LLAMA_CPP_BASE_URL set, or current provider matches one of the aliases; 1.5 s timeout.
  3. Sticky current: when llama-cpp is the user's selected provider but the server is offline, the row still appears with current_model so the user doesn't lose access after restart.
  When no env vars, no current selection, and no live server, the row is not injected, which keeps the picker tidy for non-llama.cpp users.
* `plugins/model-providers/custom/__init__.py`: drops the llamacpp / llama.cpp / llama-cpp aliases from the generic `custom` profile (they now belong to the dedicated provider).
* `scripts/start-llama-server.sh`: turnkey llama-server launcher whose default port (8088) lines up with the plugin's default base_url, so the end-to-end UX is just:
    ./scripts/start-llama-server.sh ~/models/foo.gguf
    hermes chat --provider llama-cpp
  Prints an alignment hint when PORT/HOST diverge from the plugin default.
* `tests/providers/test_llama_cpp_profile.py`: 12 tests covering plugin registration, alias resolution end-to-end through hermes_cli.auth, CANONICAL_PROVIDERS auto-injection, PROVIDER_REGISTRY entry shape, picker surfacing in three modes (current+offline, no-clutter, alias resolution), and the offline-graceful fetch_models override.
* `tests/providers/test_plugin_discovery.py`: bumped expected profile count 33 → 34.
* `website/docs/guides/local-llamacpp-setup.md`: user-facing setup guide modeled on the existing local-ollama-setup.md.
* `website/docs/reference/environment-variables.md`: documents LLAMA_CPP_API_KEY / LLAMA_CPP_BASE_URL and adds llama-cpp to the HERMES_INFERENCE_PROVIDER accepted-values list.

Test plan
=========

    pytest tests/providers/                               # 90 passed
    pytest tests/providers/test_llama_cpp_profile.py -v   # 12 passed
    pytest tests/hermes_cli/test_model_switch_custom_providers.py \
           tests/hermes_cli/test_user_providers_model_switch.py \
           tests/hermes_cli/test_custom_provider_model_switch.py \
           tests/hermes_cli/test_api_key_providers.py \
           tests/hermes_cli/test_auth_provider_gate.py    # 221 passed

Tested on macOS 26.4 (arm64). The launcher uses `nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 8` so it works on Linux + macOS + WSL2; not exercised on native Windows.

Notes
=====

* Existing PR NousResearch#19607 also adds `scripts/start-llama-server.sh`. The version in this PR supersedes that one: it's stripped of the Qwen-specific detection branches (this PR is intentionally generic-llama.cpp only) and reworded around the new `llama-cpp` provider's defaults. Whichever PR lands second will need a one-line conflict resolution.
* Does not include `tools/local_web_tools.py`; that's orthogonal web-search work and remains in NousResearch#19607.
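The live-probe mode described in that commit boils down to a short-timeout GET against the OpenAI-style `/models` endpoint that treats any failure as "offline" rather than raising. A minimal standard-library sketch, assuming the `{"data": [{"id": ...}]}` response shape; `probe_models` is a hypothetical name, not the plugin's actual `fetch_models` override.

```python
import http.client
import json
import urllib.request
from typing import List, Optional

# Ignore HTTP(S)_PROXY settings: the probe only ever targets a local server.
_NO_PROXY_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_models(base_url: str, timeout: float = 0.3) -> Optional[List[str]]:
    """Return model ids from {base_url}/models, or None when the server is unreachable.

    0.3 s mirrors the cold-discovery timeout; a caller with an env-var hint
    could pass 1.5 instead.
    """
    url = base_url.rstrip("/") + "/models"
    try:
        with _NO_PROXY_OPENER.open(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or non-JSON body: treat as offline.
        return None
    if not isinstance(payload, dict):
        return None
    return [m["id"] for m in payload.get("data", []) if isinstance(m, dict) and "id" in m]
```

Bypassing proxy environment variables is a deliberate choice in this sketch: a probe aimed at `127.0.0.1` should never be routed through a proxy, which could turn a fast connection-refused into a slow timeout.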

Abd0r and others added 3 commits May 4, 2026 14:28

Abd0r mentioned this pull request May 4, 2026

feat(research): deep-research methodology + local_web_tools (free-tier, Qwen3.5/3.6 + llama.cpp first-class) #19341

Closed

alt-glitch added the labels type/feature (New feature or request), comp/tools (Tool registry, model_tools, toolsets), tool/web (Web search and extraction), and P3 (Low: cosmetic, nice to have) on May 4, 2026

Abd0r mentioned this pull request May 4, 2026

feat(tools/web): add searxng, brave-free, ddgs as fallback backends #19796

Closed


This was referenced May 5, 2026

Feature request: Configurable extra_body per provider (enable_thinking=false for DashScope) #8160

Open

feat(tools/wot_engine): add Web-of-Thought multi-agent reasoning #20158

Open

feat(web): add Brave Search (free tier) and DDGS search providers #21337

Merged

Abd0r closed this May 7, 2026

Abd0r reopened this May 7, 2026

Abd0r closed this May 7, 2026

Abd0r mentioned this pull request May 7, 2026

feat(agent): first-class llama.cpp provider with zero-config dashboard surfacing #21531

Open


kjames2001 mentioned this pull request May 11, 2026

feat: add SearXNG self-hosted search backend #23618

Closed

Abd0r wants to merge 3 commits into NousResearch:main from Abd0r:feat/local-web-tools-only
