feat(web): Add support for SearXNG search and native backend to reduce dependency on third party APIs#2710

Closed
StreamOfRon wants to merge 4 commits into
NousResearch:mainfrom
StreamOfRon:web-config-schema-v2
@StreamOfRon

Contributor

Summary

Splits the monolithic web.backend config key into separate web.search.backend and web.extract.backend keys, allowing independent control of which provider handles search vs. content extraction. Also adds two new backends: SearXNG (self-hosted search, no API key) and Native HTTP (direct extraction with html-to-markdown, no API key).

Includes several bug fixes to the LLM post-processing pipeline discovered during local testing.

Changes

New backends

  • SearXNG (search) — Self-hosted OpenSearch-compatible search. Configured via web.search.url in config.yaml or SEARXNG_URL env var. Requires no API key.
  • Native HTTP (extract) — Direct HTTP fetch with HTML-to-Markdown conversion. Requires no API key or external service.

Configuration schema (v10 → v11)

web:
  backend: "firecrawl"        # Existing key — still works as fallback
  search:
    backend: "searxng"        # New: override search backend independently
    url: "https://searx.example.com/search?q=%s&format=json"
  extract:
    backend: "native"         # New: override extract backend independently
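
The `%s` placeholder in `web.search.url` suggests the query is URL-encoded and substituted into the template. A minimal sketch of that substitution follows; `build_searxng_url` is a hypothetical helper name, not the function in `tools/web_tools.py`, and the use of `quote_plus` is an assumption about the encoding.

```python
from urllib.parse import quote_plus


def build_searxng_url(template: str, query: str) -> str:
    """Substitute a URL-encoded query for the %s placeholder (hypothetical helper)."""
    if "%s" not in template:
        raise ValueError("SearXNG URL template must contain a %s placeholder")
    # str.replace is used instead of %-formatting so that any other
    # percent-escapes already present in the template are left untouched.
    return template.replace("%s", quote_plus(query), 1)


url = build_searxng_url(
    "https://searx.example.com/search?q=%s&format=json",
    "rust async & await",
)
print(url)
# -> https://searx.example.com/search?q=rust+async+%26+await&format=json
```

Encoding the query before substitution keeps characters such as `&` from being read as extra query parameters.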

Selection priority for each tool call:

  1. Tool-specific config (web.search.backend / web.extract.backend)
  2. Generic fallback (web.backend)
  3. Legacy environment variables
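
The three-step priority above can be sketched as a single lookup function. This is an illustration only: `resolve_backend` and the `ENV_BACKENDS` mapping are hypothetical names, and the environment variable listed is an assumed example rather than the set the PR actually inspects.

```python
import os
from typing import Any, Mapping, Optional

# Assumed example of an env-var-to-backend mapping; the real list may differ.
ENV_BACKENDS = {"FIRECRAWL_API_KEY": "firecrawl"}


def resolve_backend(
    config: Mapping[str, Any],
    tool: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a backend for `tool` ("search" or "extract") using the stated priority."""
    env = os.environ if env is None else env
    web = config.get("web") or {}

    # 1. Tool-specific config: web.search.backend / web.extract.backend
    tool_cfg = web.get(tool) or {}
    if tool_cfg.get("backend"):
        return tool_cfg["backend"]

    # 2. Generic fallback: web.backend
    if web.get("backend"):
        return web["backend"]

    # 3. Environment variables (illustrative mapping, see above)
    for var, backend in ENV_BACKENDS.items():
        if env.get(var):
            return backend
    return None


cfg = {"web": {"backend": "firecrawl", "search": {"backend": "searxng"}}}
print(resolve_backend(cfg, "search", env={}))   # searxng
print(resolve_backend(cfg, "extract", env={}))  # firecrawl
```

With this config, search uses the tool-specific override while extract falls through to the generic `web.backend` key.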

Existing configs are automatically migrated from v10 to v11. The web.backend key continues to work as a fallback, so the change is non-breaking.
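
One way such a v10 to v11 migration could work is sketched below. It assumes the version lives under a `config_version` key and that the legacy `web.backend` value is copied into both new sections when they are unset; both are assumptions for illustration, and the actual logic in `hermes_cli/config.py` may differ (for instance, it also prompts for a SearXNG URL).

```python
import copy
from typing import Any, Dict


def migrate_v10_to_v11(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a v11 copy of `config`; the input is left unmodified (hypothetical sketch)."""
    new = copy.deepcopy(config)
    web = new.get("web") or {}
    new["web"] = web

    legacy = web.get("backend")
    for tool in ("search", "extract"):
        section = web.get(tool) or {}
        web[tool] = section
        # Only seed the tool-specific key when the user has not set one.
        if legacy and not section.get("backend"):
            section["backend"] = legacy

    # web.backend is deliberately kept so it still works as a fallback.
    new["config_version"] = 11  # assumed key name
    return new


old = {"config_version": 10, "web": {"backend": "firecrawl"}}
print(migrate_v10_to_v11(old)["web"])
# {'backend': 'firecrawl', 'search': {'backend': 'firecrawl'}, 'extract': {'backend': 'firecrawl'}}
```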

Bug fixes (discovered during local testing)

  1. _call_summarizer_llm permanent error short-circuit — errors like no endpoints available, 404, and guardrail rejections (e.g. from OpenRouter) now bail out immediately instead of retrying for ~62 seconds (2+4+8+16+32s backoff).

  2. Primary model fallback — on a permanent aux LLM error, retries once with the user's configured primary model (_read_main_model()) before giving up. Ensures summarization works even when the default auxiliary model is unavailable or restricted.

  3. process_single_result safety net — any unhandled summarization exception now falls back to returning raw extracted content rather than failing the entire tool call. The user gets unsummarized-but-valid content instead of an error.

  4. web_extract_tool outer exception logging — promoted from DEBUG to WARNING so tool-level failures are visible in ~/.hermes/logs/errors.log.

  5. _native_extract SSL error surfacing — SSL failures now log at WARNING and return a clear diagnostic message per-URL instead of silently returning empty content.
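
Fixes 1 to 3 combine into one control flow: retry transient errors with backoff, bail out at once on permanent ones, try a fallback model, and finally return the raw content. The sketch below is a simplified synchronous illustration of that flow. All function names and the `PERMANENT_MARKERS` substrings are assumptions, not the code in `tools/web_tools.py`.

```python
import time
from typing import Callable, Optional, Sequence

# Illustrative substrings that mark an error as permanent (assumed, not exhaustive).
PERMANENT_MARKERS = ("no endpoints available", "404", "guardrail")


def is_permanent(exc: Exception) -> bool:
    message = str(exc).lower()
    return any(marker in message for marker in PERMANENT_MARKERS)


def call_with_backoff(
    call: Callable[[], str],
    delays: Sequence[float] = (2, 4, 8, 16, 32),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry transient failures; re-raise permanent ones without sleeping."""
    for delay in delays:
        try:
            return call()
        except Exception as exc:
            if is_permanent(exc):
                raise  # retrying cannot help, so skip the remaining backoff
            sleep(delay)
    return call()  # final attempt; any error propagates to the caller


def summarize_or_raw(
    raw: str,
    summarize: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Summarize `raw`, degrading to the fallback model and then to raw content."""
    try:
        return call_with_backoff(lambda: summarize(raw), sleep=sleep)
    except Exception as exc:
        if fallback is not None and is_permanent(exc):
            try:
                return fallback(raw)  # single retry with the primary model
            except Exception:
                pass
        return raw  # unsummarized but valid content instead of an error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; with the default delays, a run of transient failures sleeps for 2+4+8+16+32 = 62 seconds in total before the last attempt.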

Files changed

| File | Change |
|---|---|
| tools/web_tools.py | _get_search_backend(), _get_extract_backend(), _searxng_search(), _native_extract(), updated check_web_api_key(); bug fixes to _call_summarizer_llm, process_single_result, web_extract_tool |
| hermes_cli/config.py | Updated DEFAULT_CONFIG, v10→v11 migration, SearXNG env var metadata |
| hermes_cli/tools_config.py | SearXNG and Native HTTP options in provider UI |
| pyproject.toml | Added html-to-markdown dependency |
| tests/tools/test_web_config_v2.py | 58 new unit tests |
| tests/hermes_cli/test_config_v2_migration.py | 14 new unit tests |
| website/docs/user-guide/configuration.md | Expanded API keys table, new Web Tools Configuration section |
| website/docs/user-guide/features/tools.md | New Web Tools section with examples |

Testing

72 new unit tests covering: backend selection precedence, SearXNG URL encoding, native extraction, check_web_api_key() with all backend combinations, config migration, and schema validation. Full suite passing with no regressions.

StreamOfRon changed the title from "feat(web): granular search/extract backend configuration (v2)" to "feat(web): granular search/extract backend configuration" on Mar 24, 2026
StreamOfRon force-pushed the web-config-schema-v2 branch 2 times, most recently from 6aad651 to a08fa56, on March 26, 2026 05:21
StreamOfRon changed the title from "feat(web): granular search/extract backend configuration" to "feat(web): Add support for SearXNG search and native backend to reduce dependency on third party APIs" on Mar 26, 2026
StreamOfRon force-pushed the web-config-schema-v2 branch 3 times, most recently from 3d00a58 to 4c03d43, on March 29, 2026 01:36
…iguration

- Split web.backend into web.search.backend (with URL) and web.extract.backend
- Add SearXNG as search backend with URL template support
- Add 'native' extract backend for direct HTTP requests with html-to-markdown
- Implement _get_search_backend() and _get_extract_backend() precedence functions
- Add v10→v11 config migration with interactive SearXNG URL prompt
- Update tools_config.py with SearXNG and Native HTTP provider options
- Add html-to-markdown as core dependency
- Update check_web_api_key() to handle native backend (no API key required)
- All tests passing (2 pre-existing failures unrelated to this change)

Plan: .opencode/plans/1774319582434-stellar-lagoon.md
Test coverage includes:
- _get_search_backend() — 29 tests covering tool-specific config, generic fallback, env detection
- _get_extract_backend() — 21 tests with same precedence logic minus searxng
- _searxng_search() — 8 tests for URL template substitution, error handling, limits, API keys
- _native_extract() — 4 structural tests (async function verification)
- check_web_api_key() — 5 tests for native backend (no key) and SearXNG support
- Backend compatibility — 2 tests for _get_backend() alias
- Config migration v10→v11 — 7 tests for automatic splitting, SearXNG prompting
- Config schema — 6 tests for DEFAULT_CONFIG web keys
- Environment variables — 5 tests for SEARXNG_URL and SEARXNG_API_KEY

Total: 72 new tests, all passing. Full suite: 6152 passed, 1 pre-existing failure.

Plan: .opencode/plans/1774319582434-stellar-lagoon.md
Remove:
- .opencode/plans/ (plan documents)
- .opencode/web-config-v2-todos.md (todo tracking)
- docs/superpowers/plans/ (documentation)
- test_results.txt (test output)
StreamOfRon force-pushed the web-config-schema-v2 branch from 93b37fd to 7addb38 on March 30, 2026 17:43
kshitijk4poor pushed a commit that referenced this pull request Apr 17, 2026
Adds SearXNG (https://docs.searxng.org) as a self-hosted, privacy-first
web search backend alongside Firecrawl, Tavily, Exa, and Parallel.

SearXNG is a meta-search engine that aggregates results from 70+ search
engines. No API key needed -- just set SEARXNG_URL to your instance.

Changes:
- tools/web_tools.py: _get_searxng_url(), _searxng_search(), search
  dispatch, extract falls back to Firecrawl (SearXNG is search-only)
- hermes_cli/tools_config.py: SearXNG provider in web tool picker
- hermes_cli/config.py: SEARXNG_URL env var, diagnostics, set command
- tests/tools/test_web_tools_searxng.py: 15 tests
- optional-skills/research/searxng-search/: agent-guided skill
- Docs: configuration.md, environment-variables.md, skills catalogs

Based on #6071 by @gnanam1990, #8106 by @cro, #2572 by @bhovig,
#2710 and #9961 by @StreamOfRon, #7258 by @coldxiangyu163
@kshitijk4poor

Collaborator

Merged via PR #11562 which consolidates SearXNG integration from multiple community PRs. Your skill and documentation structure ideas informed the optional skill shipped with the final implementation. Thank you for the contribution!
