
refactor(web): per-capability backend selection for search/extract split #20061

Merged
kshitijk4poor merged 1 commit into main from refactor/web-tools-provider-architecture
May 6, 2026


@kshitijk4poor kshitijk4poor commented May 5, 2026

Collaborator

Summary

Add per-capability backend selection for web tools. This is the foundational architecture that lets the search and extract backends be chosen independently.

No new backends are added in this PR. SearXNG, native HTTP extract, and other search-only/extract-only providers are follow-up PRs that become trivial drop-ins with this architecture in place.

Ref: #19198

What this PR does

  1. ABCs: WebSearchProvider and WebExtractProvider in tools/web_providers/base.py, mirroring the existing CloudBrowserProvider pattern. The normalized result contracts are documented in their docstrings.

  2. Per-capability backend selection: _get_search_backend() and _get_extract_backend() in web_tools.py. Each reads its own config key first, then falls back to the shared web.backend key (backward compatible).

  3. Wiring: web_search_tool() now dispatches via _get_search_backend(), and web_extract_tool() via _get_extract_backend(). When the per-capability keys are empty (the default), behavior is identical to before.

  4. Config keys: web.search_backend and web.extract_backend are added to DEFAULT_CONFIG. Both are empty by default, which means they inherit from web.backend.

  5. Architecture doc: tools/web_providers/ARCHITECTURE.md explains the system, how to add providers, and the UX design for hermes tools.
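A minimal sketch of what the two capability ABCs could look like. The method names (search, extract), their parameters, and the result field names are assumptions for illustration only; the authoritative contracts are the docstrings in tools/web_providers/base.py. EchoSearch is a hypothetical toy provider that exists only to show a concrete subclass.

```python
from abc import ABC, abstractmethod
from typing import Any


class WebSearchProvider(ABC):
    """Search-only capability (sketch; method name and fields are assumptions)."""

    name: str = "base"

    @abstractmethod
    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]:
        """Return normalized hits, e.g. [{"title", "url", "description"}]."""


class WebExtractProvider(ABC):
    """Extract-only capability (sketch; method name and fields are assumptions)."""

    name: str = "base"

    @abstractmethod
    def extract(self, urls: list[str]) -> list[dict[str, Any]]:
        """Return normalized pages, e.g. [{"url", "title", "content"}]."""


class EchoSearch(WebSearchProvider):
    """Hypothetical toy provider that fabricates hits to show the contract."""

    name = "echo"

    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]:
        return [
            {"title": f"{query} #{i}", "url": f"https://example.com/{i}", "description": ""}
            for i in range(limit)
        ]
```

Because each capability is its own ABC, a search-only backend implements one class and never has to stub out an extract method it cannot support.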

Config example

# Existing configs continue working unchanged:
web:
  backend: "firecrawl"

# Future: mix providers per capability (follow-up PRs add the backends)
web:
  search_backend: "searxng"
  extract_backend: "firecrawl"
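The fallback rule can be sketched as a pure function over a parsed config dict. This is an illustration under assumptions: the real _get_search_backend() and _get_extract_backend() read the loaded Hermes config themselves and fall through to _get_backend(), which may apply defaults not shown here. The names get_search_backend, get_extract_backend, and _resolve_backend below are hypothetical.

```python
from typing import Any


def _resolve_backend(config: dict[str, Any], capability_key: str) -> str:
    """Return web.<capability_key> if set, else the shared web.backend."""
    web = config.get("web") or {}
    # An empty string (the DEFAULT_CONFIG value) means "inherit from web.backend".
    specific = web.get(capability_key) or ""
    if specific:
        return specific
    return web.get("backend") or ""


def get_search_backend(config: dict[str, Any]) -> str:
    return _resolve_backend(config, "search_backend")


def get_extract_backend(config: dict[str, Any]) -> str:
    return _resolve_backend(config, "extract_backend")
```

With only `backend: "firecrawl"` set, both getters return "firecrawl", matching the pre-PR behavior. Adding `search_backend: "searxng"` changes only the search result; extract still inherits "firecrawl".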

UX Design: hermes tools Provider Picker

The picker uses progressive disclosure — the default path stays unchanged:

hermes tools → Web Search & Extract → Select provider
  → Pick "Firecrawl" → sets web.backend. Done. (same as today)

  → Pick "⚙️ Advanced: configure separately" (last option)
    → "Search backend:" [SearXNG / Firecrawl / ...]
    → "Extract backend:" [Firecrawl / Tavily / Native / ...]
    → Done. (sets web.search_backend + web.extract_backend)

Nobody writes "firecrawl" twice. The split only appears when explicitly requested. Full UX design documented in #19198.
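A hypothetical sketch of the config writes the two picker paths could produce. apply_web_picker is an invented name, not part of hermes tools. Clearing the per-capability keys on the simple path is an assumed design choice, so that switching back from Advanced mode cannot leave a stale override in place.

```python
from typing import Any, Optional


def apply_web_picker(
    config: dict[str, Any],
    provider: Optional[str] = None,
    search: Optional[str] = None,
    extract: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical picker writer for the web section of the config."""
    web = config.setdefault("web", {})
    if provider is not None:
        # Default path: one choice shared by both capabilities.
        web["backend"] = provider
        # Assumption: reset overrides so the single choice takes effect.
        web["search_backend"] = ""
        web["extract_backend"] = ""
    else:
        # Advanced path: two independent choices; web.backend is left as-is.
        if search is None or extract is None:
            raise ValueError("advanced mode needs both a search and an extract backend")
        web["search_backend"] = search
        web["extract_backend"] = extract
    return config
```

On the default path the user makes one choice and only web.backend carries a value, which is why nobody has to type "firecrawl" twice.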

Files changed (6 files, +409/-5)

File | Change
--- | ---
tools/web_providers/__init__.py | New: package init
tools/web_providers/base.py | New: ABCs (90 lines)
tools/web_providers/ARCHITECTURE.md | New: developer guide
tools/web_tools.py | +43: _get_search_backend(), _get_extract_backend(), wired into dispatch
hermes_cli/config.py | +8: web section in DEFAULT_CONFIG
tests/tools/test_web_providers.py | New: 12 tests

Test plan

# New tests (12 pass)
python -m pytest tests/tools/test_web_providers.py -v

# Existing web tools tests unaffected (49 pass)
python -m pytest tests/tools/test_web_tools_config.py -v

What this enables (follow-up PRs)

  • SearXNG: tools/web_providers/searxng.py implementing WebSearchProvider
  • Native HTTP: tools/web_providers/native.py implementing WebExtractProvider
  • hermes tools Advanced picker: two-step sub-picker for split configuration
  • DuckDuckGo, Brave Search: more search-only providers
  • Extract existing vendors into modules: Firecrawl/Tavily/Exa/Parallel as provider classes
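As an illustration of the "drop-in" claim, here is a hedged sketch of what a search-only SearXNG module might look like. It assumes the WebSearchProvider shape sketched earlier (redefined here so the block is self-contained) and SearXNG's JSON output (`/search?q=...&format=json`, which typically has to be enabled in the instance settings). SearxngSearchProvider, the injectable fetch parameter, and the field mapping are hypothetical, not the follow-up PR's code.

```python
import json
import urllib.parse
import urllib.request
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class WebSearchProvider(ABC):
    """Stand-in for the ABC in tools/web_providers/base.py (assumed shape)."""

    @abstractmethod
    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]: ...


def _http_get_json(url: str) -> dict[str, Any]:
    # Plain stdlib GET; json.load accepts the binary response stream.
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.load(resp)


class SearxngSearchProvider(WebSearchProvider):
    """Hypothetical search-only provider backed by a SearXNG instance."""

    def __init__(self, base_url: str, fetch: Optional[Callable[[str], dict[str, Any]]] = None):
        self.base_url = base_url.rstrip("/")
        self._fetch = fetch or _http_get_json  # injectable so tests avoid the network

    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]:
        params = urllib.parse.urlencode({"q": query, "format": "json"})
        data = self._fetch(f"{self.base_url}/search?{params}")
        hits = data.get("results", [])[:limit]
        # Map SearXNG's title/url/content onto the assumed normalized fields.
        return [
            {"title": h.get("title", ""), "url": h.get("url", ""), "description": h.get("content", "")}
            for h in hits
        ]
```

The module implements only search, so pairing it with a separate extract backend needs nothing beyond the two config keys this PR adds.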

Related

kshitijk4poor added the type/feature, type/refactor, and comp/tools labels May 5, 2026
kshitijk4poor force-pushed the refactor/web-tools-provider-architecture branch from 48b6b70 to c150bd4 May 5, 2026 04:58
alt-glitch added the P3, tool/web, and area/config labels May 5, 2026
kshitijk4poor force-pushed the refactor/web-tools-provider-architecture branch from c150bd4 to bc3b2b0 May 5, 2026 05:03
kshitijk4poor changed the title from "refactor(web): capability-based provider architecture + SearXNG backend" to "refactor(web): per-capability backend selection for search/extract split" May 5, 2026
kshitijk4poor force-pushed the refactor/web-tools-provider-architecture branch 2 times, most recently from f86f92e to bf4e502 May 6, 2026 05:00

github-actions Bot commented May 6, 2026

Contributor

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.

Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
  ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
  read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
  DEFAULT_CONFIG (empty = inherit from web.backend)

Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
  to before — _get_search_backend() falls through to _get_backend()

This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.

12 new tests added; all 49 existing tests pass with zero regressions.

Ref: #19198

Labels

area/config (Config system, migrations, profiles), comp/tools (Tool registry, model_tools, toolsets), P3 (Low: cosmetic, nice to have), tool/web (Web search and extraction), type/feature (New feature or request), type/refactor (Code restructuring, no behavior change)
