refactor(web): per-capability backend selection for search/extract split #20061
Merged
🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally. If you're seeing this comment, the finding is worth inspecting.
Introduce the foundation for independently selecting web search and extract backends, enabling future combinations like SearXNG for search + Firecrawl for extract.

Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend() read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in DEFAULT_CONFIG (empty = inherit from web.backend)

Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical to before: _get_search_backend() falls through to _get_backend()

This is purely structural: no new backends are added. SearXNG and other search-only/extract-only providers can now be added as simple drop-in modules in follow-up PRs.

12 new tests, 49 existing tests pass with zero regressions.

Ref: #19198
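For readers unfamiliar with the pattern, the two ABCs might look roughly like the sketch below. This is an illustration only: the method names (`search`, `extract`), their parameters, the result keys, and the `StaticSearch` class are assumptions and are not taken from `tools/web_providers/base.py`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class WebSearchProvider(ABC):
    """Search-only capability. Method name and signature are assumed."""

    @abstractmethod
    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]:
        """Return normalized results (hypothetical keys: title, url, snippet)."""


class WebExtractProvider(ABC):
    """Extract-only capability. Method name and signature are assumed."""

    @abstractmethod
    def extract(self, urls: list[str]) -> list[dict[str, Any]]:
        """Return one normalized record per URL (hypothetical keys: url, content)."""


class StaticSearch(WebSearchProvider):
    """Toy search-only provider returning canned results, for illustration."""

    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]:
        results = [
            {"title": f"Result for {query}", "url": "https://example.com", "snippet": ""}
        ]
        return results[:limit]
```

Because each capability has its own ABC, a search-only provider such as the toy class above does not need to stub out an extract method it cannot support.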
Summary
Add per-capability backend selection for web tools — the foundational architecture that enables independently choosing search and extract backends.
No new backends are added in this PR. SearXNG, native HTTP extract, and other search-only/extract-only providers are follow-up PRs that become trivial drop-ins with this architecture in place.
Ref: #19198
What this PR does
- ABCs: `WebSearchProvider` and `WebExtractProvider` in `tools/web_providers/base.py`, mirroring the existing `CloudBrowserProvider` pattern. Normalized result contracts are documented in docstrings.
- Per-capability backend selection: `_get_search_backend()` and `_get_extract_backend()` in `web_tools.py`. Each reads its own config key first, then falls through to the shared `web.backend` key (backward compatible).
- Wiring: `web_search_tool()` now dispatches via `_get_search_backend()` and `web_extract_tool()` via `_get_extract_backend()`. When per-capability keys are empty (default), behavior is identical to before.
- Config keys: `web.search_backend` and `web.extract_backend` added to `DEFAULT_CONFIG`. Empty by default = inherit from `web.backend`.
- Architecture doc: `tools/web_providers/ARCHITECTURE.md` explains the system, how to add providers, and the UX design for `hermes tools`.

Config example
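The resolution order for the `web.backend`, `web.search_backend`, and `web.extract_backend` keys can be sketched in Python as follows. This is a minimal illustration that assumes the config is loaded into a nested dict; the helper name `resolve_backend` and the sample values are hypothetical and are not the PR's `_get_search_backend()` / `_get_extract_backend()` code.

```python
def resolve_backend(config: dict, capability: str) -> str:
    """Pick the backend for 'search' or 'extract' (hypothetical helper).

    The per-capability key wins when it is non-empty; otherwise the
    shared web.backend value is inherited.
    """
    web = config.get("web") or {}
    specific = (web.get(f"{capability}_backend") or "").strip()
    if specific:
        return specific
    return (web.get("backend") or "").strip()


# Default path: per-capability keys are empty, so both inherit web.backend.
shared = {"web": {"backend": "firecrawl", "search_backend": "", "extract_backend": ""}}

# Split path: search is overridden, extract still inherits web.backend.
split = {"web": {"backend": "firecrawl", "search_backend": "searxng", "extract_backend": ""}}

search_default = resolve_backend(shared, "search")
extract_default = resolve_backend(shared, "extract")
search_split = resolve_backend(split, "search")
extract_split = resolve_backend(split, "extract")
```

With the per-capability keys left empty, both capabilities resolve to the shared value, which is what keeps existing configurations working unchanged.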
UX Design: `hermes tools` Provider Picker
The picker uses progressive disclosure, so the default path stays unchanged.
Nobody writes "firecrawl" twice. The split only appears when explicitly requested. Full UX design documented in #19198.
Files changed (6 files, +409/-5)
- `tools/web_providers/__init__.py`
- `tools/web_providers/base.py`
- `tools/web_providers/ARCHITECTURE.md`
- `tools/web_tools.py`: `_get_search_backend()`, `_get_extract_backend()`, wire to dispatch
- `hermes_cli/config.py`: `web` section in DEFAULT_CONFIG
- `tests/tools/test_web_providers.py`

Test plan
What this enables (follow-up PRs)
- `tools/web_providers/searxng.py` implementing `WebSearchProvider`
- `tools/web_providers/native.py` implementing `WebExtractProvider`
- `hermes tools` Advanced picker: two-step sub-picker for split configuration

Related