feat(tools): Add support for SearXNG web searches and ScraperMCP extraction#8106

Closed
cro wants to merge 1 commit into NousResearch:main from cro:scraper-mcp

cro commented Apr 12, 2026

Add support for SearXNG web searches and ScraperMCP extraction, with fallback to ScraperMCP's JS-enabled extraction.

What does this PR do?

I kept running into API limits with Firecrawl, and I already had SearXNG and ScraperMCP set up in my environment. This PR adds support for switching Hermes' web search to SearXNG and extracting the results with ScraperMCP.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

tests/tools/test_web_tools_config.py: Add tests for the integration

tools/web_tools.py: Add SearXNG as a valid search backend. Hand link results off to ScraperMCP for extraction.

website/docs/user-guide/configuration.md: Documentation
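
To illustrate the search half of the change, here is a minimal sketch of querying a SearXNG instance through its JSON output (`/search?q=...&format=json`). It is not the code in this PR: the function names (`build_searxng_url`, `searxng_search`), the result field mapping, and the injectable `fetch` parameter are assumptions made for the example, and the instance must have the `json` format enabled in its settings.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List


def build_searxng_url(base_url: str, query: str, categories: str = "general") -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urllib.parse.urlencode(
        {"q": query, "format": "json", "categories": categories}
    )
    return f"{base_url.rstrip('/')}/search?{params}"


def http_get(url: str, timeout: float = 15.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def searxng_search(
    base_url: str,
    query: str,
    limit: int = 5,
    fetch: Callable[[str], str] = http_get,  # hypothetical hook, eases testing
) -> List[Dict[str, str]]:
    """Return up to `limit` results as {title, url, snippet} dicts."""
    payload = json.loads(fetch(build_searxng_url(base_url, query)))
    results = []
    for item in payload.get("results", [])[:limit]:
        results.append(
            {
                "title": item.get("title", ""),
                "url": item.get("url", ""),
                # SearXNG puts the text snippet in the "content" field.
                "snippet": item.get("content", ""),
            }
        )
    return results
```

The URLs in the returned list are what would then be handed to the extraction step. Passing a canned `fetch` makes the function usable in tests without a live SearXNG instance.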

How to Test

  1. Configure the endpoints for ScraperMCP and SearXNG in config.yaml like so:
     web:
       backend: searxng
       searxng_url: <url for searxng>
       scrapermcp_url: <url for ScraperMCP mcp endpoint>
       scrapermcp_render_js: false
  2. Restart hermes-gateway.
  3. Send Hermes the following prompt:
     I understand you are now setup to use my SearXNG and ScraperMCP instances for web search.  Can you test this out for me by giving me a quick news summary about the Artemis II mission?
  4. Monitor the ScraperMCP dashboard and see which websites are being scraped. Hermes should return something similar to:
Yep — it looks like your SearXNG + ScraperMCP setup is working.

What I verified:
- web_search returned current Artemis II results, including NASA and Space.com.
- web_extract successfully pulled content from NASA and Space.com using ScraperMCP.
- One source (U.S. News) failed even after JS-render fallback, which is actually useful confirmation that the fallback path is being exercised as configured.

Quick Artemis II news summary...
[...deleted for brevity...]
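
The JS-render fallback mentioned in the sample reply can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `ExtractionError`, `extract_with_fallback`, and the `scrape(url, render_js)` callable are hypothetical names, and the real ScraperMCP call is deliberately left behind that callable because its interface is not shown here. The assumption is that with `scrapermcp_render_js: false` a plain fetch is tried first and a JS-rendered fetch is tried only if that fails or returns nothing.

```python
from typing import Callable, Optional


class ExtractionError(Exception):
    """Raised when a page cannot be extracted (hypothetical error type)."""


def extract_with_fallback(
    url: str,
    scrape: Callable[[str, bool], str],  # stand-in for the ScraperMCP call
    render_js: bool = False,
) -> str:
    """Extract page text, retrying with JS rendering if the plain fetch fails."""
    # If JS rendering is already on, there is nothing to fall back to.
    attempts = [True] if render_js else [False, True]
    last_error: Optional[Exception] = None
    for use_js in attempts:
        try:
            text = scrape(url, use_js)
        except ExtractionError as exc:
            last_error = exc
            continue
        if text.strip():
            return text
        last_error = ExtractionError(f"empty content from {url}")
    raise ExtractionError(f"could not extract {url}") from last_error
```

A source that fails both attempts, like the U.S. News page in the reply above, would surface as an `ExtractionError` after the second call, which is how one could tell the fallback path was exercised.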

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Debian Trixie

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • [N/A] I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • [N/A] I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

kshitijk4poor pushed a commit that referenced this pull request Apr 17, 2026
Adds SearXNG (https://docs.searxng.org) as a self-hosted, privacy-first
web search backend alongside Firecrawl, Tavily, Exa, and Parallel.

SearXNG is a meta-search engine that aggregates results from 70+ search
engines. No API key needed -- just set SEARXNG_URL to your instance.

Changes:
- tools/web_tools.py: _get_searxng_url(), _searxng_search(), search
  dispatch, extract falls back to Firecrawl (SearXNG is search-only)
- hermes_cli/tools_config.py: SearXNG provider in web tool picker
- hermes_cli/config.py: SEARXNG_URL env var, diagnostics, set command
- tests/tools/test_web_tools_searxng.py: 15 tests
- optional-skills/research/searxng-search/: agent-guided skill
- Docs: configuration.md, environment-variables.md, skills catalogs

Based on #6071 by @gnanam1990, #8106 by @cro, #2572 by @bhovig,
#2710 and #9961 by @StreamOfRon, #7258 by @coldxiangyu163
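
As a rough illustration of two points in the commit message above (the `SEARXNG_URL` environment variable, and extraction falling back to Firecrawl because SearXNG is search-only), a sketch might look like this. The helper names `get_searxng_url` and `pick_backends` are hypothetical and are not the functions in `tools/web_tools.py`; the behavior for other backends is likewise an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def get_searxng_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured SearXNG base URL without a trailing slash, or None."""
    url = env.get("SEARXNG_URL", "").strip().rstrip("/")
    return url or None


def pick_backends(
    configured: str, env: Mapping[str, str] = os.environ
) -> Tuple[str, str]:
    """Return (search_backend, extract_backend) for a configured backend name."""
    if configured == "searxng":
        if get_searxng_url(env) is None:
            raise ValueError("SEARXNG_URL must be set to use the searxng backend")
        # SearXNG only searches, so extraction is delegated to Firecrawl.
        return ("searxng", "firecrawl")
    # Assumption: every other backend handles both search and extraction.
    return (configured, configured)
```
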
@kshitijk4poor

Collaborator

Merged via PR #11562 which consolidates SearXNG integration from multiple community PRs. Your category support and result formatting ideas were incorporated into the final implementation. Thank you for the contribution!

cro commented Apr 17, 2026

Author

Is that PR stuck for some reason? Still shows open.

Also, my PR had code to enable ScraperMCP to extract web content. It doesn't seem to be in the new PR. Should I re-submit my PR with the SearXNG support removed and just the ScraperMCP functionality?

@kshitijk4poor

Collaborator

It's under review. Once it's merged, you can open the ScraperMCP PR and tag me for review. Thanks!

cro commented Apr 20, 2026

Author

Greetings @kshitijk4poor, apologies if I don't understand the PR review process, but the pipeline for the PR seems to indicate that the Contributor Attribution check is failing, and that's why the PR isn't merged yet.

https://github.com/NousResearch/hermes-agent/actions/runs/24562447269/job/71814165910?pr=11562#step:3:56

venyon2k pushed a commit to venyon2k/hermes-agent that referenced this pull request May 3, 2026
BestJoester pushed a commit to BestJoester/hermes-agent that referenced this pull request May 8, 2026