feat(tools): progressive tool disclosure for MCP and plugin tools (scoped) by teknium1 · Pull Request #34493 · NousResearch/hermes-agent

teknium1 · 2026-05-29T08:22:22Z

Summary

Salvages #31163 (Tool Search — progressive tool disclosure for MCP/plugin tools) onto current main and closes a toolset-scoping hole found in deep review.

Tool Search hides MCP + non-core plugin tools behind three bridge tools (tool_search / tool_describe / tool_call) when the deferrable surface exceeds ~10% of the active model's context window. Core Hermes tools are never deferred.
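The threshold decision can be pictured with a minimal sketch. It assumes the char/4 token heuristic and the 'auto' / 'on' / 'off' modes mentioned in the commit message further down; the function names, signature, and the empty-catalog behaviour are hypothetical and are not taken from tools/tool_search.py.

```python
import json
from typing import Any, Dict, List


def estimate_tokens(tool_defs: List[Dict[str, Any]]) -> int:
    """Rough token estimate: serialized character count divided by 4."""
    return sum(len(json.dumps(d)) for d in tool_defs) // 4


def should_defer(
    tool_defs: List[Dict[str, Any]],
    context_window: int,
    mode: str = "auto",
    threshold: float = 0.10,
) -> bool:
    """Decide whether the deferrable tools should be hidden behind the bridges.

    `mode` mirrors the documented 'auto' / 'on' / 'off' setting; everything
    else here is an illustrative assumption.
    """
    if mode == "off" or not tool_defs:
        return False  # assumption: nothing to defer means no bridges
    if mode == "on":
        return True
    if context_window <= 0:
        return False
    return estimate_tokens(tool_defs) > threshold * context_window


# A tiny toolset stays inline; fifty verbose MCP schemas cross 10% of 128k.
small = [{"name": "t", "description": "x" * 100}]
large = [{"name": f"mcp_tool_{i}", "description": "x" * 2000} for i in range(50)]
```

With these inputs, `should_defer(small, 128_000)` stays false while `should_defer(large, 128_000)` is true, which is the "small toolsets pay no overhead" behaviour the PR describes.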

What changed vs #31163

The original bridge dispatch read its catalog from the global registry — get_tool_definitions() with no toolset scope, whose else branch is "start with everything." In a restricted-toolset session (subagent, kanban worker, curated gateway session) that meant the model could:

tool_search the entire process registry, not just its granted tools, and
tool_call any registered plugin/MCP tool it was never given — registry.dispatch() has no enabled_tools gate for non-execute_code tools, so the out-of-scope tool actually ran.

It also widened the process-global _last_resolved_tool_names to the whole registry on every tool_search, leaking core/sandbox tools into execute_code's fallback set.
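A toy reproduction may make the hole easier to see. The registry class below is a hypothetical stand-in, not the Hermes registry; it only mirrors the two properties described above: an unscoped lookup returns everything, and dispatch itself performs no scope check.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Set, Tuple

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class ToyRegistry:
    """Hypothetical registry: tool name -> (toolset, handler)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[str, Handler]] = {}

    def register(self, name: str, toolset: str, handler: Handler) -> None:
        self._tools[name] = (toolset, handler)

    def names(self, enabled_toolsets: Optional[Iterable[str]] = None) -> Set[str]:
        if enabled_toolsets is None:
            return set(self._tools)  # no scope given: "start with everything"
        allowed = set(enabled_toolsets)
        return {n for n, (ts, _) in self._tools.items() if ts in allowed}

    def dispatch(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        return self._tools[name][1](args)  # no scope gate here


def tool_call_unscoped(reg: ToyRegistry, name: str, args: Dict[str, Any]):
    """Pre-fix shape: whatever is registered can run."""
    return reg.dispatch(name, args)


def tool_call_scoped(reg: ToyRegistry, name: str, args: Dict[str, Any],
                     enabled_toolsets: Optional[Iterable[str]]):
    """Post-fix shape: reject names outside the session's scoped catalog."""
    if name not in reg.names(enabled_toolsets):
        return {"error": f"tool '{name}' is not available in this session"}
    return reg.dispatch(name, args)


reg = ToyRegistry()
reg.register("github_search", "mcp-github", lambda a: {"ok": True})
reg.register("secret_plugin_danger", "danger", lambda a: {"ok": True})

leaked = tool_call_unscoped(reg, "secret_plugin_danger", {})
blocked = tool_call_scoped(reg, "secret_plugin_danger", {}, ["mcp-github"])
allowed = tool_call_scoped(reg, "github_search", {}, ["mcp-github"])
```

Here `leaked` runs the out-of-scope handler, `blocked` carries the rejection message, and `allowed` still dispatches normally.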

Confirmed by live E2E (pre-fix)

Session scoped to enabled_toolsets=['mcp-github'] with an out-of-scope dangerplugin tool registered:

tool_search reported total_available: 26 (whole registry)
tool_call("secret_plugin_danger", {}) returned {"ok": true} — it dispatched a tool the session was never granted
_last_resolved_tool_names went 20 → 51 (now including terminal)

Fix

handle_function_call gains enabled_toolsets / disabled_toolsets; the bridge dispatch scopes get_tool_definitions to them. This both scopes the searchable catalog and stops the global-pollution side effect.
Defense-in-depth gate rejects any tool_call'd name not in the scoped deferrable catalog.
tool_executor's unwrap (concurrent + sequential paths) enforces the same scope before dispatch — it unwraps tool_call → underlying name and bypasses the bridge branch, so the gate must live there too. New _tool_search_scoped_names() helper, cached per-agent on registry generation + toolset scope.
New scoped_deferrable_names() helper in tool_search.py shared by both sites.
get_tool_definitions / _compute_tool_definitions signatures annotated Optional[List[str]] (were List[str] = None).
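The "cached per-agent on registry generation + toolset scope" idea can be sketched as a one-slot cache. This is an illustration under assumptions: the class name, the `compute` callback, and the integer generation counter are hypothetical, and the real `_tool_search_scoped_names()` may be structured differently.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Sequence, Set, Tuple

Scope = Optional[Tuple[str, ...]]


def _norm(toolsets: Optional[Sequence[str]]) -> Scope:
    """Normalize a toolset list so equal scopes produce equal cache keys."""
    return None if toolsets is None else tuple(sorted(toolsets))


class ScopedNamesCache:
    """One-slot cache keyed on (registry generation, enabled, disabled)."""

    def __init__(self, compute: Callable[..., Iterable[str]]) -> None:
        self._compute = compute
        self._key: Optional[Tuple[int, Scope, Scope]] = None
        self._value: FrozenSet[str] = frozenset()

    def get(self, generation: int,
            enabled: Optional[Sequence[str]],
            disabled: Optional[Sequence[str]]) -> FrozenSet[str]:
        key = (generation, _norm(enabled), _norm(disabled))
        if key != self._key:  # registry changed or scope changed: recompute
            self._value = frozenset(self._compute(enabled, disabled))
            self._key = key
        return self._value


calls = []  # records every recomputation, for demonstration only


def compute_names(enabled, disabled) -> Set[str]:
    calls.append((enabled, disabled))
    catalog = {"github_search": "mcp-github", "secret_plugin_danger": "danger"}
    names = {n for n, ts in catalog.items() if enabled is None or ts in enabled}
    return {n for n in names if not disabled or catalog[n] not in disabled}


cache = ScopedNamesCache(compute_names)
first = cache.get(1, ["mcp-github"], None)   # miss: computes
second = cache.get(1, ["mcp-github"], None)  # hit: same generation and scope
third = cache.get(2, ["mcp-github"], None)   # miss: generation bumped
```

Keying on the generation means a newly registered MCP tool invalidates the cached set without any explicit invalidation call.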

Validation (post-fix, same E2E)

| Check | Before | After |
| --- | --- | --- |
| `tool_search` scoped to `mcp-github` | `total_available: 26` | `20` |
| `tool_call(out-of-scope plugin)` | `{"ok": true}` (ran) | rejected: "not available in this session" |
| `tool_call(in-scope tool)` | ran | ran |
| `_last_resolved_tool_names` after `tool_search` | 20 → 51 (leaked `terminal`) | 20 → 20 |

Changes

tools/tool_search.py (new) — classification, threshold gate, BM25 retrieval, bridge dispatch, scoped_deferrable_names().
model_tools.py — assembly wired into _compute_tool_definitions; bridge dispatch in handle_function_call, now toolset-scoped.
agent/tool_executor.py — unwrap tool_call in both parsing paths with the scope gate; _tool_search_scoped_names() cache helper.
agent/agent_runtime_helpers.py — forwards toolset scope into the sequential dispatch.
hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
tests/tools/test_tool_search.py — 39 tests (35 original + 4-test TestRegression_ToolsetScoping).
website/docs/user-guide/features/tool-search.md — docs incl. the scoping guarantee.
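For readers unfamiliar with BM25, the retrieval step listed for tools/tool_search.py can be approximated with a small, dependency-free Okapi BM25 ranker. This is a generic textbook sketch, not the module's code: the tokenizer, the `k1` / `b` defaults, and the sample catalog are all assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: Dict[str, str],
              k1: float = 1.5, b: float = 0.75,
              top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank tool descriptions against a query with Okapi BM25."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = (sum(len(t) for t in tokenized.values()) / n_docs) or 1.0

    # Document frequency: in how many descriptions each term appears.
    df: Counter = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores: Dict[str, float] = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by name for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


catalog = {
    "github_create_issue": "Create a new issue in a GitHub repository",
    "github_list_prs": "List pull requests for a GitHub repository",
    "slack_post_message": "Post a message to a Slack channel",
}
hits = bm25_rank("create issue", catalog)
```

Only descriptions that share at least one query term receive a score, so an unrelated query returns an empty list rather than arbitrary tools.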

Test plan

scripts/run_tests.sh tests/tools/test_tool_search.py        → 39/39
scripts/run_tests.sh tests/test_model_tools.py tests/test_toolsets.py \
  tests/tools/test_registry.py tests/hermes_cli/test_config.py \
  tests/run_agent/test_tool_arg_coercion.py                 → 269/269 (combined)
scripts/run_tests.sh tests/run_agent/test_agent_guardrails.py \
  tests/run_agent/test_concurrent_interrupt.py \
  tests/run_agent/test_tool_call_guardrail_runtime.py \
  tests/run_agent/test_tool_executor_contextvar_propagation.py → 52/52

Supersedes #31163.

Infographic

Adds Tool Search, a structured-tools progressive-disclosure layer that replaces MCP and non-core plugin tools in the model-visible tools array with three bridge tools (tool_search / tool_describe / tool_call) when the deferrable surface would consume more than a configurable percentage of the active model's context window. Core Hermes tools are never deferred.

Default mode is 'auto' with a 10% context threshold, so small toolsets pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off' to disable.

Design carefully reflects the OpenClaw production failure modes documented in the openclaw-tool-search-report:

- Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the 'tools silently missing from isolated cron turns' regression class (openclaw#84141) by construction: there is no code path that can drop a core tool.
- Catalog is stateless across turns: it is rebuilt from the live tool-defs list on every assembly. No session-keyed Map that can drift out of sync with the registry.
- tool_call unwraps the bridge call before any hook fires, so plugin pre/post hooks, guardrails, approval flows, and the activity feed all see the underlying tool name, not the bridge (addresses openclaw#85588 and the verbose-mode complaint on openclaw#79823).
- The unwrap happens in both the parallel and sequential paths of agent/tool_executor.py and also in handle_function_call, so direct callers (sandboxed code, eval harnesses) are covered too.
- Bridge tools cannot invoke each other (recursion guard) and cannot invoke core tools (those must be called directly).
- Tools mode only, no JS-sandbox code-mode. Keeps the surface small.
- Token estimation via cheap char/4 heuristic; precision isn't needed for the threshold decision.

Files:

- tools/tool_search.py: new module (BM25 retrieval, classification, threshold gate, bridge dispatch, unwrap helper).
- tests/tools/test_tool_search.py: 35 tests including the OpenClaw #84141 regression guard.
- model_tools.py: wires assembly into _compute_tool_definitions as the final step, adds skip_tool_search_assembly kwarg so the bridge can see the real catalog, dispatches the three bridge tools.
- agent/tool_executor.py: unwraps tool_call in both parallel and sequential parsing loops so checkpointing, guardrails, plugin hooks, and tool-progress callbacks all observe the underlying tool name.
- hermes_cli/config.py: DEFAULT_CONFIG['tools']['tool_search'] block.
- website/docs/user-guide/features/tool-search.md: user docs.

Validation:

- 35/35 new tests pass.
- Existing tool/registry/model_tools/config/coercion/executor tests (82 + 74 + small adjacents) green.
- Live E2E: 20 fake MCP tools registered, get_tool_definitions returns 3 bridges, tool_search returns top 3 hits, tool_describe returns full schema, tool_call dispatches to the real underlying handler and the underlying result is what the model sees.
- Reserved-name recursion guard verified live.
- Core-tool refusal via tool_call verified live.
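The unwrap step, together with the recursion guard and core-tool refusal described in this commit message, might look roughly like the sketch below. The argument keys (`name`, `arguments`), the exception type, and the contents of `CORE_TOOLS` are assumptions for illustration; only the three bridge tool names come from the PR.

```python
from typing import Any, Dict, Tuple

BRIDGE_TOOLS = frozenset({"tool_search", "tool_describe", "tool_call"})
# Hypothetical stand-in for toolsets._HERMES_CORE_TOOLS.
CORE_TOOLS = frozenset({"terminal", "read_file"})


class BridgeError(ValueError):
    """Raised when a tool_call payload must be refused."""


def unwrap_tool_call(name: str, args: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return the underlying (tool name, arguments) for a bridge call.

    Non-bridge calls pass through unchanged, so hooks and guardrails that run
    after this step always see the real tool name.
    """
    if name != "tool_call":
        return name, args
    target = args.get("name")
    if not isinstance(target, str) or not target:
        raise BridgeError("tool_call requires a non-empty 'name' string")
    if target in BRIDGE_TOOLS:
        raise BridgeError(f"bridge tools cannot invoke '{target}'")  # recursion guard
    if target in CORE_TOOLS:
        raise BridgeError(f"'{target}' is a core tool; call it directly")
    inner = args.get("arguments") or {}
    if not isinstance(inner, dict):
        raise BridgeError("'arguments' must be an object")
    return target, inner


unwrapped = unwrap_tool_call(
    "tool_call", {"name": "github_search", "arguments": {"q": "scoping"}}
)
passthrough = unwrap_tool_call("terminal", {"command": "ls"})
```

Running the unwrap before any hook fires is what lets approval flows and the activity feed report `github_search` rather than `tool_call`.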

…olsets

Tool Search read its catalog from the global registry (get_tool_definitions with no toolset scope = 'start with everything'), so a restricted-toolset session (subagent, kanban worker, curated gateway session) could:

1. tool_search the entire process registry, not just its granted tools, and
2. tool_call any registered plugin/MCP tool it was never given, because registry.dispatch() has no enabled_tools gate for non-execute_code tools.

A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26 and successfully invoked an out-of-scope plugin tool via tool_call.

Fix:

- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge dispatch scopes get_tool_definitions to them (also stops polluting the process-global _last_resolved_tool_names with out-of-scope tools, which leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the same scope before dispatch, since it unwraps tool_call -> underlying name and bypasses the bridge branch. New _tool_search_scoped_names() helper, cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.

Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped catalog, out-of-scope tool_call rejection, no global pollution, helper).

github-actions · 2026-05-29T08:23:15Z

🔎 Lint report: `hermes/hermes-ede5b5b2` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9436 on HEAD, 9439 on base (✅ -3)

🆕 New issues (5):

| Rule | Count |
| --- | --- |
| `invalid-assignment` | 3 |
| `invalid-argument-type` | 1 |
| `unresolved-import` | 1 |

First entries

scripts/tool_search_livetest.py:389: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `list[str]`, found `None`
model_tools.py:850: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `<module 'tools.tool_search'>`
scripts/tool_search_livetest.py:377: [invalid-assignment] invalid-assignment: Object of type `def logging_dispatch(name, args, **kw) -> Unknown` is not assignable to attribute `dispatch` of type `def dispatch(self, name: str, args: dict[Unknown, Unknown], **kwargs) -> str`
tests/tools/test_tool_search.py:15: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
scripts/tool_search_livetest.py:412: [invalid-assignment] invalid-assignment: Object of type `bound method ToolRegistry.dispatch(name: str, args: dict[Unknown, Unknown], **kwargs) -> str` is not assignable to attribute `dispatch` of type `def dispatch(self, name: str, args: dict[Unknown, Unknown], **kwargs) -> str`

✅ Fixed issues (4):

| Rule | Count |
| --- | --- |
| `invalid-argument-type` | 3 |
| `invalid-parameter-default` | 1 |

First entries

model_tools.py:331: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `list[str]`
acp_adapter/server.py:798: [invalid-argument-type] invalid-argument-type: Argument to function `get_tool_definitions` is incorrect: Expected `list[str]`, found `Any | None`
gateway/run.py:13573: [invalid-argument-type] invalid-argument-type: Argument to function `get_tool_definitions` is incorrect: Expected `list[str]`, found `Any | None`
tui_gateway/server.py:6700: [invalid-argument-type] invalid-argument-type: Argument to function `get_tool_definitions` is incorrect: Expected `list[str]`, found `Any | None | list[str]`

Unchanged: 4894 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Brings in the tool_search live-test harness from the original PR but leaves out the 11 checked-in scripts/out/*.json transcript files. Those are non-deterministic model output that goes stale the moment the model changes, and they were the bulk of the diff. scripts/out/ is now gitignored so a harness run never re-commits them.

Fixes on top:

- API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv instead of hand-parsing ~/.hermes/.env and assigning the value to a local. The canonical loader never materializes the secret in a local variable in this module, which clears the four CodeQL high alerts (py/clear-text-storage / py/clear-text-logging-sensitive-data at the transcript write/print sites; they were tracing the key from the hand-rolled parser into the records) and removes a hand-rolled parser.
- encoding='utf-8' on every write_text/read_text in both harness scripts (Windows-footgun hygiene).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>

…args

The scoping fix added enabled_toolsets/disabled_toolsets to the agent_runtime_helpers sequential dispatch into handle_function_call, so test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with (exact match) needs the two new kwargs. Both are None for the default agent fixture.
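Why an exact-match assertion breaks when kwargs are added can be shown with `unittest.mock` alone. The `invoke_tool` wrapper and the `web_search` tool name below are hypothetical; only `assert_called_once_with` and the two kwarg names come from the commit message.

```python
from unittest.mock import MagicMock

# Stand-in for the real handle_function_call, replaced by a mock in the test.
handle_function_call = MagicMock(return_value='{"ok": true}')


def invoke_tool(name, args, enabled_toolsets=None, disabled_toolsets=None):
    """Hypothetical sequential-dispatch wrapper that forwards the toolset scope."""
    return handle_function_call(
        name,
        args,
        enabled_toolsets=enabled_toolsets,
        disabled_toolsets=disabled_toolsets,
    )


result = invoke_tool("web_search", {"q": "hermes"})

# The old expectation no longer matches: the call now carries two extra kwargs.
try:
    handle_function_call.assert_called_once_with("web_search", {"q": "hermes"})
    old_assertion_failed = False
except AssertionError:
    old_assertion_failed = True

# The updated expectation spells out both kwargs, even though they are None.
handle_function_call.assert_called_once_with(
    "web_search", {"q": "hermes"}, enabled_toolsets=None, disabled_toolsets=None
)
```

`assert_called_once_with` compares positional and keyword arguments exactly, so explicitly passed `None` kwargs still have to appear in the expected call.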

The live harness runs against a real OpenRouter key; record['error'] is a full traceback that, on an auth failure, could echo a request header or URL containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY, any sk-/sk-or- bearer token, and Authorization/Bearer headers before final_response and error enter the transcript or the console print. Addresses the CodeQL clear-text-storage/logging findings at the source.
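A redaction pass of this kind could be sketched as follows. The regular expressions, the `[REDACTED]` placeholder, and the function signature are assumptions chosen for illustration; the harness's actual `_redact_secrets()` may mask a different set of patterns.

```python
import re
from typing import Optional

# Assumed patterns: sk- / sk-or- style tokens, Authorization headers, Bearer tokens.
_TOKEN_RE = re.compile(r"\bsk-(?:or-)?[A-Za-z0-9_\-]{8,}")
_AUTH_RE = re.compile(r"(?i)(authorization\s*[:=]\s*)(?:bearer\s+)?\S+")
_BEARER_RE = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._\-]+")

MASK = "[REDACTED]"


def redact_secrets(text: str, live_key: Optional[str] = None) -> str:
    """Mask credentials before text is written to a transcript or printed."""
    if live_key:
        text = text.replace(live_key, MASK)  # exact match on the key in use
    text = _AUTH_RE.sub(lambda m: m.group(1) + MASK, text)
    text = _BEARER_RE.sub("Bearer " + MASK, text)
    text = _TOKEN_RE.sub(MASK, text)
    return text


trace = (
    "HTTP 401 for request with Authorization: Bearer sk-or-v1-abcdef1234567890 "
    "at https://example.invalid/v1?key=sk-abcdefgh12345"
)
clean = redact_secrets(trace)
```

Masking the exact live key first covers keys that do not follow a recognizable prefix, while the pattern passes catch tokens that appear in tracebacks from other sources.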

…isclosure

Adds an optional, opt-in embedding reranker to the tool_search BM25 bridge (PR NousResearch#34493). Default OFF: when disabled the BM25 path is byte-for-byte identical to upstream.

urllib-only (no new deps), task-prefixed, md5-cached tool embeddings, full-catalog retrieve, rerank/RRF(k=10) modes, graceful BM25 fallback on any endpoint failure. Backend is any OpenAI-compatible /v1/embeddings endpoint (cloud, local CPU, or GPU).

Live-validated (194 tools / 98 labeled queries, nomic-embed-text-v2-moe): overall Recall@5 0.617 -> 0.810, SEMANTIC 0.500 -> 0.849, LEXICAL preserved at 1.000; warm per-query ~146ms, dead-endpoint fallback ~8ms.

Fulfills NousResearch#13332.
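The RRF(k=10) mode refers to reciprocal rank fusion, which merges ranked lists by summing 1 / (k + rank) for each item. The sketch below is the standard formula with k=10 as stated above; the function name and the sample rankings are hypothetical, and how the reranker PR wires this into tool_search is not shown here.

```python
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 10) -> List[str]:
    """Fuse several ranked lists of tool names with reciprocal rank fusion."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for deterministic output.
    return sorted(scores, key=lambda n: (-scores[n], n))


bm25_order = ["a", "b", "c"]       # lexical ranking
embedding_order = ["b", "c", "a"]  # semantic ranking
fused = rrf_fuse([bm25_order, embedding_order])

# If the embedding endpoint fails, fusing the BM25 list alone preserves its order.
fallback = rrf_fuse([bm25_order])
```

Because the fusion only uses ranks, BM25 scores and embedding similarities never need to be put on a common scale.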

teknium1 added 2 commits May 29, 2026 00:38

alt-glitch added labels May 29, 2026:

- type/security: Security vulnerability or hardening
- P2: Medium, degraded but workaround exists
- comp/tools: Tool registry, model_tools, toolsets
- comp/agent: Core agent loop, run_agent.py, prompt builder
- tool/mcp: MCP client and OAuth

github-advanced-security AI found potential problems May 29, 2026


Comment thread scripts/tool_search_livetest.py Dismissed (4 threads)

teknium1 merged commit a87f0a8 into main May 29, 2026
26 checks passed

teknium1 deleted the hermes/hermes-ede5b5b2 branch May 29, 2026 09:04

teknium1 mentioned this pull request May 29, 2026

feat(tools): progressive tool disclosure for MCP and plugin tools #31163

Closed

davidgut1982 mentioned this pull request May 30, 2026

feat(tool_search): optional embedding reranker for progressive tool disclosure #35457

Open

gal-checksum mentioned this pull request Jun 2, 2026

[codex] docs: add Tool Search to sidebar #37512

Closed

gal064 mentioned this pull request Jun 2, 2026

[codex] docs: add Tool Search to sidebar #37514

Open

BrewTestBot mentioned this pull request Jun 6, 2026

hermes-agent 2026.6.5 Homebrew/homebrew-core#286569

Merged

1 task

github-actions Bot mentioned this pull request Jun 6, 2026

chore: bump NousResearch/hermes-agent version from v2026.5.29.2 to v2026.6.5 Docker-Hub-sirmark/docker-hermes-agent#9

Merged

davidgut1982 mentioned this pull request Jun 11, 2026

feat(tool_search): optional embedding reranker for progressive tool disclosure #44272

Closed

19 tasks

teknium1 merged 5 commits into main from hermes/hermes-ede5b5b2


3 participants
