feat(tools): progressive tool disclosure for MCP and plugin tools (scoped)#34493
Merged
Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.
Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.
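A minimal sketch of the threshold gate described above, assuming tool definitions are JSON-serializable dicts and using the char/4 estimate mentioned later in this message. The function names, the `threshold_pct` parameter name, and the behavior of 'on' with an empty deferrable set are illustrative assumptions, not the module's actual API.

```python
import json


def estimate_tokens(tool_defs):
    """Rough token estimate: serialized characters divided by 4."""
    return sum(len(json.dumps(d)) for d in tool_defs) // 4


def should_defer(deferrable_defs, context_window, mode="auto", threshold_pct=10.0):
    """Decide whether deferrable tools should be hidden behind the bridge tools.

    mode mirrors tools.tool_search.enabled: 'on' forces, 'off' disables,
    'auto' compares the estimated schema size with a share of the context window.
    """
    if mode == "off":
        return False
    if mode == "on":
        return True  # assumption: 'on' forces the bridge regardless of size
    budget = context_window * threshold_pct / 100.0
    return estimate_tokens(deferrable_defs) > budget


if __name__ == "__main__":
    fake_defs = [{"name": f"mcp_tool_{i}", "description": "x" * 400} for i in range(20)]
    print(should_defer(fake_defs, context_window=200_000))  # large window
    print(should_defer(fake_defs, context_window=8_000))    # small window
```

With a large context window the same 20 schemas stay under the budget and are passed through unchanged; only a small window or a large catalog trips the gate.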
The design addresses the OpenClaw production failure modes
documented in the openclaw-tool-search-report:
- Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
'tools silently missing from isolated cron turns' regression class
(openclaw#84141) by construction: there is no code path that can
drop a core tool.
- Catalog is stateless across turns — rebuilt from the live tool-defs
list on every assembly. No session-keyed Map that can drift out of
sync with the registry.
- tool_call unwraps the bridge call before any hook fires, so plugin
pre/post hooks, guardrails, approval flows, and the activity feed
all see the underlying tool name, not the bridge (addresses
openclaw#85588 and the verbose-mode complaint on openclaw#79823).
- The unwrap happens in both the parallel and sequential paths of
agent/tool_executor.py and also in handle_function_call, so direct
callers (sandboxed code, eval harnesses) are covered too.
- Bridge tools cannot invoke each other (recursion guard) and cannot
invoke core tools (those must be called directly).
- Tools mode only — no JS-sandbox code-mode. Keeps the surface small.
- Token estimation via cheap char/4 heuristic; precision isn't needed
for the threshold decision.
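The unwrap and guard rules in the list above can be sketched as follows. This is an illustration under assumptions: the argument keys `name` and `arguments`, the `BridgeCallError` type, and the example core-tool set are hypothetical; the real helper lives in tools/tool_search.py and may differ.

```python
BRIDGE_TOOLS = frozenset({"tool_search", "tool_describe", "tool_call"})


class BridgeCallError(ValueError):
    """Raised when a tool_call targets something it must not dispatch."""


def unwrap_tool_call(name, args, core_tools):
    """Return (underlying_name, underlying_args) so hooks see the real tool.

    Calls that are not tool_call pass through untouched.
    """
    if name != "tool_call":
        return name, args
    target = args.get("name")                # assumed key for the target tool
    target_args = args.get("arguments") or {}  # assumed key for its arguments
    if not isinstance(target, str) or not target:
        raise BridgeCallError("tool_call requires a target tool name")
    if target in BRIDGE_TOOLS:
        # Recursion guard: bridge tools cannot invoke each other.
        raise BridgeCallError(f"{target} is a bridge tool and cannot be invoked via tool_call")
    if target in core_tools:
        # Core tools are always visible and must be called directly.
        raise BridgeCallError(f"{target} is a core tool; call it directly")
    return target, target_args


if __name__ == "__main__":
    core = {"terminal"}  # illustrative stand-in for the core tool set
    print(unwrap_tool_call("tool_call", {"name": "github_list_issues",
                                         "arguments": {"repo": "a/b"}}, core))
```

Running the unwrap before any hook fires is what lets guardrails and approval flows key on the underlying tool name rather than on `tool_call`.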
Files:
- tools/tool_search.py — new module (BM25 retrieval, classification,
threshold gate, bridge dispatch, unwrap helper).
- tests/tools/test_tool_search.py — 35 tests including the OpenClaw
#84141 regression guard.
- model_tools.py — wires assembly into _compute_tool_definitions as the
final step, adds skip_tool_search_assembly kwarg so the bridge can
see the real catalog, dispatches the three bridge tools.
- agent/tool_executor.py — unwraps tool_call in both parallel and
sequential parsing loops so checkpointing, guardrails, plugin hooks,
and tool-progress callbacks all observe the underlying tool name.
- hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
- website/docs/user-guide/features/tool-search.md — user docs.
Validation:
- 35/35 new tests pass.
- Existing tool/registry/model_tools/config/coercion/executor tests
(82 + 74 + small adjacents) green.
- Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
3 bridges, tool_search returns top 3 hits, tool_describe returns
full schema, tool_call dispatches to the real underlying handler
and the underlying result is what the model sees.
- Reserved-name recursion guard verified live.
- Core-tool refusal via tool_call verified live.
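For readers unfamiliar with the retrieval step behind tool_search, here is a self-contained Okapi BM25 sketch over tool descriptions. It is not the code in tools/tool_search.py: the tokenizer, the `k1`/`b` defaults, and the tool names in the usage example are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (so snake_case names split too)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(query, docs, top_k=3, k1=1.5, b=0.75):
    """Rank docs (mapping of tool name -> searchable text) against query.

    Returns up to top_k (name, score) pairs, best first; non-matching docs are dropped.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = (sum(len(t) for t in tokenized.values()) / n_docs) or 1.0
    doc_freq = Counter()
    for toks in tokenized.values():
        doc_freq.update(set(toks))
    query_terms = set(tokenize(query))
    scored = []
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((name, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    catalog = {
        "github_create_issue": "Create a new issue in a GitHub repository",
        "github_list_pulls": "List pull requests for a GitHub repository",
        "slack_post_message": "Post a message to a Slack channel",
    }
    for name, score in bm25_search("create issue", catalog):
        print(name, round(score, 3))
```

Because the catalog is rebuilt from the live tool definitions on every assembly, an index like this would be recomputed per call rather than cached across turns.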
…olsets
Tool Search read its catalog from the global registry (get_tool_definitions
with no toolset scope = 'start with everything'), so a restricted-toolset
session — subagent, kanban worker, curated gateway session — could:
1. tool_search the entire process registry, not just its granted tools, and
2. tool_call any registered plugin/MCP tool it was never given, because
registry.dispatch() has no enabled_tools gate for non-execute_code tools.
A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26
and successfully invoked an out-of-scope plugin tool via tool_call.
Fix:
- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge
dispatch scopes get_tool_definitions to them (also stops polluting the
process-global _last_resolved_tool_names with out-of-scope tools, which
leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped
deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the
same scope before dispatch, since it unwraps tool_call -> underlying name
and bypasses the bridge branch. New _tool_search_scoped_names() helper,
cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.
Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped
catalog, out-of-scope tool_call rejection, no global pollution, helper).
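The scoping fix above can be illustrated with a small model of the gate and its cache. Everything here is a sketch under assumptions: the registry is reduced to a name-to-toolset mapping, and `scoped_names`, `gate_tool_call`, and `ScopedNamesCache` are hypothetical stand-ins for scoped_deferrable_names() and _tool_search_scoped_names(), whose real signatures are not shown in this PR text.

```python
def scoped_names(registry, enabled_toolsets=None, disabled_toolsets=None,
                 core_tools=frozenset()):
    """Names a session may reach through the bridge.

    registry maps tool name -> toolset name (a simplification of the real registry).
    enabled_toolsets=None means no restriction.
    """
    disabled = set(disabled_toolsets or ())
    allowed = set()
    for tool_name, toolset in registry.items():
        if tool_name in core_tools:
            continue  # core tools are never deferrable
        if enabled_toolsets is not None and toolset not in enabled_toolsets:
            continue
        if toolset in disabled:
            continue
        allowed.add(tool_name)
    return frozenset(allowed)


def gate_tool_call(target, allowed):
    """Defense-in-depth check: return an error payload, or None when allowed."""
    if target not in allowed:
        return {"ok": False, "error": f"tool '{target}' is not available in this session"}
    return None


class ScopedNamesCache:
    """Recomputes only when the registry generation or the toolset scope changes."""

    def __init__(self):
        self._key = None
        self._names = frozenset()

    def get(self, registry, generation, enabled_toolsets=None,
            disabled_toolsets=None, core_tools=frozenset()):
        key = (
            generation,
            None if enabled_toolsets is None else tuple(sorted(enabled_toolsets)),
            tuple(sorted(disabled_toolsets or ())),
        )
        if key != self._key:
            self._names = scoped_names(registry, enabled_toolsets,
                                       disabled_toolsets, core_tools)
            self._key = key
        return self._names


if __name__ == "__main__":
    registry = {"github_list_issues": "mcp-github",
                "secret_plugin_danger": "danger",
                "terminal": "core"}
    allowed = scoped_names(registry, ["mcp-github"], core_tools={"terminal"})
    print(gate_tool_call("secret_plugin_danger", allowed))
```

The same allowed set has to be consulted at both unwrap sites, since the executor path dispatches the underlying name without passing through the bridge branch.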
🔎 Lint report:
| Rule | Count |
|---|---|
| invalid-assignment | 3 |
| invalid-argument-type | 1 |
| unresolved-import | 1 |
First entries
scripts/tool_search_livetest.py:389: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `list[str]`, found `None`
model_tools.py:850: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `<module 'tools.tool_search'>`
scripts/tool_search_livetest.py:377: [invalid-assignment] invalid-assignment: Object of type `def logging_dispatch(name, args, **kw) -> Unknown` is not assignable to attribute `dispatch` of type `def dispatch(self, name: str, args: dict[Unknown, Unknown], **kwargs) -> str`
tests/tools/test_tool_search.py:15: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
scripts/tool_search_livetest.py:412: [invalid-assignment] invalid-assignment: Object of type `bound method ToolRegistry.dispatch(name: str, args: dict[Unknown, Unknown], **kwargs) -> str` is not assignable to attribute `dispatch` of type `def dispatch(self, name: str, args: dict[Unknown, Unknown], **kwargs) -> str`
✅ Fixed issues (4):
| Rule | Count |
|---|---|
| invalid-argument-type | 3 |
| invalid-parameter-default | 1 |
First entries
model_tools.py:331: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `list[str]`
acp_adapter/server.py:798: [invalid-argument-type] invalid-argument-type: Argument to function `get_tool_definitions` is incorrect: Expected `list[str]`, found `Any | None`
gateway/run.py:13573: [invalid-argument-type] invalid-argument-type: Argument to function `get_tool_definitions` is incorrect: Expected `list[str]`, found `Any | None`
tui_gateway/server.py:6700: [invalid-argument-type] invalid-argument-type: Argument to function `get_tool_definitions` is incorrect: Expected `list[str]`, found `Any | None | list[str]`
Unchanged: 4894 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Brings in the tool_search live-test harness from the original PR but leaves out the 11 checked-in scripts/out/*.json transcript files: those are non-deterministic model output that goes stale the moment the model changes, and they were the bulk of the diff. scripts/out/ is now gitignored so a harness run never re-commits them.
Fixes on top:
- API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv instead of hand-parsing ~/.hermes/.env and assigning the value to a local. The canonical loader never materializes the secret in a local variable in this module, which clears the four CodeQL high alerts (py/clear-text-storage / py/clear-text-logging-sensitive-data at the transcript write/print sites; they were tracing the key from the hand-rolled parser into the records) and removes a hand-rolled parser.
- encoding='utf-8' on every write_text/read_text in both harness scripts (Windows-footgun hygiene).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
…args
The scoping fix added enabled_toolsets/disabled_toolsets to the agent_runtime_helpers sequential dispatch into handle_function_call, so test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with (exact match) needs the two new kwargs. Both are None for the default agent fixture.
The live harness runs against a real OpenRouter key; record['error'] is a full traceback that, on an auth failure, could echo a request header or URL containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY, any sk-/sk-or- bearer token, and Authorization/Bearer headers before final_response and error enter the transcript or the console print. Addresses the CodeQL clear-text-storage/logging findings at the source.
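A hedged sketch of the kind of masking described above, written with the standard library only. The regular expressions, the `[REDACTED]` placeholder, and the function name `redact_secrets` are assumptions for illustration; the harness's actual _redact_secrets() may use different patterns.

```python
import os
import re

# Assumed patterns: sk-/sk-or- style tokens, Authorization headers, Bearer values.
_TOKEN_RE = re.compile(r"\bsk-(?:or-)?[A-Za-z0-9_\-]{8,}")
_AUTH_RE = re.compile(r"(?i)\b(authorization\s*[:=]\s*)(?:bearer\s+)?\S+")
_BEARER_RE = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._\-]+")


def redact_secrets(text, env_var="OPENROUTER_API_KEY"):
    """Mask the live key and token-shaped strings before text is stored or printed."""
    if not text:
        return text
    live_key = os.environ.get(env_var)
    if live_key:
        # Exact-match masking catches the key even when it is not sk-shaped.
        text = text.replace(live_key, "[REDACTED]")
    text = _AUTH_RE.sub(r"\1[REDACTED]", text)
    text = _BEARER_RE.sub("Bearer [REDACTED]", text)
    text = _TOKEN_RE.sub("[REDACTED]", text)
    return text


if __name__ == "__main__":
    sample = "401 from upstream; request had Authorization: Bearer sk-or-v1-abcdef1234567890"
    print(redact_secrets(sample))
```

Applying the mask where the record is built, rather than at each print or write site, keeps every downstream consumer of the transcript on the redacted text.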
davidgut1982 added a commit to davidgut1982/hermes-agent that referenced this pull request May 30, 2026
…isclosure
Adds an optional, opt-in embedding reranker to the tool_search BM25 bridge (PR NousResearch#34493). Default OFF: when disabled, the BM25 path is byte-for-byte identical to upstream. urllib-only (no new deps), task-prefixed, md5-cached tool embeddings, full-catalog retrieve, rerank/RRF(k=10) modes, graceful BM25 fallback on any endpoint failure. Backend is any OpenAI-compatible /v1/embeddings endpoint (cloud, local CPU, or GPU). Live-validated (194 tools / 98 labeled queries, nomic-embed-text-v2-moe): overall Recall@5 0.617 -> 0.810, SEMANTIC 0.500 -> 0.849, LEXICAL preserved at 1.000; warm per-query ~146ms, dead-endpoint fallback ~8ms. Fulfills NousResearch#13332.
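The RRF(k=10) mode and the BM25 fallback mentioned in that commit can be sketched as below. This is generic reciprocal rank fusion, not the fork's code: the function names and the `embed_ranker` callable are hypothetical.

```python
def rrf_fuse(rankings, k=10, top_n=5):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    # Ties are broken by name so the output is deterministic.
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_n]


def search_with_fallback(bm25_ranking, embed_ranker, k=10, top_n=5):
    """Fuse BM25 with an embedding ranking; fall back to BM25 alone on any failure.

    embed_ranker is a hypothetical zero-argument callable returning a ranked
    list of tool names (for example, backed by an embeddings endpoint).
    """
    try:
        embed_ranking = embed_ranker()
    except Exception:
        return list(bm25_ranking)[:top_n]
    return rrf_fuse([bm25_ranking, embed_ranking], k=k, top_n=top_n)


if __name__ == "__main__":
    bm25 = ["github_create_issue", "github_list_pulls", "slack_post_message"]
    semantic = ["slack_post_message", "github_create_issue", "jira_open_ticket"]
    print(search_with_fallback(bm25, lambda: semantic))
```

A small `k` such as 10 weights the top few ranks more heavily than the commonly used 60, which suits short candidate lists like a top-5 tool search.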
Summary
Salvages #31163 (Tool Search: progressive tool disclosure for MCP/plugin tools) onto current `main` and closes a toolset-scoping hole found in deep review.

Tool Search hides MCP + non-core plugin tools behind three bridge tools (`tool_search` / `tool_describe` / `tool_call`) when the deferrable surface exceeds ~10% of the active model's context window. Core Hermes tools are never deferred.

What changed vs #31163
The original bridge dispatch read its catalog from the global registry: `get_tool_definitions()` with no toolset scope, whose `else` branch is "start with everything." In a restricted-toolset session (subagent, kanban worker, curated gateway session) that meant the model could:

1. `tool_search` the entire process registry, not just its granted tools, and
2. `tool_call` any registered plugin/MCP tool it was never given. `registry.dispatch()` has no `enabled_tools` gate for non-`execute_code` tools, so the out-of-scope tool actually ran.

It also widened the process-global `_last_resolved_tool_names` to the whole registry on every `tool_search`, leaking core/sandbox tools into `execute_code`'s fallback set.

Confirmed by live E2E (pre-fix)
Session scoped to `enabled_toolsets=['mcp-github']` with an out-of-scope `danger` plugin tool registered:

- `tool_search` reported `total_available: 26` (whole registry)
- `tool_call("secret_plugin_danger", {})` returned `{"ok": true}`: it dispatched a tool the session was never granted
- `_last_resolved_tool_names` went 20 → 51 (now including `terminal`)

Fix
- `handle_function_call` gains `enabled_toolsets`/`disabled_toolsets`; the bridge dispatch scopes `get_tool_definitions` to them. This both scopes the searchable catalog and stops the global-pollution side effect.
- A defense-in-depth gate rejects any `tool_call`'d name not in the scoped deferrable catalog.
- `tool_executor`'s unwrap (concurrent + sequential paths) enforces the same scope before dispatch. It unwraps `tool_call` → underlying name and bypasses the bridge branch, so the gate must live there too. New `_tool_search_scoped_names()` helper, cached per-agent on registry generation + toolset scope.
- New `scoped_deferrable_names()` helper in `tool_search.py` shared by both sites.
- `get_tool_definitions` / `_compute_tool_definitions` signatures annotated `Optional[List[str]]` (were `List[str] = None`).

Validation (post-fix, same E2E)
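| Check | Pre-fix | Post-fix |
|---|---|---|
| `tool_search` scoped to `mcp-github` | `total_available: 26` | `20` |
| `tool_call` (out-of-scope plugin) | `{"ok": true}` (ran) | |
| `tool_call` (in-scope tool) | | |
| `_last_resolved_tool_names` after `tool_search` | | … `terminal`) |

Changes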
- `tools/tool_search.py` (new): classification, threshold gate, BM25 retrieval, bridge dispatch, `scoped_deferrable_names()`.
- `model_tools.py`: assembly wired into `_compute_tool_definitions`; bridge dispatch in `handle_function_call`, now toolset-scoped.
- `agent/tool_executor.py`: unwrap `tool_call` in both parsing paths with the scope gate; `_tool_search_scoped_names()` cache helper.
- `agent/agent_runtime_helpers.py`: forwards toolset scope into the sequential dispatch.
- `hermes_cli/config.py`: `DEFAULT_CONFIG['tools']['tool_search']` block.
- `tests/tools/test_tool_search.py`: 39 tests (35 original + 4-test `TestRegression_ToolsetScoping`).
- `website/docs/user-guide/features/tool-search.md`: docs incl. the scoping guarantee.

Test plan
Supersedes #31163.