
feat(tool_search): optional embedding reranker for progressive tool disclosure#35457

Open
davidgut1982 wants to merge 1 commit into NousResearch:main from davidgut1982:feat/tool-search-hybrid-rerank

davidgut1982 commented May 30, 2026

What

Adds an optional embedding-based reranker for semantic tool discovery on top of BM25 lexical search. When enabled, every tool description is embedded once per process using nomic-embed-text-v2-moe (MD5-cached), and each query's tool candidates are then reranked by cosine similarity. The PR also implements progressive tool disclosure: when a profile exceeds the activation threshold, the full catalog (~54k tokens) is deferred behind tool_search stubs (~2.3k tokens), and tools are fetched on demand. Two reranking modes are available: pure cosine, or Reciprocal Rank Fusion (RRF, k=10).
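The two reranking modes can be illustrated with the minimal sketch below. It assumes embeddings are plain float lists and that candidates arrive as a BM25-ordered list of tool names; the function names are hypothetical and this is not the code in tools/tool_search.py. RRF scores each tool as the sum of 1 / (k + rank) over the BM25 and cosine rankings, with ranks starting at 1.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; a length mismatch means the vectors came from different models."""
    if len(a) != len(b):
        raise ValueError(f"embedding dimension mismatch: {len(a)} != {len(b)}")
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rerank_by_cosine(query_vec, bm25_ranked, tool_vecs):
    """Pure-cosine mode: reorder the BM25 candidates by similarity to the query."""
    return sorted(bm25_ranked, key=lambda name: cosine(query_vec, tool_vecs[name]), reverse=True)


def rrf_fuse(*rankings, k=10):
    """RRF mode: score(tool) = sum over rankings of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    # sorted() is stable, so tied tools keep first-seen order (BM25 order if passed first).
    return sorted(scores, key=scores.get, reverse=True)
```

With a small k such as 10, a tool ranked first by either signal gets a noticeable boost, while fusion still rewards tools that both rankings place near the top.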

Files modified: tools/tool_search.py (reranker + progressive disclosure), tests/tools/test_tool_search.py (new tests), website/docs/user-guide/features/tool-search.md (updated).

Why

BM25 lexical matching fails on semantic queries: "remind me tonight" should find create_calendar_event, but the query shares no words with the tool name. The embedding reranker recovers those cases. Separately, large tool catalogs consume 34-67% of a 131k-token context window. Progressive disclosure defers the catalog, reducing visible tools from 226 → 4 and freeing 95.8% of tool-definition tokens.
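The activation-threshold gate behind progressive disclosure might look roughly like the sketch below, assuming each tool carries a precomputed token estimate. ToolDef, select_visible_tools, load_on_demand and the threshold value used with them are hypothetical names for illustration; the PR's actual threshold and stub set are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical tool record: a name plus the estimated prompt cost of its schema."""
    name: str
    est_tokens: int


def select_visible_tools(catalog, stubs, activation_threshold):
    """Expose the full catalog if it fits under the threshold, otherwise only the stubs."""
    catalog_tokens = sum(t.est_tokens for t in catalog)
    if catalog_tokens <= activation_threshold:
        return list(catalog)
    return list(stubs)


def load_on_demand(catalog, requested_names):
    """Materialize full definitions for the tools a tool_search call surfaced."""
    by_name = {t.name: t for t in catalog}
    # Unknown names are skipped rather than raising, so a stale result cannot crash a turn.
    return [by_name[n] for n in requested_names if n in by_name]
```

The model then sees only the small stub set until it searches, and the full schemas for the matched tools are injected afterwards.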

Tests

pytest tests/tools/test_tool_search.py -v

53 tests pass, covering BM25 fallback, exact RRF scores, the limit contract, dimension mismatch, the prefix payload, and cache invalidation. The offline eval suite shows Recall@5 improving from 0.634 (BM25 only) to 0.810 (with the reranker).
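For reference, Recall@5 can be computed as in the sketch below. It assumes per-query recall is the fraction of labeled relevant tools found in the top 5 results, averaged over all labeled queries; the eval suite may define it differently (for example as a top-5 hit rate), and the function names are hypothetical.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the labeled relevant tools that appear in the top-k results."""
    if not relevant:
        raise ValueError("each labeled query needs at least one relevant tool")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall_at_k(results: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (ranked_tools, relevant_tools) pairs, one pair per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in results) / len(results)
```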

Platforms tested

Linux (CT/LXC environment, Python 3.13)

alt-glitch added the labels type/feature (New feature or request), comp/tools (Tool registry, model_tools, toolsets), and P3 (Low: cosmetic, nice to have) on May 30, 2026
davidgut1982 commented

Per-Scope Cache Improvement Added

Cherry-picked commit 09d86e6 (fix(tool_search): per-scope reranker cache) onto this PR. This adds critical multi-agent support:

  • Replaces the single-slot, module-level reranker singleton with a bounded, scope-keyed cache (max 8 entries, FIFO eviction)
  • Each distinct toolset scope (keyed by md5(endpoint + model + tool_names)) retains its own EmbeddingReranker instance and embedding cache
  • Key benefit: concurrent sub-agents operating on different toolsets no longer evict each other's embedding caches, eliminating redundant endpoint calls
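The per-scope cache described in these bullets can be sketched with an OrderedDict and a lock. This is an illustration, not the code from commit 09d86e6: EmbeddingReranker below is a hypothetical stand-in that holds only an embedding dict, and sorting tool_names and newline-joining the fields before hashing are assumptions (they make the key independent of tool order and avoid concatenation collisions).

```python
import hashlib
import threading
from collections import OrderedDict


class EmbeddingReranker:
    """Hypothetical stand-in: a real instance would also hold client and config state."""

    def __init__(self, endpoint: str, model: str) -> None:
        self.endpoint = endpoint
        self.model = model
        self.embed_cache: dict[str, list[float]] = {}


def scope_key(endpoint: str, model: str, tool_names: list[str]) -> str:
    # Assumption: tool names are sorted and fields newline-joined before hashing.
    raw = "\n".join([endpoint, model, *sorted(tool_names)])
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


class ScopedRerankerCache:
    """Bounded scope -> reranker map with FIFO eviction (hits do not refresh order)."""

    def __init__(self, max_entries: int = 8) -> None:
        self._entries: "OrderedDict[str, EmbeddingReranker]" = OrderedDict()
        self._max = max_entries
        self._lock = threading.Lock()  # concurrent sub-agents may call from several threads

    def get(self, endpoint: str, model: str, tool_names: list[str]) -> EmbeddingReranker:
        key = scope_key(endpoint, model, tool_names)
        with self._lock:
            reranker = self._entries.get(key)
            if reranker is None:
                if len(self._entries) >= self._max:
                    self._entries.popitem(last=False)  # drop the oldest-inserted scope
                reranker = EmbeddingReranker(endpoint, model)
                self._entries[key] = reranker
            return reranker

    def __len__(self) -> int:
        return len(self._entries)
```

FIFO matches the PR's description; an LRU variant would call move_to_end(key) on each cache hit.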

New test coverage:

  • TestEmbedCacheInvalidation.test_concurrent_scopes_do_not_share_reranker: shows that creating scope B does NOT evict scope A's instance or cache
  • TestEmbedCacheInvalidation.test_reranker_cache_evicts_oldest_scope_when_full: validates FIFO eviction when the cache is full

This is essential for orchestrator patterns where multiple concurrent agents with different MCP toolsets need to avoid thrashing the embedding endpoint.

…isclosure

Adds an optional, opt-in embedding reranker to the tool_search BM25 bridge
(PR NousResearch#34493). Default OFF: when disabled, the BM25 path is byte-for-byte
identical to upstream. Uses urllib only (no new deps), task-prefixed and
md5-cached tool embeddings, full-catalog retrieval, rerank and RRF (k=10)
modes, and graceful BM25 fallback on any endpoint failure. The backend can be
any OpenAI-compatible /v1/embeddings endpoint (cloud, local CPU, or GPU).

Live-validated (194 tools / 98 labeled queries, nomic-embed-text-v2-moe):
overall Recall@5 0.617 -> 0.810, SEMANTIC 0.500 -> 0.849, LEXICAL preserved
at 1.000; warm per-query ~146ms, dead-endpoint fallback ~8ms.

Fulfills NousResearch#13332.