fix(mcp_semantic_tool_filter): match tools with client-side namespace prefix (#26078)#26117
Conversation
a client-side namespace prefix. `SemanticMCPToolFilter._get_tools_by_names` matched by exact equality between the canonical name stored in the router (`<server><MCP_TOOL_PREFIX_SEPARATOR><tool>`) and the name in the incoming `tools[]` list. MCP clients such as opencode wrap every tool name with their own additive alias prefix (`<client_alias>_<canonical>`), so the two never matched, the filter dropped every tool to zero, and the proxy forwarded `tools: []` with `tool_choice: auto` — which strict upstream providers reject with a 400. The fix adds anchored suffix matching with a separator check: the canonical must form the complete tail of the incoming name and be preceded by `_` or `-`. Exact matches still win over suffix matches, incoming tools are returned at most once, and the original tool object is passed through unchanged so the client-facing name survives for tool-call round-trips. Seven unit tests in a new TestGetToolsByNames class cover exact match, underscore- and dash-prefixed variants, non-separator-anchored suffixes (which must not match), exact-wins-over-prefixed precedence, deduplication when two canonicals suffix-match the same incoming tool, and ordering-follows-router-output. Fixes BerriAI#26078
|
|
Greptile SummaryThis PR fixes Confidence Score: 5/5Safe to merge; the fix is narrow, well-tested, and all remaining findings are P2 style suggestions. The core matching logic is correct (separator-anchored suffix, gate on MCP_TOOL_PREFIX_SEPARATOR presence, shortest-prefix tie-breaker, id-based dedup). Previously flagged concerns about the deduplication test and tie-breaker ambiguity have been addressed. The only remaining finding is that one test doesn't fully exercise its stated invariant — not a code defect. No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py | Adds _name_matches_canonical static helper and rewrites _get_tools_by_names to tolerate client-side namespace prefixes via separator-anchored suffix matching with deduplication and shortest-prefix tie-breaking. Logic is correct and well-documented. |
| tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py | Adds 8 new unit tests for _get_tools_by_names; previously-flagged deduplication test now properly uses two distinct canonicals, and a new shortest-prefix test covers the tie-breaker. One test (test_exact_match_preferred_over_prefixed) doesn't fully exercise its stated invariant due to canonical 'search' having no separator. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["_get_tools_by_names(tool_names, available_tools)"] --> B["Build available_by_name index\n(client_name → tool object)"]
B --> C["For each canonical from router"]
C --> D{"Exact match?\navailable_by_name.get(canonical)"}
D -- Yes --> E["tool = exact match"]
D -- No --> F["Suffix scan:\nfor client_name in available_by_name"]
F --> G{"_name_matches_canonical\n(client_name, canonical)?"}
G -- No --> F
G -- Yes --> H{"len(client_name) < len(best_name)?"}
H -- Yes --> I["best_name = client_name"]
H -- No --> F
I --> F
F --> J["tool = available_by_name[best_name]"]
E --> K{"id(tool) in used_ids?"}
J --> K
K -- No --> L["append tool\nadd id to used_ids"]
K -- Yes --> M["skip (dedup)"]
L --> C
M --> C
Reviews (4): Last reviewed commit: "Merge branch 'litellm_oss_staging_04_22_..." | Re-trigger Greptile
| def test_same_tool_not_returned_twice(self): | ||
| """ | ||
| Two canonicals that both suffix-match the same incoming tool | ||
| must not produce a duplicate in the output list. | ||
| """ | ||
| filter_instance = self._make_filter() | ||
| available_tools = [ | ||
| {"name": "srv_read_file", "description": "read"} | ||
| ] | ||
|
|
||
| matched = filter_instance._get_tools_by_names( | ||
| ["read_file", "read_file"], available_tools | ||
| ) | ||
|
|
||
| assert len(matched) == 1 |
There was a problem hiding this comment.
Test doesn't exercise its own comment
The docstring says "Two canonicals that both suffix-match the same incoming tool," but the actual call passes ["read_file", "read_file"] — the same canonical twice. That only exercises the used_ids guard for duplicate canonical names, not the case where two distinct canonicals both suffix-match a single incoming tool. A canonical list like ["read_file", "file"] (both suffix-match "srv_read_file") would properly cover the stated scenario and give stronger regression protection.
| def test_same_tool_not_returned_twice(self): | |
| """ | |
| Two canonicals that both suffix-match the same incoming tool | |
| must not produce a duplicate in the output list. | |
| """ | |
| filter_instance = self._make_filter() | |
| available_tools = [ | |
| {"name": "srv_read_file", "description": "read"} | |
| ] | |
| matched = filter_instance._get_tools_by_names( | |
| ["read_file", "read_file"], available_tools | |
| ) | |
| assert len(matched) == 1 | |
| def test_same_tool_not_returned_twice(self): | |
| """ | |
| Two canonicals that both suffix-match the same incoming tool | |
| must not produce a duplicate in the output list. | |
| """ | |
| filter_instance = self._make_filter() | |
| available_tools = [ | |
| {"name": "srv_read_file", "description": "read"} | |
| ] | |
| matched = filter_instance._get_tools_by_names( | |
| ["read_file", "file"], available_tools | |
| ) | |
| assert len(matched) == 1 |
| for canonical in tool_names: | ||
| tool = available_by_name.get(canonical) | ||
| if tool is None: | ||
| for client_name, candidate in available_by_name.items(): | ||
| if self._name_matches_canonical(client_name, canonical): | ||
| tool = candidate | ||
| break |
There was a problem hiding this comment.
First-in-dict-order wins when multiple tools suffix-match the same canonical
When no exact match is found and more than one available tool suffix-matches the same canonical (e.g. "my_tag_search" and "my_search" both matching canonical "search"), the suffix fallback returns whichever tool appears first in available_by_name's insertion order. In the opencode scenario described in the PR this is never ambiguous (one client prefix → one canonical), but if the proxy is ever used with a client that prefixes tools from multiple namespaces the wrong tool could be silently selected. Returning the shortest-prefix match (i.e. len(client_name) is minimised) would be a more principled tie-breaker with negligible overhead.
deduplication regression test (Greptile comments on BerriAI#26117) - test_same_tool_not_returned_twice now passes two distinct canonicals ("read_file" and "file") that both suffix-match the same incoming tool, rather than the same canonical twice, so the assertion actually exercises the used_ids dedup path instead of the duplicate-input-list path. - The suffix fallback in _get_tools_by_names now prefers the shortest incoming name that still qualifies under the separator-anchored match. In the one-prefix-per-client opencode scenario this is a no-op, but in multi-namespace configurations the shortest qualifying name is the least-wrapped one and is the most defensible deterministic choice, replacing the dict-insertion-order fallback. - Adds test_suffix_fallback_prefers_shortest_candidate covering the new tie-breaker directly. Still 15 tests passing locally (was 14).
|
Thanks for the review @greptile-apps! Both P2 points addressed in 930187e: 1. 2. Suffix-match tie-breaker — swapped dict-insertion-order for shortest-qualifying-name. Rationale is the same as you sketched: the incoming name closest in length to the canonical is the least-wrapped and the most defensible deterministic choice in multi-namespace configs. Added Local run: 15/15 passing. |
| tool = available_by_name[best_name] | ||
| if tool is not None and id(tool) not in used_ids: | ||
| matched.append(tool) | ||
| used_ids.add(id(tool)) |
There was a problem hiding this comment.
will there be collisions if a local function matches a litellm mcp server tool?
or is this handled since litellm always prefixes it's mcp tools with the server name in front of it?
There was a problem hiding this comment.
Good catch — yes, collision was possible on unprefixed canonical names. Addressed in ac7df0e:
Since LiteLLM always registers server-side MCP tools as <server_name><MCP_TOOL_PREFIX_SEPARATOR><tool_name> via add_server_prefix_to_name, a canonical that does not contain MCP_TOOL_PREFIX_SEPARATOR is not a namespaced MCP tool, so I gated the suffix-match fallback on the separator being present in the canonical. Added test_does_not_collide_with_local_function_on_unprefixed_canonical that spuriously matched before the fix and passes after.
The opencode scenario is unaffected because its canonical (fc_web_search-firecrawl_scrape) contains the separator. A few pre-existing TestGetToolsByNames fixtures that used bare canonicals for brevity (get_weather, search, etc.) were switched to realistic server-prefixed canonicals so they continue to exercise the suffix-fallback path under the new guard.
…CP_TOOL_PREFIX_SEPARATOR @krrish-berri-2 flagged a possible collision in the suffix fallback: a local user function whose name happens to end in a bare canonical substring (e.g. my_firecrawl_scrape vs canonical firecrawl_scrape) would be spuriously selected. Server-registered MCP tools are always emitted as <server_name><MCP_TOOL_PREFIX_SEPARATOR><tool_name> via add_server_prefix_to_name, so a canonical without the separator is not a namespaced MCP tool and does not warrant suffix matching. Added that guard to _name_matches_canonical with a regression test (test_does_not_collide_with_local_function_on_unprefixed_canonical) that reproduces the collision before the fix and is pinned after. Pre-existing TestGetToolsByNames fixtures that relied on bare canonicals (get_weather, search, read_file, write/delete/read) were switched to realistic server-prefixed ones so they continue to exercise the suffix-fallback path under the new guard. The opencode scenario (client prefix on already-server-prefixed canonical) is unchanged.
034f4fd
into
BerriAI:litellm_oss_staging_04_22_2026
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 29203053 | Triggered | Generic Password | f8eb263 | .circleci/config.yml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Fixes #26078.
Re-opening against
litellm_oss_branchafter the main-branch guard rejected the fork-based PR (#26116 was closed for that reason). Same commit, now on the correct base.What was broken
SemanticMCPToolFilter._get_tools_by_namesmatched the canonical name stored in the router (e.g.fc_web_search-firecrawl_scrape) against the incomingtools[].namewith plain equality. MCP clients that wrap every canonical name with their own additive namespace prefix — notably opencode (mcp/index.tsregisters every tool as<client_alias>_<canonical>) — never matched, the filter returned[], the proxy forwardedtools: []withtool_choice: \"auto\", and strict OpenAI-compatible upstreams rejected the request:This is distinct from:
Here the mismatch is a client-side prefix layered on top of the already-prefixed canonical name.
Fix
Anchored suffix matching in
_get_tools_by_nameswith a separator-character check.A new static helper
_name_matches_canonical(client_name, canonical)returnsTruewhen:client_name == canonical, orclient_name.endswith(canonical)and the character immediately preceding the canonical suffix is_or-.The separator check is what keeps the match semantically correct —
rain_gearis not treated as a namespaced version of canonicalear._get_tools_by_namesnow:Both
_and-are accepted as separators regardless of the proxy'sMCP_TOOL_PREFIX_SEPARATORsetting, because the client has no way to know what the proxy is configured with.Tests
New
TestGetToolsByNamesclass intests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py:test_exact_match_unchangedtest_client_prefix_with_underscore_separatortest_client_prefix_with_dash_separatortest_suffix_without_separator_does_not_matchrain_geardoes not match canonicaleartest_exact_match_preferred_over_prefixedtest_same_tool_not_returned_twicetest_ordering_follows_router_outputpytest tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py— 14 passed (7 new + 7 pre-existing, none regressed).Scope
Kept strictly to the matching bug called out in the issue. The hook's behavior when
filter_toolslegitimately returns[](router finds nothing relevant) is unchanged and can be addressed separately if maintainers want a broader safeguard against forwardingtools: []withtool_choice: \"auto\".