fix(mcp_semantic_tool_filter): match tools with client-side namespace prefix (#26078)#26116
fix(mcp_semantic_tool_filter): match tools with client-side namespace prefix (#26078)#26116sakenuGOD wants to merge 1 commit into
Conversation
a client-side namespace prefix. `SemanticMCPToolFilter._get_tools_by_names` matched by exact equality between the canonical name stored in the router (`<server><MCP_TOOL_PREFIX_SEPARATOR><tool>`) and the name in the incoming `tools[]` list. MCP clients such as opencode wrap every tool name with their own additive alias prefix (`<client_alias>_<canonical>`), so the two never matched, the filter dropped every tool to zero, and the proxy forwarded `tools: []` with `tool_choice: auto` — which strict upstream providers reject with a 400. The fix adds anchored suffix matching with a separator check: the canonical must form the complete tail of the incoming name and be preceded by `_` or `-`. Exact matches still win over suffix matches, incoming tools are returned at most once, and the original tool object is passed through unchanged so the client-facing name survives for tool-call round-trips. Seven unit tests in a new TestGetToolsByNames class cover exact match, underscore- and dash-prefixed variants, non-separator-anchored suffixes (which must not match), exact-wins-over-prefixed precedence, deduplication when two canonicals suffix-match the same incoming tool, and ordering-follows-router-output. Fixes BerriAI#26078
|
|
Greptile SummaryThis PR fixes Confidence Score: 5/5Safe to merge — the fix is narrowly scoped to name-matching logic, all remaining findings are P2 style suggestions, and 14 tests pass with no regressions. The implementation is correct: the separator check prevents false positives, deduplication via object identity is safe because available_by_name holds live references, and exact-match-first ensures backward compatibility. No P0/P1 issues found. No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py | Adds _name_matches_canonical static helper and rewrites _get_tools_by_names with exact-then-suffix matching, deduplication via id(), and preserved client-facing names. Logic is correct and well-documented. |
| tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py | Adds 7 targeted unit tests covering exact match, underscore/dash prefix, non-separator suffix rejection, exact-wins-over-prefix, dedup, and ordering. All tests are pure mock tests with no real network calls. Missing coverage for two distinct canonicals matching the same tool. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["_get_tools_by_names(tool_names, available_tools)"] --> B["Build available_by_name index\n(client_name → tool object)"]
B --> C["For each canonical in tool_names"]
C --> D{"Exact lookup\navailable_by_name.get(canonical)"}
D -- "Found" --> G
D -- "Not found" --> E["Iterate available_by_name\ncall _name_matches_canonical(client_name, canonical)"]
E --> F{"Suffix + separator\nmatch found?"}
F -- "Yes" --> G{"id(tool) already\nin used_ids?"}
F -- "No" --> H["Skip canonical"]
G -- "No" --> I["Append tool to matched\nAdd id(tool) to used_ids"]
G -- "Yes (dedup)" --> H
I --> C
H --> C
C -- "All canonicals done" --> J["Return matched (router-ordered,\nclient-facing names preserved)"]
Reviews (1): Last reviewed commit: "fix(mcp_semantic_tool_filter): match can..." | Re-trigger Greptile
| def test_same_tool_not_returned_twice(self): | ||
| """ | ||
| Two canonicals that both suffix-match the same incoming tool | ||
| must not produce a duplicate in the output list. | ||
| """ | ||
| filter_instance = self._make_filter() | ||
| available_tools = [ | ||
| {"name": "srv_read_file", "description": "read"} | ||
| ] | ||
|
|
||
| matched = filter_instance._get_tools_by_names( | ||
| ["read_file", "read_file"], available_tools | ||
| ) | ||
|
|
||
| assert len(matched) == 1 |
There was a problem hiding this comment.
Dedup test only covers duplicate canonicals, not distinct canonicals
test_same_tool_not_returned_twice exercises the case where the same canonical appears twice in tool_names. The more realistic dedup scenario is two different canonicals that both suffix-match the same incoming tool (e.g. canonicals ["read_file", "file"] against client tool "srv_read_file"). Consider adding a second assertion case to give the id()-based guard full coverage:
def test_same_tool_not_returned_twice_distinct_canonicals(self):
"""Two *different* canonicals that suffix-match the same incoming tool."""
filter_instance = self._make_filter()
available_tools = [{"name": "srv_read_file", "description": "read"}]
matched = filter_instance._get_tools_by_names(
["read_file", "file"], available_tools
)
assert len(matched) == 1| matched: List[Any] = [] | ||
| used_ids: set = set() | ||
| for canonical in tool_names: | ||
| tool = available_by_name.get(canonical) | ||
| if tool is None: | ||
| for client_name, candidate in available_by_name.items(): | ||
| if self._name_matches_canonical(client_name, canonical): | ||
| tool = candidate | ||
| break |
There was a problem hiding this comment.
Linear scan per unmatched canonical is O(n×m) at worst
For each canonical that misses the exact-lookup, the code iterates over all entries in available_by_name until a suffix match is found — and stops at the first hit (break). In typical MCP deployments (dozens to low hundreds of tools) this is fine, but if the registry ever grows to thousands of tools the per-request cost will be noticeable. A pre-built suffix index keyed on canonical → matching client tool could eliminate the scan, but that would complicate the code significantly. Adding a short comment noting the complexity trade-off would help future readers make an informed decision if they revisit this path.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Closing: CI guard requires external PRs to target |
|
Re-opened against |
Fixes #26078.
What was broken
SemanticMCPToolFilter._get_tools_by_namesmatched the canonical name stored in the router (e.g.fc_web_search-firecrawl_scrape) against the incomingtools[].namewith plain equality. MCP clients that wrap every canonical name with their own additive namespace prefix — notably opencode (mcp/index.tsregisters every tool as<client_alias>_<canonical>) — never matched, the filter returned[], the proxy forwardedtools: []withtool_choice: \"auto\", and strict OpenAI-compatible upstreams rejected the request:This is distinct from:
Here the mismatch is a client-side prefix layered on top of the already-prefixed canonical name.
Fix
Anchored suffix matching in
_get_tools_by_nameswith a separator-character check.A new static helper
_name_matches_canonical(client_name, canonical)returnsTruewhen:client_name == canonical, orclient_name.endswith(canonical)and the character immediately preceding the canonical suffix is_or-.The separator check is what keeps the match semantically correct —
rain_gearis not treated as a namespaced version of canonicalear._get_tools_by_namesnow:Both
_and-are accepted as separators regardless of the proxy'sMCP_TOOL_PREFIX_SEPARATORsetting, because the client has no way to know what the proxy is configured with.Tests
New
TestGetToolsByNamesclass intests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py:test_exact_match_unchangedtest_client_prefix_with_underscore_separatortest_client_prefix_with_dash_separatortest_suffix_without_separator_does_not_matchrain_geardoes not match canonicaleartest_exact_match_preferred_over_prefixedtest_same_tool_not_returned_twicetest_ordering_follows_router_outputpytest tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py— 14 passed (7 new + 7 pre-existing, none regressed).Scope
Kept strictly to the matching bug called out in the issue. The hook's behavior when
filter_toolslegitimately returns[](router finds nothing relevant) is unchanged and can be addressed separately if maintainers want a broader safeguard against forwardingtools: []withtool_choice: \"auto\".