Skip to content

fix(mcp_semantic_tool_filter): match tools with client-side namespace prefix (#26078)#26116

Closed
sakenuGOD wants to merge 1 commit into
BerriAI:mainfrom
sakenuGOD:fix/26078-mcp-semantic-filter-client-prefix
Closed

fix(mcp_semantic_tool_filter): match tools with client-side namespace prefix (#26078)#26116
sakenuGOD wants to merge 1 commit into
BerriAI:mainfrom
sakenuGOD:fix/26078-mcp-semantic-filter-client-prefix

Conversation

@sakenuGOD

Copy link
Copy Markdown
Contributor

Fixes #26078.

What was broken

SemanticMCPToolFilter._get_tools_by_names matched the canonical name stored in the router (e.g. fc_web_search-firecrawl_scrape) against the incoming tools[].name with plain equality. MCP clients that wrap every canonical name with their own additive namespace prefix — notably opencode (mcp/index.ts registers every tool as <client_alias>_<canonical>) — never matched, the filter returned [], the proxy forwarded tools: [] with tool_choice: \"auto\", and strict OpenAI-compatible upstreams rejected the request:

litellm.BadRequestError: Hosted_vllmException -
{\"error\":{\"message\":\"__all__: Invalid value for 'tool_choice': 'tool_choice' is only allowed when 'tools' are specified.\"}}

This is distinct from:

  • #22445 (registry-vs-expanded prefix mismatch on proxy side)
  • #24984 (non-MCP tool scope)

Here the mismatch is a client-side prefix layered on top of the already-prefixed canonical name.

Fix

Anchored suffix matching in _get_tools_by_names with a separator-character check.

A new static helper _name_matches_canonical(client_name, canonical) returns True when:

  1. client_name == canonical, or
  2. client_name.endswith(canonical) and the character immediately preceding the canonical suffix is _ or -.

The separator check is what keeps the match semantically correct — rain_gear is not treated as a namespaced version of canonical ear.

_get_tools_by_names now:

  • Builds an index of incoming tools by their client-facing name.
  • For each canonical name returned by the router, tries an exact lookup first and falls back to suffix matching.
  • Deduplicates: each incoming tool is returned at most once even if two canonical names happen to suffix-match the same incoming tool.
  • Passes the original tool object through unchanged, so the client-facing name is preserved and tool-call round-trips continue to work on the client side (matching the expected behavior called out in the issue).

Both _ and - are accepted as separators regardless of the proxy's MCP_TOOL_PREFIX_SEPARATOR setting, because the client has no way to know what the proxy is configured with.

Tests

New TestGetToolsByNames class in tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py:

Test Scenario
test_exact_match_unchanged Historical path still works
test_client_prefix_with_underscore_separator The exact opencode scenario from #26078
test_client_prefix_with_dash_separator Clients that use dash as alias separator
test_suffix_without_separator_does_not_match rain_gear does not match canonical ear
test_exact_match_preferred_over_prefixed Exact match wins over suffix match for ordering stability
test_same_tool_not_returned_twice Two canonicals suffix-matching the same tool → one result, not two
test_ordering_follows_router_output Returned tools follow the semantic router's order

pytest tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py14 passed (7 new + 7 pre-existing, none regressed).

Scope

Kept strictly to the matching bug called out in the issue. The hook's behavior when filter_tools legitimately returns [] (router finds nothing relevant) is unchanged and can be addressed separately if maintainers want a broader safeguard against forwarding tools: [] with tool_choice: \"auto\".

a client-side namespace prefix.

`SemanticMCPToolFilter._get_tools_by_names` matched by exact equality
between the canonical name stored in the router
(`<server><MCP_TOOL_PREFIX_SEPARATOR><tool>`) and the name in the
incoming `tools[]` list. MCP clients such as opencode wrap every tool
name with their own additive alias prefix
(`<client_alias>_<canonical>`), so the two never matched, the filter
dropped every tool to zero, and the proxy forwarded `tools: []` with
`tool_choice: auto` — which strict upstream providers reject with a 400.

The fix adds anchored suffix matching with a separator check: the
canonical must form the complete tail of the incoming name and be
preceded by `_` or `-`. Exact matches still win over suffix matches,
incoming tools are returned at most once, and the original tool object
is passed through unchanged so the client-facing name survives for
tool-call round-trips.

Seven unit tests in a new TestGetToolsByNames class cover exact
match, underscore- and dash-prefixed variants, non-separator-anchored
suffixes (which must not match), exact-wins-over-prefixed precedence,
deduplication when two canonicals suffix-match the same incoming tool,
and ordering-follows-router-output.

Fixes BerriAI#26078
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@greptile-apps

greptile-apps Bot commented Apr 20, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes SemanticMCPToolFilter._get_tools_by_names to handle MCP clients (e.g. opencode) that prepend their own namespace prefix to every canonical tool name before sending it back in tools[]. The fix adds a _name_matches_canonical helper that accepts a tool when the client name is an exact match or a prefix-separated suffix of the canonical, and refactors _get_tools_by_names to use exact-lookup-first with a suffix-scan fallback, deduplication via object identity, and preservation of the original client-facing name.

Confidence Score: 5/5

Safe to merge — the fix is narrowly scoped to name-matching logic, all remaining findings are P2 style suggestions, and 14 tests pass with no regressions.

The implementation is correct: the separator check prevents false positives, deduplication via object identity is safe because available_by_name holds live references, and exact-match-first ensures backward compatibility. No P0/P1 issues found.

No files require special attention.

Important Files Changed

Filename Overview
litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py Adds _name_matches_canonical static helper and rewrites _get_tools_by_names with exact-then-suffix matching, deduplication via id(), and preserved client-facing names. Logic is correct and well-documented.
tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py Adds 7 targeted unit tests covering exact match, underscore/dash prefix, non-separator suffix rejection, exact-wins-over-prefix, dedup, and ordering. All tests are pure mock tests with no real network calls. Missing coverage for two distinct canonicals matching the same tool.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_get_tools_by_names(tool_names, available_tools)"] --> B["Build available_by_name index\n(client_name → tool object)"]
    B --> C["For each canonical in tool_names"]
    C --> D{"Exact lookup\navailable_by_name.get(canonical)"}
    D -- "Found" --> G
    D -- "Not found" --> E["Iterate available_by_name\ncall _name_matches_canonical(client_name, canonical)"]
    E --> F{"Suffix + separator\nmatch found?"}
    F -- "Yes" --> G{"id(tool) already\nin used_ids?"}
    F -- "No" --> H["Skip canonical"]
    G -- "No" --> I["Append tool to matched\nAdd id(tool) to used_ids"]
    G -- "Yes (dedup)" --> H
    I --> C
    H --> C
    C -- "All canonicals done" --> J["Return matched (router-ordered,\nclient-facing names preserved)"]
Loading

Reviews (1): Last reviewed commit: "fix(mcp_semantic_tool_filter): match can..." | Re-trigger Greptile

Comment on lines +560 to +574
def test_same_tool_not_returned_twice(self):
"""
Two canonicals that both suffix-match the same incoming tool
must not produce a duplicate in the output list.
"""
filter_instance = self._make_filter()
available_tools = [
{"name": "srv_read_file", "description": "read"}
]

matched = filter_instance._get_tools_by_names(
["read_file", "read_file"], available_tools
)

assert len(matched) == 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dedup test only covers duplicate canonicals, not distinct canonicals

test_same_tool_not_returned_twice exercises the case where the same canonical appears twice in tool_names. The more realistic dedup scenario is two different canonicals that both suffix-match the same incoming tool (e.g. canonicals ["read_file", "file"] against client tool "srv_read_file"). Consider adding a second assertion case to give the id()-based guard full coverage:

def test_same_tool_not_returned_twice_distinct_canonicals(self):
    """Two *different* canonicals that suffix-match the same incoming tool."""
    filter_instance = self._make_filter()
    available_tools = [{"name": "srv_read_file", "description": "read"}]

    matched = filter_instance._get_tools_by_names(
        ["read_file", "file"], available_tools
    )

    assert len(matched) == 1

Comment on lines +267 to +275
matched: List[Any] = []
used_ids: set = set()
for canonical in tool_names:
tool = available_by_name.get(canonical)
if tool is None:
for client_name, candidate in available_by_name.items():
if self._name_matches_canonical(client_name, canonical):
tool = candidate
break

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Linear scan per unmatched canonical is O(n×m) at worst

For each canonical that misses the exact-lookup, the code iterates over all entries in available_by_name until a suffix match is found — and stops at the first hit (break). In typical MCP deployments (dozens to low hundreds of tools) this is fine, but if the registry ever grows to thousands of tools the per-request cost will be noticeable. A pre-built suffix index keyed on canonical → matching client tool could eliminate the scan, but that would complicate the code significantly. Adding a short comment noting the complexity trade-off would help future readers make an informed decision if they revisit this path.

@codspeed-hq

codspeed-hq Bot commented Apr 20, 2026

Copy link
Copy Markdown
Contributor

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing sakenuGOD:fix/26078-mcp-semantic-filter-client-prefix (300f673) with main (26fcbc9)

Open in CodSpeed

@codecov

codecov Bot commented Apr 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...y/_experimental/mcp_server/semantic_tool_filter.py 92.59% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@sakenuGOD

Copy link
Copy Markdown
Contributor Author

Closing: CI guard requires external PRs to target litellm_oss_branch instead of main. Re-opened against the correct base in #NEW.

@sakenuGOD

Copy link
Copy Markdown
Contributor Author

Re-opened against litellm_oss_branch as required by the guard: #26117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants