[responses API] Add list_tools_for_servers and threading server_keys in routers#16540

Merged: slin1237 merged 2 commits into main from chang/resp-refactor-5 on Jan 6, 2026

@CatherineSue CatherineSue commented Jan 6, 2026

Motivation

This PR fixes an MCP tool leakage issue in the gRPC and OpenAI routers.

When multiple requests used different MCP servers, tools registered for one request were visible to the others:

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                           BEFORE (Broken)                                   │
  ├─────────────────────────────────────────────────────────────────────────────┤
  │                                                                             │
  │  Request A                              Request B                           │
  │  tools: [{ server_url: "server-a" }]   tools: [{ server_url: "server-b" }]  │
  │           │                                      │                          │
  │           ▼                                      ▼                          │
  │     Connect to server-a                   Connect to server-b               │
  │           │                                      │                          │
  │           └──────────────┬───────────────────────┘                          │
  │                          ▼                                                  │
  │              ┌─────────────────────┐                                        │
  │              │   MCP Inventory     │                                        │
  │              │   (Global Pool)     │                                        │
  │              │                     │                                        │
  │              │  - static tools     │                                        │
  │              │  - server-a tools   │  ← Added by Request A                  │
  │              │  - server-b tools   │  ← Added by Request B                  │
  │              └──────────┬──────────┘                                        │
  │                         │                                                   │
  │                         ▼                                                   │
  │              list_tools() returns ALL                                       │
  │                         │                                                   │
  │           ┌─────────────┴─────────────┐                                     │
  │           ▼                           ▼                                     │
  │    Request A sees:              Request B sees:                             │
  │    - static tools               - static tools                              │
  │    - server-a tools             - server-a tools  ← WRONG! Leaked           │
  │    - server-b tools ← WRONG!    - server-b tools                            │
  │                                                                             │
  └─────────────────────────────────────────────────────────────────────────────┘

Modifications

  • Add list_tools_for_servers to McpManager, which returns only tools from static servers plus the specified dynamic servers
  • Add is_static_server_by_key to identify static servers (always visible)
  • Update ensure_mcp_connection to return the server keys for a request
  • Add requested_servers: Arc<StdRwLock<Vec<String>>> to the gRPC contexts to store per-request server keys
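
The filtering rule in the list above can be sketched as follows. This is a minimal illustration, not the McpManager code from this PR: the Inventory and ToolEntry types and the add_static_server / add_tool helpers are hypothetical stand-ins, and only the method names list_tools_for_servers and is_static_server_by_key come from the description.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one tool entry in the shared MCP inventory.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolEntry {
    pub server_key: String,
    pub name: String,
}

/// Hypothetical, simplified inventory: a single pool shared by all requests.
#[derive(Default)]
pub struct Inventory {
    static_servers: HashSet<String>,
    tools: Vec<ToolEntry>,
}

impl Inventory {
    /// Mark a server as static (configured up front, visible to every request).
    pub fn add_static_server(&mut self, key: &str) {
        self.static_servers.insert(key.to_string());
    }

    /// Register a tool under the server it came from.
    pub fn add_tool(&mut self, server_key: &str, name: &str) {
        self.tools.push(ToolEntry {
            server_key: server_key.to_string(),
            name: name.to_string(),
        });
    }

    /// Static servers are always visible, regardless of the request.
    pub fn is_static_server_by_key(&self, key: &str) -> bool {
        self.static_servers.contains(key)
    }

    /// Unfiltered listing: every request sees the whole pool (the leaky behavior).
    pub fn list_tools(&self) -> Vec<ToolEntry> {
        self.tools.clone()
    }

    /// Filtered listing: static tools plus tools from the servers this request named.
    pub fn list_tools_for_servers(&self, server_keys: &[String]) -> Vec<ToolEntry> {
        self.tools
            .iter()
            .filter(|t| {
                self.is_static_server_by_key(&t.server_key)
                    || server_keys.contains(&t.server_key)
            })
            .cloned()
            .collect()
    }
}
```

With tools registered for a static server, server-a, and server-b, calling list_tools_for_servers with only "server-a" would return the static and server-a tools, while list_tools would still return all three.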

Threading Approaches

The commit uses two different approaches to pass server_keys:

Router                    Approach
gRPC (Harmony/Regular)    Store in context: ctx.requested_servers
OpenAI                    Pass through params: McpLoopConfig.server_keys
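
A rough sketch of the two approaches, for comparison. The struct layouts and accessor methods here are hypothetical; only the field names requested_servers and server_keys and the Arc<StdRwLock<Vec<String>>> type come from the description, and StdRwLock is assumed to be an alias for std::sync::RwLock.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical per-request context for the gRPC path. The keys sit behind a
/// shared lock so that any stage holding a clone of the context can read them.
#[derive(Clone, Default)]
pub struct RequestContext {
    pub requested_servers: Arc<RwLock<Vec<String>>>,
}

impl RequestContext {
    /// Record the server keys once the MCP connections have been set up.
    pub fn set_requested_servers(&self, keys: Vec<String>) {
        // A poisoned lock only means another holder panicked; the Vec is still usable.
        let mut guard = self
            .requested_servers
            .write()
            .unwrap_or_else(|e| e.into_inner());
        *guard = keys;
    }

    /// Snapshot the keys for use when listing tools.
    pub fn server_keys(&self) -> Vec<String> {
        let guard = self
            .requested_servers
            .read()
            .unwrap_or_else(|e| e.into_inner());
        guard.clone()
    }
}

/// Hypothetical config for the OpenAI path: the keys are plain data that is
/// handed to the tool loop as a parameter, with no shared state involved.
#[derive(Clone, Debug, Default)]
pub struct McpLoopConfig {
    pub server_keys: Vec<String>,
}
```

In this sketch the context variant suits code where the keys are produced in one stage and consumed in another that only shares the context, while the config variant suits a call chain that can simply pass the value down.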

Solution in Flow Chart

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                           AFTER (Fixed)                                     │
  ├─────────────────────────────────────────────────────────────────────────────┤
  │                                                                             │
  │  Request A                              Request B                           │
  │  tools: [{ server_url: "server-a" }]   tools: [{ server_url: "server-b" }]  │
  │           │                                      │                          │
  │           ▼                                      ▼                          │
  │  ensure_mcp_connection()               ensure_mcp_connection()              │
  │  returns: (true, ["server-a"])         returns: (true, ["server-b"])        │
  │           │                                      │                          │
  │           ▼                                      ▼                          │
  │  ctx.requested_servers = ["server-a"]  ctx.requested_servers = ["server-b"] │
  │           │                                      │                          │
  │           ▼                                      ▼                          │
  │  list_tools_for_servers(["server-a"])  list_tools_for_servers(["server-b"]) │
  │           │                                      │                          │
  │           ▼                                      ▼                          │
  │    Request A sees:                       Request B sees:                    │
  │    - static tools ✓                      - static tools ✓                   │
  │    - server-a tools ✓                    - server-b tools ✓                 │
  │    - server-b tools ✗ (filtered out)     - server-a tools ✗ (filtered out)  │
  │                                                                             │
  └─────────────────────────────────────────────────────────────────────────────┘
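
The first step of the diagram, deriving a request's server keys from its tools array, might look roughly like this. RequestTool and collect_server_keys are hypothetical names, not the ensure_mcp_connection implementation. Two assumptions are made for illustration: the server URL doubles as the server key, and the boolean in the returned tuple means that at least one MCP server was requested.

```rust
/// Hypothetical, trimmed-down view of one entry in a request's `tools` array.
pub struct RequestTool {
    pub tool_type: String,
    pub server_url: Option<String>,
}

/// Returns a (bool, keys) tuple shaped like the one in the diagram.
/// Assumption: the bool is true when at least one MCP server was requested.
pub fn collect_server_keys(tools: &[RequestTool]) -> (bool, Vec<String>) {
    let mut keys: Vec<String> = Vec::new();
    for tool in tools {
        // Only MCP tools contribute server keys; other tool types are skipped.
        if tool.tool_type != "mcp" {
            continue;
        }
        if let Some(url) = &tool.server_url {
            // Assumption: the server URL is used directly as the server key.
            // Duplicates are dropped so each server appears once.
            if !keys.contains(url) {
                keys.push(url.clone());
            }
        }
    }
    (!keys.is_empty(), keys)
}
```

The returned keys are what a request would then store (gRPC) or pass along (OpenAI) and later hand to list_tools_for_servers.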

Accuracy Tests

The test sends two requests to the same gRPC worker (regular). The first request specifies dynamic server A, and the second specifies dynamic server B. Each response lists only the MCP tools from its own server.

No static server is set up for this test.

  • 1st Request:

MCP: Brave Search server

curl http://localhost:3002/v1/responses \
-H "Content-Type: application/json" \
-d '{
  "model": "/models/meta-llama/Llama-3.1-8B-Instruct",
  "tools": [
    {
      "type": "mcp",
      "server_label": "brave",
      "server_description": "A Tool to do web search",
      "server_url": "http://localhost:8001/sse",
      "require_approval": "never"
    }
  ],
  "input": "could you show me one simple news about sglang",
  "reasoning": {"effort": "low"}
}' | jq

Response:

mcp_list_tools shows only the brave server's tools (brave_web_search and brave_local_search)

Detailed response:
{
  "id": "resp_df551e47-12a1-408d-92da-6b0ca17a3597",
  "object": "response",
  "created_at": 1767680604,
  "status": "completed",
  "model": "/models/meta-llama/Llama-3.1-8B-Instruct",
  "output": [
    {
      "type": "mcp_list_tools",
      "id": "mcpl_8a4e6860-d111-4a1d-b7a2-1fe2b232053f",
      "server_label": "brave",
      "tools": [
        {
          "name": "brave_web_search",
          "description": "Performs a web search using the Brave Search API, ideal for general queries, news, articles, and online content. Use this for broad information gathering, recent events, or when you need diverse web sources. Supports pagination, content filtering, and freshness controls. Maximum 20 results per request, with offset for pagination. ",
          "input_schema": {
            "type": "object",
            "properties": {
              "query": {
                "type": "string",
                "description": "Search query (max 400 chars, 50 words)"
              },
              "count": {
                "type": "number",
                "description": "Number of results (1-20, default 10)",
                "default": 10
              },
              "offset": {
                "type": "number",
                "description": "Pagination offset (max 9, default 0)",
                "default": 0
              }
            },
            "required": [
              "query"
            ]
          },
          "annotations": {
            "read_only": false
          }
        },
        {
          "name": "brave_local_search",
          "description": "Searches for local businesses and places using Brave's Local Search API. Best for queries related to physical locations, businesses, restaurants, services, etc. Returns detailed information including:\n- Business names and addresses\n- Ratings and review counts\n- Phone numbers and opening hours\nUse this when the query implies 'near me' or mentions specific locations. Automatically falls back to web search if no local results are found.",
          "input_schema": {
            "type": "object",
            "properties": {
              "query": {
                "type": "string",
                "description": "Local search query (e.g. 'pizza near Central Park')"
              },
              "count": {
                "type": "number",
                "description": "Number of results (1-20, default 5)",
                "default": 5
              }
            },
            "required": [
              "query"
            ]
          },
          "annotations": {
            "read_only": false
          }
        }
      ]
    },
    {
      "type": "message",
      "id": "msg_chatcmpl-3364698a-c892-4e50-a800-4893de36311b",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The function \"brave_web_search\" with the query \"sglang\" returns a piece of news about Sglang, a high-performance serving framework for large language models and multimodal models."
        }
      ],
      "status": "completed"
    },
    {
      "type": "mcp_call",
      "id": "mcp_799fead2-8d51-49b0-8cc6-2a91a9ca8210",
      "status": "completed",
      "arguments": "{\"query\":\"sglang\",\"count\":1,\"offset\":0}",
      "name": "brave_web_search",
      "output": "{\"content\":[{\"type\":\"text\",\"text\":\"Title: GitHub - sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.\\nDescription: SGLang is <strong>a high-performance serving framework for large language models and multimodal models</strong>. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.\\nURL: https://github.com/sgl-project/sglang\"}],\"isError\":false}",
      "server_label": "brave"
    }
  ],
  "parallel_tool_calls": true,
  "store": true,
  "temperature": 1,
  "tool_choice": "\"auto\"",
  "tools": [
    {
      "type": "mcp",
      "server_url": "http://localhost:8001/sse",
      "server_label": "brave",
      "server_description": "A Tool to do web search",
      "require_approval": "never"
    }
  ],
  "top_p": 1,
  "usage": {
    "prompt_tokens": 707,
    "completion_tokens": 40,
    "total_tokens": 747
  },
  "metadata": {}
}
  • 2nd Request:

MCP: DeepWiki server

curl http://localhost:3002/v1/responses \
-H "Content-Type: application/json" \
-d '{
  "model": "/models/meta-llama/Llama-3.1-8B-Instruct",
  "tools": [
    {
      "type": "mcp",
      "server_label": "deepwiki",
      "server_url": "https://mcp.deepwiki.com/mcp",
      "require_approval": "never"
    }
  ],
  "input": "could you show me one simple news about sglang",
  "reasoning": {"effort": "low"}
}' | jq

Response:

mcp_list_tools shows only deepwiki tools

Detailed response:
{
  "id": "resp_66ca2fda-f4eb-49b7-8640-f03be3058a91",
  "object": "response",
  "created_at": 1767680707,
  "status": "completed",
  "model": "/models/meta-llama/Llama-3.1-8B-Instruct",
  "output": [
    {
      "type": "mcp_list_tools",
      "id": "mcpl_a00ebe03-5d3f-4f72-8211-7f695b4e7e6c",
      "server_label": "deepwiki",
      "tools": [
        {
          "name": "read_wiki_contents",
          "description": "View documentation about a GitHub repository",
          "input_schema": {
            "type": "object",
            "properties": {
              "repoName": {
                "type": "string",
                "description": "GitHub repository: owner/repo (e.g. \"facebook/react\")"
              }
            },
            "required": [
              "repoName"
            ],
            "additionalProperties": false,
            "$schema": "http://json-schema.org/draft-07/schema#"
          },
          "annotations": {
            "read_only": false
          }
        },
        {
          "name": "ask_question",
          "description": "Ask any question about a GitHub repository",
          "input_schema": {
            "type": "object",
            "properties": {
              "repoName": {
                "type": "string",
                "description": "GitHub repository: owner/repo (e.g. \"facebook/react\")"
              },
              "question": {
                "type": "string",
                "description": "The question to ask about the repository"
              }
            },
            "required": [
              "repoName",
              "question"
            ],
            "additionalProperties": false,
            "$schema": "http://json-schema.org/draft-07/schema#"
          },
          "annotations": {
            "read_only": false
          }
        },
        {
          "name": "read_wiki_structure",
          "description": "Get a list of documentation topics for a GitHub repository",
          "input_schema": {
            "type": "object",
            "properties": {
              "repoName": {
                "type": "string",
                "description": "GitHub repository: owner/repo (e.g. \"facebook/react\")"
              }
            },
            "required": [
              "repoName"
            ],
            "additionalProperties": false,
            "$schema": "http://json-schema.org/draft-07/schema#"
          },
          "annotations": {
            "read_only": false
          }
        }
      ]
    },
    {
      "type": "message",
      "id": "msg_chatcmpl-27732c9e-7f2e-45f1-8fba-832462c7563d",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The function `ask_question` is called with the repository `facebook/sglang` and the question \"What is one simple news about sglang?\". However, since `facebook/sglang` is not a valid repository, the function returns an error message. To get the news, you need to index the repository first by visiting the provided link."
        }
      ],
      "status": "completed"
    },
    {
      "type": "mcp_call",
      "id": "mcp_66b0deee-95f4-4397-9bdd-21221c386660",
      "status": "completed",
      "arguments": "{\"question\":\"What is one simple news about sglang?\",\"repoName\":\"facebook/sglang\"}",
      "name": "ask_question",
      "output": "{\"content\":[{\"type\":\"text\",\"text\":\"Error processing question: Repository not found. Visit https://deepwiki.com/facebook/sglang to index it.\"}]}",
      "server_label": "deepwiki"
    }
  ],
  "parallel_tool_calls": true,
  "store": true,
  "temperature": 1,
  "tool_choice": "\"auto\"",
  "tools": [
    {
      "type": "mcp",
      "server_url": "https://mcp.deepwiki.com/mcp",
      "server_label": "deepwiki",
      "require_approval": "never"
    }
  ],
  "top_p": 1,
  "usage": {
    "prompt_tokens": 602,
    "completion_tokens": 72,
    "total_tokens": 674
  },
  "metadata": {}
}

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@CatherineSue CatherineSue changed the title Add list_tools_for_servers and threading server_keys in routers [responses API] Add list_tools_for_servers and threading server_keys in routers Jan 6, 2026
@slin1237 slin1237 merged commit 9bf76c1 into main Jan 6, 2026
63 of 64 checks passed
@slin1237 slin1237 deleted the chang/resp-refactor-5 branch January 6, 2026 06:42
jamesjxliu pushed a commit to jamesjxliu/sglang that referenced this pull request Jan 6, 2026