Defining the problem space for asynchronous tool execution #843
Regarding problems (1) and (2), is there anything in the MCP protocol that requires tool calls to be synchronous? I think tool calls over the HTTP transport are inherently async, so I'm not clear on what protocol changes are needed for this.
We had a good conversation around this in the working group call today. There are a few issues, and a few things came up.

First, there is blurriness between "sessions" from the MCP server perspective and "sessions" from the client perspective. Maybe they are the same thing, but if you fire a tool call with the intention of waiting for it to come back, and there's a delay of some sort, how do you correlate the tool call with the client (inference?) session when it finally returns? Maybe this isn't an issue, but it's a funny one.

Second, there was discussion about the transport protocol being intimately connected to sessions, which could be an issue long term.

Third, it was brought up that polling should never be a requirement for checking for tool completion. It might be one model, but it should never be the only model. To really allow this to scale, we'll want some form of callback at some point, so the client doesn't have to stay connected and can still be alerted to a tool completion.

To summarize the high-level issue I see: clients as they exist today wait for a tool response before moving on with their inference session, so you really CAN'T just let a multi-hour tool call sit there holding onto the inference session.

There are a lot of different ways to solve this, but I'll throw out one that could work with the current protocol (with maybe a few small changes). A tool call happens and the server starts the call. If it finishes within 10 seconds, the server just returns a response. If it runs longer, the call returns a resource pointer instead. The client can then poll that resource for the result of the call. If the client is smart enough, it can set up a subscription to that resource, which essentially creates a model for long-lived tool calls WITHIN A SESSION.
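The "answer within a deadline, otherwise return a resource pointer" idea can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions: the `ToolOutcome` and `JobState` shapes, the `job://` URI scheme, the in-memory `jobs` map, and the helper names are all hypothetical and are not part of the MCP specification or any SDK. The 10-second deadline from the comment becomes the `inlineDeadlineMs` parameter, and a real server would expose the pending job through `resources/read` (and `resources/subscribe`) rather than a local function.

```typescript
// Hypothetical shapes for illustration only; MCP does not define these.
type JobState =
  | { status: "running" }
  | { status: "done"; content: string }
  | { status: "failed"; error: string };

type ToolOutcome =
  | { kind: "result"; content: string } // finished within the deadline
  | { kind: "error"; error: string } // failed within the deadline
  | { kind: "pending"; resourceUri: string }; // still running; read this later

// In-memory job table; a real server would persist and expire entries.
const jobs = new Map<string, JobState>();
let nextJobId = 1;

// Resolve when `p` settles or after `ms` milliseconds, whichever comes first.
async function waitAtMost(p: Promise<void>, ms: number): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<void>((resolve) => {
    timer = setTimeout(() => resolve(), ms);
  });
  try {
    await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Server side: start the work, answer inline if it is fast, else hand back a pointer.
async function callTool(work: () => Promise<string>, inlineDeadlineMs: number): Promise<ToolOutcome> {
  const id = `job-${nextJobId++}`;
  jobs.set(id, { status: "running" });
  // Record the outcome whenever it arrives; this promise never rejects.
  const settled = work().then(
    (content) => { jobs.set(id, { status: "done", content }); },
    (err) => { jobs.set(id, { status: "failed", error: String(err) }); },
  );
  await waitAtMost(settled, inlineDeadlineMs);
  const state = jobs.get(id)!;
  if (state.status === "running") return { kind: "pending", resourceUri: `job://${id}` };
  jobs.delete(id); // answered inline, so no resource needs to outlive the call
  return state.status === "done"
    ? { kind: "result", content: state.content }
    : { kind: "error", error: state.error };
}

// Stand-in for reading the pending resource (resources/read in a real server).
function readJob(uri: string): JobState | undefined {
  return jobs.get(uri.replace(/^job:\/\//, ""));
}

// Client side: poll the resource until the job leaves the "running" state.
async function pollUntilSettled(uri: string, intervalMs: number, maxAttempts: number): Promise<JobState | undefined> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = readJob(uri);
    if (state === undefined || state.status !== "running") return state;
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
  return readJob(uri);
}
```

Polling is only the fallback in this sketch. As the comment suggests, a client that understands subscriptions would subscribe to the returned resource and wait for an update notification instead of calling `pollUntilSettled`.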
I do not believe this is MxN compatible, because most clients would not immediately know to subscribe to the resource to see when it finishes. A small change that could be made is something like a "pending" resource that indicates to the client that it will eventually be satisfied and that the client should subscribe to it.

A2A has a good solution here: a task call can include a webhook callback. That solves the async issue, though it's not very MCP-ish and doesn't work great with thick desktop clients. I think a model that fits better with MCP would be alternate notification transport mechanisms. For example, a client could register a webhook callback for any notification. That callback could include some sort of token to resume an MCP session, and maybe a correlation ID supplied by the client. That way, an agent could fire a call, wait for the callback, correlate the callback with a local inference session, and then pull the remote resource. I think this is actually a nice model (@evalstate brought up the correlation ID idea, which could be generally useful within elicitation and other interactions), and it builds some abstractions around transports that could be really useful downstream.
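The callback model above depends on the client matching an incoming callback to a local inference session. A minimal TypeScript sketch of that client-side bookkeeping follows. The `CompletionCallback` payload, its field names, and the `CallbackRouter` class are hypothetical (neither MCP nor A2A defines them), and the HTTP endpoint that would receive the webhook is left out so the example stays self-contained.

```typescript
// Hypothetical webhook payload; not defined by MCP or A2A.
interface CompletionCallback {
  correlationId: string; // chosen by the client when it fired the tool call
  resumeToken: string; // opaque token for resuming the MCP session
  resourceUri: string; // where the finished result can be pulled from
}

// Whatever the client uses to represent one local inference session.
interface LocalSession {
  sessionId: string;
  onToolComplete(callback: CompletionCallback): void;
}

// Validate an untrusted request body before acting on it.
function isCompletionCallback(x: unknown): x is CompletionCallback {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return (
    typeof o.correlationId === "string" &&
    typeof o.resumeToken === "string" &&
    typeof o.resourceUri === "string"
  );
}

class CallbackRouter {
  private readonly pending = new Map<string, LocalSession>();
  private counter = 0;

  // Call before firing a long-running tool; send the returned id with the request.
  register(session: LocalSession): string {
    const correlationId = `${session.sessionId}:${++this.counter}`;
    this.pending.set(correlationId, session);
    return correlationId;
  }

  // Call from the webhook handler with the parsed request body.
  // Returns false for malformed, unknown, or duplicate callbacks.
  handle(payload: unknown): boolean {
    if (!isCompletionCallback(payload)) return false;
    const session = this.pending.get(payload.correlationId);
    if (session === undefined) return false;
    this.pending.delete(payload.correlationId); // deliver each callback at most once
    session.onToolComplete(payload);
    return true;
  }
}
```

In the flow described in the comment, `onToolComplete` would use `resumeToken` to reattach to the MCP session and then read `resourceUri`. Those steps depend on transport details the thread has not settled, so the sketch stops at routing.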
@evalstate, what do you think about a small PR to allow clients to include a `client_session_id` that round-trips through the server on requests and is included in tool returns, elicitation requests, etc.?
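A minimal TypeScript sketch of the round-trip, assuming the id travels in the `_meta` field of request params and results. MCP reserves `_meta` for attaching metadata, but the `client_session_id` key and the helper functions here are hypothetical and not part of the specification. The server copies the id from the originating request onto its response and onto any server-initiated request, such as an elicitation, that it sends while handling it.

```typescript
// Trimmed JSON-RPC shapes; `_meta` carries free-form metadata.
type Params = { _meta?: Record<string, unknown>; [key: string]: unknown };

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: Params;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result: Params;
}

// Hypothetical metadata key from the proposal; not in the MCP specification.
const CLIENT_SESSION_KEY = "client_session_id";

function clientSessionIdOf(req: JsonRpcRequest): string | undefined {
  const value = req.params?._meta?.[CLIENT_SESSION_KEY];
  return typeof value === "string" ? value : undefined;
}

// Echo the client's session id onto the response to `req` (e.g. a tool result).
function echoOnResponse(req: JsonRpcRequest, res: JsonRpcResponse): JsonRpcResponse {
  const sid = clientSessionIdOf(req);
  if (sid === undefined) return res; // nothing to round-trip
  return {
    ...res,
    result: { ...res.result, _meta: { ...res.result._meta, [CLIENT_SESSION_KEY]: sid } },
  };
}

// Build a server-to-client request (e.g. an elicitation) that carries the same id.
function serverRequestFor(origin: JsonRpcRequest, id: number, method: string, params: Params): JsonRpcRequest {
  const sid = clientSessionIdOf(origin);
  if (sid === undefined) return { jsonrpc: "2.0", id, method, params };
  return {
    jsonrpc: "2.0",
    id,
    method,
    params: { ...params, _meta: { ...params._meta, [CLIENT_SESSION_KEY]: sid } },
  };
}
```

Because the server only copies an opaque string, it never needs to understand the client's notion of a session, which keeps this separate from the server-side session question raised earlier in the thread.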
Motivation
There is a growing consensus within the community that the current request/response model for `tools/execute` presents significant challenges for real-world applications. This theme is evident across several ongoing discussions and proposals (e.g., #617, #651, #650, #543). The goal of this new discussion is to consolidate these concerns and formally define the problem space we need to solve.
Problem Statement
The fundamental issue is that many critical tool interactions do not fit the simple, blocking request/response paradigm. This is especially true for tools that trigger jobs in cloud environments or interact with complex enterprise systems. We can break the problem down into the following key areas:
1. Unresponsive Clients and Degraded User Experience
Interactive clients (IDEs, desktop apps) must remain responsive at all times. A `tools/execute` call that blocks the client while waiting for a response is not viable. For any operation that takes more than a few hundred milliseconds, the user interface will "freeze". This is a critical usability failure.

```mermaid
sequenceDiagram
    participant User
    participant IDE_Client
    participant MCP_Server
    User->>IDE_Client: "Analyze workspace for tech debt"
    IDE_Client->>MCP_Server: tools/execute (long-running task)
    Note over IDE_Client: Chat thread is blocked.
    User-->>IDE_Client: Tries to send a new chat (fails)
    MCP_Server-->>IDE_Client: Response (after several minutes)
    Note over IDE_Client: Chat thread is unblocked.
```

2. Inefficient Agentic Systems
Advanced AI agents often need to orchestrate multiple tools to achieve a goal. A blocking execution model forces these tool calls into a slow, sequential chain. An agent's total execution time becomes the sum of all individual tool latencies, crippling its ability to perform complex, multi-source tasks in a timely manner.
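To make the latency arithmetic concrete, here is a small TypeScript sketch that simulates the three calls from the diagram with fixed delays. It assumes the calls are independent of each other (in practice `grantAccess` might depend on `createAccount`) and that the client can keep several requests in flight; the delays and the `simulateCall` helper are made up for illustration. Awaiting each call in turn takes roughly the sum of the delays, while issuing them together with `Promise.all` takes roughly the longest single delay.

```typescript
interface SimulatedCall {
  server: string;
  tool: string;
  latencyMs: number; // made-up delay standing in for network + server time
}

// Stand-in for a remote tool call: resolves after the given latency.
function simulateCall(call: SimulatedCall): Promise<string> {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`${call.server}.${call.tool}: OK`), call.latencyMs);
  });
}

// Blocking style: each call starts only after the previous one returns.
async function runSequential(calls: SimulatedCall[]): Promise<string[]> {
  const results: string[] = [];
  for (const call of calls) {
    results.push(await simulateCall(call));
  }
  return results;
}

// Non-blocking style: all calls are in flight at once; results keep input order.
function runConcurrent(calls: SimulatedCall[]): Promise<string[]> {
  return Promise.all(calls.map(simulateCall));
}

// Measure wall-clock time for one run.
async function timed<T>(run: () => Promise<T>): Promise<{ value: T; elapsedMs: number }> {
  const start = Date.now();
  const value = await run();
  return { value, elapsedMs: Date.now() - start };
}

const onboarding: SimulatedCall[] = [
  { server: "Salesforce", tool: "createAccount", latencyMs: 100 },
  { server: "Slack", tool: "createChannel", latencyMs: 100 },
  { server: "GitHub", tool: "grantAccess", latencyMs: 100 },
];
```

With these delays, `timed(() => runSequential(onboarding))` reports roughly 300 ms and `timed(() => runConcurrent(onboarding))` roughly 100 ms; exact figures vary with timer precision and load.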
```mermaid
sequenceDiagram
    participant Agent
    participant Salesforce_Server
    participant Slack_Server
    participant GitHub_Server
    Agent->>Salesforce_Server: tools/execute (createAccount)
    Salesforce_Server-->>Agent: Response (OK)
    Agent->>Slack_Server: tools/execute (createChannel)
    Slack_Server-->>Agent: Response (OK)
    Agent->>GitHub_Server: tools/execute (grantAccess)
    GitHub_Server-->>Agent: Response (OK)
    Note right of Agent: Total time = Latency(SF) + Latency(Slack) + Latency(GitHub)
```

3. Incompatibility with Long-Running and Indefinite Jobs
Many modern workflows involve operations that are inherently durational and cannot be expected to complete quickly, if at all. This is especially true for jobs offloaded to cloud services.
```mermaid
sequenceDiagram
    participant Client
    participant MCP_Server
    participant Cloud_Service
    Client->>MCP_Server: tools/execute (transcribe video)
    MCP_Server->>Cloud_Service: Start transcription job
    Note over Client, MCP_Server: Client holds connection open, waiting...
    Client--xMCP_Server: Connection times out after 60s
    Note over Client: Request failed. User is confused.
    Note over Cloud_Service: Job continues running in the background, but result can never be delivered.
```

4. Lack of State and Certainty After Disconnection
In a distributed system, network connections are unreliable. If a client disconnects after sending a `tools/execute` request, it has no way of knowing the state of that operation. This is especially problematic for non-idempotent actions.

```mermaid
sequenceDiagram
    participant Mobile_Client
    participant MCP_Server
    Mobile_Client->>MCP_Server: tools/execute (provisionServer)
    Note over Mobile_Client: Connection drops
    Mobile_Client--xMCP_Server: Disconnected
    MCP_Server-->>MCP_Server: Processes request, provisions server
    Note over Mobile_Client: State is unknown. Did it work? Should I retry?
    Note over MCP_Server: Server has no way to report completion to the disconnected client.
```

This discussion aims to gather community input to ensure we have a comprehensive definition of these problems before we move toward architecting a solution.