Defining the problem space for asynchronous tool execution #843

Joffref · 2025-06-26T06:29:47Z

Motivation

There is a growing consensus within the community that the current request/response model for tools/execute presents significant challenges for real-world applications. This theme is evident across several ongoing discussions and proposals (e.g., #617, #651, #650, #543).
The goal of this new discussion is to consolidate these concerns and formally define the problem space we need to solve.

Problem Statement

The fundamental issue is that many critical tool interactions do not fit the simple, blocking request/response paradigm. This is especially true for tools that trigger jobs in cloud environments or interact with complex enterprise systems. We can break the problem down into the following key areas:

1. Unresponsive Clients and Degraded User Experience

Interactive clients (IDEs, desktop apps) must remain responsive at all times. A tools/execute call that blocks the client while waiting for a response is not viable. For any operation that takes more than a few hundred milliseconds, the user interface will "freeze". This is a critical usability failure.

Concrete Example: A developer using an AI-integrated IDE asks it to "analyze the entire workspace for tech debt." The MCP server begins a process that may take several minutes. In a blocking model, the entire IDE chat freezes until the analysis is complete, preventing the developer from performing any other action.

sequenceDiagram
    participant User
    participant IDE_Client
    participant MCP_Server

    User->>IDE_Client: "Analyze workspace for tech debt"
    IDE_Client->>MCP_Server: tools/execute (long-running task)
    Note over IDE_Client: Chat thread is blocked.
    User-->>IDE_Client: Tries to send a new chat (fails)
    MCP_Server-->>IDE_Client: Response (after several minutes)
    Note over IDE_Client: Chat thread is unblocked.

2. Inefficient Agentic Systems

Advanced AI agents often need to orchestrate multiple tools to achieve a goal. A blocking execution model forces these tool calls into a slow, sequential chain. An agent's total execution time becomes the sum of all individual tool latencies, crippling its ability to perform complex, multi-source tasks in a timely manner.

Concrete Example: An agent is tasked with "Onboard new customer 'XYZ Corp'." This requires three independent API calls via MCP: create an account in Salesforce, create a private channel in Slack, and grant access to a GitHub repository. A blocking agent must wait for each step to complete before starting the next, making the process unnecessarily slow.

sequenceDiagram
    participant Agent
    participant Salesforce_Server
    participant Slack_Server
    participant GitHub_Server

    Agent->>Salesforce_Server: tools/execute (createAccount)
    Salesforce_Server-->>Agent: Response (OK)
    Agent->>Slack_Server: tools/execute (createChannel)
    Slack_Server-->>Agent: Response (OK)
    Agent->>GitHub_Server: tools/execute (grantAccess)
    GitHub_Server-->>Agent: Response (OK)
    Note right of Agent: Total time = Latency(SF) + Latency(Slack) + Latency(GitHub)
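
For illustration, the cost of the sequential chain in the diagram above can be compared with issuing the same three independent calls concurrently. This is a minimal sketch using plain asyncio: `call_tool`, the server names, and the latencies are hypothetical stand-ins, not MCP SDK calls.

```python
import asyncio
import time


async def call_tool(server: str, tool: str, latency_s: float) -> str:
    """Hypothetical stand-in for one tool call; the sleep simulates latency."""
    await asyncio.sleep(latency_s)
    return f"{server}.{tool}: OK"


# Assumed latencies, chosen only to make the difference visible.
CALLS = [
    ("salesforce", "createAccount", 0.3),
    ("slack", "createChannel", 0.2),
    ("github", "grantAccess", 0.1),
]


async def onboard_sequentially() -> list:
    # Each call waits for the previous one: total time is roughly the sum.
    return [await call_tool(*c) for c in CALLS]


async def onboard_concurrently() -> list:
    # All calls are in flight at once: total time is roughly the slowest call.
    return list(await asyncio.gather(*(call_tool(*c) for c in CALLS)))


async def timed(coro_fn):
    """Run coro_fn and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await coro_fn()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (onboard_sequentially, onboard_concurrently):
        results, elapsed = asyncio.run(timed(fn))
        print(f"{fn.__name__}: {elapsed:.2f}s {results}")
```

Both variants return the same results in the same order; only the wall-clock time differs.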

3. Incompatibility with Long-Running and Indefinite Jobs

Many modern workflows involve operations that are inherently durational and cannot be expected to complete quickly, if at all. This is especially true for jobs offloaded to cloud services.

Concrete Example: A user asks an AI assistant to "transcribe the audio from this two-hour meeting video." The MCP server offloads this job to a cloud transcription service. A simple request/response model is unworkable here, as the client connection will time out long before the job is finished. The client has no way to know when the result is ready.

sequenceDiagram
    participant Client
    participant MCP_Server
    participant Cloud_Service

    Client->>MCP_Server: tools/execute (transcribe video)
    MCP_Server->>Cloud_Service: Start transcription job
    Note over Client, MCP_Server: Client holds connection open, waiting...
    Client--xMCP_Server: Connection times out after 60s
    Note over Client: Request failed. User is confused.
    Note over Cloud_Service: Job continues running in the background, but result can never be delivered.

4. Lack of State and Certainty After Disconnection

In a distributed system, network connections are unreliable. If a client disconnects after sending a tools/execute request, it has no way of knowing the state of that operation. This is especially problematic for non-idempotent actions.

Concrete Example: A user on a mobile device with a spotty connection asks an AI to "provision a new $50/month server on our cloud provider." The request is sent, but the mobile client loses its connection. The user has no idea if the server was created. If they retry, will they be billed for two servers? This ambiguity is unacceptable for critical operations.

sequenceDiagram
    participant Mobile_Client
    participant MCP_Server

    Mobile_Client->>MCP_Server: tools/execute (provisionServer)
    Note over Mobile_Client: Connection drops
    Mobile_Client--xMCP_Server: Disconnected
    MCP_Server-->>MCP_Server: Processes request, provisions server
    Note over Mobile_Client: State is unknown. Did it work? Should I retry?
    Note over MCP_Server: Server has no way to report completion to the disconnected client.

This discussion aims to gather community input to ensure we have a comprehensive definition of these problems before we move toward architecting a solution.

mikekistler · 2025-06-26T16:20:47Z

Collaborator

Regarding problems (1) and (2), is there something in the MCP protocol that requires tool calls to be synchronous? I think that tool calls using the HTTP transport are inherently async, so I'm not clear what protocol changes are needed for this.

3 replies

patwhite Jun 26, 2025

Ya, we actually spoke about that on the working group call today. No, that's not a requirement, so for item 2 in particular that's not an issue. For point 1, it's a bit more like, if you have a multi-hour call, the clients generally will sit there waiting for it to come back. So I think point 2 is not really an issue, but point 1 continues to be.

Joffref Jun 26, 2025
Author

Yes, I think we can drop point 2 once we have more documentation and a proper way to handle this on the client side. We might also need to update the SDK, since I don't feel it's properly handled today.

jssmith Jun 26, 2025

Regarding (1), consider adding a tool annotation that declares the maximum expected duration. This information could also go into the description, but implementations can benefit from structured access at the protocol level.
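
As a rough sketch of how a client might use such an annotation: the field name `max_expected_duration_s`, the `ToolInfo` type, and the 2-second budget below are all hypothetical and not part of the MCP specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolInfo:
    name: str
    description: str
    # Hypothetical annotation; no such field exists in the protocol today.
    max_expected_duration_s: Optional[float] = None


def should_run_in_background(tool: ToolInfo, blocking_budget_s: float = 2.0) -> bool:
    """Decide whether the client should keep the UI free while the tool runs."""
    if tool.max_expected_duration_s is None:
        # No hint from the server: assume the call may be slow.
        return True
    return tool.max_expected_duration_s > blocking_budget_s


if __name__ == "__main__":
    analyze = ToolInfo("analyzeWorkspace", "Scan for tech debt", 600.0)
    echo = ToolInfo("echo", "Return the input", 0.1)
    print(should_run_in_background(analyze), should_run_in_background(echo))
```

With a structured field the client can make this decision without parsing free-text descriptions.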

patwhite · 2025-06-26T17:17:47Z

We had a good conversation around this in the working group call today - there are a few issues, and a few things that came up.

First, there is blurriness between "sessions" from the MCP server perspective and "sessions" from the client perspective. Maybe they are the same thing, but if you fire a tool call with the intention of waiting for it to come back, and there's a delay of some sort, how do you correlate the tool call with the client (inference?) session when it finally returns? Maybe this isn't an issue, it's a funny one.

Second, there was discussion about transport protocol being intimately connected to sessions, which could be an issue long term.

Third, it was brought up that polling should never be a requirement for checking for tool completion - it might be one model, but should never be the only model. That is to say, to really allow this to scale we'll want some form of callback at some point so the client doesn't have to be connected and can get alerted to a tool completion.

I'll summarize the high level issue I see - the clients as they exist today will wait for a tool response before moving on with their inference session. So, you really CAN'T just let a multi hour tool call sit there holding onto the inference session.

So, there are a lot of different ways to solve this, but I'll throw out one which could work with the current protocol (with maybe a few small changes) - a tool call happens - the server starts the call, and if it finishes within 10 seconds it just returns a response. If it goes further, the call returns a resource pointer. The client can then poll that resource for the results of the call. If the client is smart enough it could set up a subscription to that resource, and that essentially creates a model WITHIN A SESSION to have long-lived tool calls. I do not believe this is MxN compatible, because most clients would not immediately know to subscribe to the resource to see when it finishes, so a small change that could be made is something like a "pending" resource that would indicate to the client that it will eventually be satisfied and that it should subscribe to it.
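
The "answer inline if fast, otherwise hand back a pointer" flow described above could look roughly like this. It is only a sketch under assumptions: `ToolServer`, `ToolResponse`, and the `pending://` URI scheme are hypothetical names, and the budget is shortened from 10 seconds so the demo runs quickly.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Dict, Optional


@dataclass
class ToolResponse:
    status: str                        # "completed" or "pending"
    result: Any = None
    resource_uri: Optional[str] = None


class ToolServer:
    def __init__(self, inline_budget_s: float = 10.0) -> None:
        self.inline_budget_s = inline_budget_s
        self._jobs: Dict[str, asyncio.Future] = {}

    async def call_tool(self, work: Awaitable[Any]) -> ToolResponse:
        task = asyncio.ensure_future(work)
        try:
            # shield() keeps the job running even if the wait times out.
            result = await asyncio.wait_for(asyncio.shield(task), self.inline_budget_s)
            return ToolResponse("completed", result=result)
        except asyncio.TimeoutError:
            uri = f"pending://tool-results/{uuid.uuid4()}"  # hypothetical scheme
            self._jobs[uri] = task
            return ToolResponse("pending", resource_uri=uri)

    def read_resource(self, uri: str) -> ToolResponse:
        task = self._jobs[uri]
        if not task.done():
            return ToolResponse("pending", resource_uri=uri)
        return ToolResponse("completed", result=task.result(), resource_uri=uri)


async def slow_job(seconds: float) -> str:
    await asyncio.sleep(seconds)
    return f"finished after {seconds}s"


async def demo() -> list:
    server = ToolServer(inline_budget_s=0.1)
    fast = await server.call_tool(slow_job(0.01))   # fits in the budget
    slow = await server.call_tool(slow_job(0.3))    # exceeds it: pointer returned
    statuses = [fast.status, slow.status]
    # A polling client; a subscription would replace this loop.
    while (polled := server.read_resource(slow.resource_uri)).status == "pending":
        await asyncio.sleep(0.05)
    statuses.append(polled.status)
    return statuses


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The polling loop is the part a "pending" resource plus subscription would remove.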

A2A has a good solution to this where a task call can include a webhook callback - that solves the async issue, though it's not very MCP-ish and doesn't work great with think desktop clients - I think a better model that fits in with MCP would be alternate notification transport mechanisms. So, an example would be being able to register for a webhook callback for any notification. That callback could include some sort of token to resume an MCP session and maybe some sort of correlation ID supplied by the client, so in that way an agent could fire a call, wait for the callback, correlate the callback with a local inference session, and then pull the remote resource. I think this is actually a nice model (and @evalstate brought up the correlation ID stuff, which could be generally useful within elicitation and other interactions), and it builds some abstractions around transports that could be really useful downstream.

13 replies

patwhite Jun 26, 2025

It's the last part, btw, that is why I'm a bit drawn to it - I hate how everything ends up getting a new set of messages (think elicitation), so being able to piggyback on an existing mechanism with no new messages is a big plus for me (speaking from the backend perspective here).

patwhite Jun 26, 2025

I don't actually love the proposal that's floating out there, because it actually intentionally sidesteps the subscriptions if I'm reading it right? Whereas we really want to do this using all the primitives we already have, that's why I like a resource return that's a pending resource, rather than doing this over the top in metadata like the proposal says

victordibia Jun 26, 2025

On a related note, a few folks have been discussing streaming tool calls with partial results #776.

As @mikekistler and @patwhite mention, tool calls can be async and streaming can improve the UX by updating the client with partial results of actual task progress.
Related to point 3, the current Python SDK supports some notion of notification resumption over the Streamable HTTP transport.
One argument against notifications in the resource/notification workaround approach ([feat] Introduce partial results as part of progress notifications #383, fix: add status field in Resource class and Resource as a return type in CallToolResult #549) is that they can be ignored by the client.
See related comment from @dsp-ant

I am wondering if we want a notification to represent partial results or a general streaming mechanism. In my mind, notifications would imply that they can be ignored on the client side and would mean that a full result must be presented at the end, duplicating the data provided. I think that streaming of results via SSE as returns of an RPC call, instead of notifications, might be generally more fitting for our model, in particular if we consider streaming audio or other formats. Now the problem I see with not using notifications, but relying on streamed results, is that it might stretch the JSON-RPC specification. But I think I am okay with that. Hence I'd rather see streaming within results and not as notifications, unless I am missing a point here.

bzsurbhi Jun 26, 2025

I've created this doc, which sums up all the approaches that are out there with their pros and cons.

@patwhite

So, there are a lot of different ways to solve this, but I'll throw out one which could work with the current protocol (with maybe a few small changes) - a tool call happens - the server starts the call, and if it finishes within 10 seconds it just returns a response. If it goes further, the call returns a resource pointer. The client can then poll that resource for the results of the call. If the client is smart enough it could set up a subscription to that resource, and that essentially creates a model WITHIN A SESSION to have long-lived tool calls. I do not believe this is MxN compatible, because most clients would not immediately know to subscribe to the resource to see when it finishes, so a small change that could be made is something like a "pending" resource that would indicate to the client that it will eventually be satisfied and that it should subscribe to it.

This is option 2 in the doc and what I suggested in my first RFC.

patwhite Jun 28, 2025

@bzsurbhi I like your proposal, I think something got lost in the long conversation - the benefit of resources here is that you can either poll or subscribe for updates. So, that completely fits into the current model. Also, discussion about progress and stuff got intermingled, which is orthogonal. Also, someone in your conversation said it wasn't backwards compatible, which is completely incorrect - net new fields are absolutely backwards compatible. I think your proposal needs some documentation / clarification on what the resource looks like if it's in a pending state (i.e., what's the empty state) and documentation that you can subscribe to a pending-state resource and that status changes MUST trigger a notification. I think that's the bare minimum set of changes to support async tool results.
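
To make the two open points above concrete (the empty state of a pending resource, and a notification on every status change), here is a small in-memory sketch. The `PendingResource` class, its status values, and the `pending://` URI are assumptions for illustration, not anything defined by the proposal or the protocol.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# A listener receives (uri, new_status); it stands in for a resource-updated notification.
Listener = Callable[[str, str], None]


@dataclass
class PendingResource:
    uri: str
    status: str = "pending"
    contents: Optional[Any] = None  # the assumed "empty state": no contents yet
    _listeners: List[Listener] = field(default_factory=list, repr=False)

    def subscribe(self, listener: Listener) -> None:
        # Subscribing is allowed while the resource is still pending.
        self._listeners.append(listener)

    def read(self) -> dict:
        # Polling clients call this; subscribers can call it after a notification.
        return {"uri": self.uri, "status": self.status, "contents": self.contents}

    def _set_status(self, status: str, contents: Any) -> None:
        if status == self.status:
            return  # no status change, so no notification
        self.status = status
        self.contents = contents
        for listener in list(self._listeners):
            listener(self.uri, status)  # every status change notifies subscribers

    def complete(self, contents: Any) -> None:
        self._set_status("completed", contents)

    def fail(self, error: str) -> None:
        self._set_status("failed", error)


if __name__ == "__main__":
    resource = PendingResource("pending://tool-results/42")
    resource.subscribe(lambda uri, status: print("notified:", uri, status))
    print(resource.read())
    resource.complete("transcript text")
    print(resource.read())
```

Polling and subscribing use the same `read` path, so a client that ignores subscriptions still works.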

patwhite · 2025-06-26T20:18:42Z

@evalstate what do you think about a small PR to allow clients to include a client_session_id that round-trips through the server on requests and is included in tool returns, elicitation requests, etc.?

3 replies

evalstate Jun 29, 2025
Maintainer

Highlighting another relevant discussion here: #823
Adjacent, not directly related discussion here: #543
Related discussion here:

I think there are a few things to work through before we can get to a solution, and would propose we continue the discussion here (and in the CWG Discord) to form a proposal. Some immediate questions are below. Note that a large number of these are choices the Host application developer would need to make, but they are helpful for framing the solutions and giving guidance in the docs.

What does initialize mean, and what constitutes a "change" that would require a reinitialize of the Context ID - e.g. Content Type, Capabilities, Tool Lists?
Should Hosts maintain a separate physical session per Context ID, or do we expect MCP Servers at the transport layer to support multiple Context IDs?
Should MCP Servers advertise their Context ID retention policies to the Host application (assuming a default of 'None')?
Can we make the STDIO/HTTP transports appear more consistent in the SDK? How would any identifier work alongside the MCP Mcp-Session-Id and the SSE last-event-id?
How does this work with the Sampling includeContext flag? Would we distinguish Sampling context-free Task requests (or retire them)?
What impact does this have on Elicitation? Should an elicitation be attached to a specific "Context"? I would argue it should.
How would this work for Agents that spawn sub-Agents? For branched conversations? For future re-generations?
Is this separate from the "Long Running Tool" discussions that are also taking place? (I think so, but don't know so.)

Whilst I think this is very important, it's also the case that we've got this far (~9 months) without it, and the current default of generally stateless Servers (or stateful, but STDIO local servers like basic-memory) is working OK. But as we see more remote Servers, and now DXT distribution etc. this is going to need attention. I propose continuing this thread and CWG discussions; and in the meantime I'll figure out with the other moderators how much overlap/underlap this has with the other initiatives.

patwhite Jun 30, 2025

Ya, I mean, I think this can be very simple - at the end of the day, session id is assigned by the server, and for security reasons that must always happen that way. There is NECESSARILY an orthogonality between a server-assigned ID that has timeouts and the like, and a client-supplied ID that is purely a pass-through from the server perspective, i.e., the server never looks at this, never contemplates it, just passes it back through as a form of client-supplied correlation id. It's different from more complex discussions about coordinating client and server sessions, and honestly just a lot simpler.

patwhite Jun 30, 2025

That way, it could be used to handle single tool correlation, but also longer client side context / session correlation.

I'd write it up as something like a field on the metadata of various server calls (tool calls) that would stream back through into the meta. And, the description would be something like "The correlation_id MUST be returned in any call that is directly spawned by a client call. If a tool execution results in an elicitation, both the elicitation and the tool call result MUST include the correlation id. If a client subscribes to a resource change and the subscription request call has a correlation_id, it MUST be supplied as part of any notification change event."
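
A minimal sketch of that pass-through rule, assuming the id travels in a `_meta` object on each message. The message shapes, the `kind` field, and the helper names are simplified assumptions for illustration, not the wording of any spec change.

```python
from typing import List, Optional


def extract_correlation_id(request: dict) -> Optional[str]:
    """Read the client-supplied id, if any; the server never interprets it."""
    return request.get("_meta", {}).get("correlation_id")


def with_correlation(message: dict, correlation_id: Optional[str]) -> dict:
    """Return a copy of message with the id echoed back unchanged in _meta."""
    if correlation_id is None:
        return message
    meta = {**message.get("_meta", {}), "correlation_id": correlation_id}
    return {**message, "_meta": meta}


def handle_tool_call(request: dict) -> List[dict]:
    """Hypothetical handler: everything spawned by the call carries the same id."""
    cid = extract_correlation_id(request)
    elicitation = with_correlation(
        {"kind": "elicitation", "message": "Confirm provisioning?"}, cid
    )
    result = with_correlation(
        {"kind": "tool_result", "content": "server provisioned"}, cid
    )
    return [elicitation, result]


if __name__ == "__main__":
    request = {"name": "provisionServer", "_meta": {"correlation_id": "chat-17"}}
    for message in handle_tool_call(request):
        print(message)
```

Because the server only copies the value, the client is free to use it for a single tool call or for a longer-lived client-side context.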

Something like that, just keeping this VERY separate from sessions.
