fix(coding-agent): abort in-flight LLM call on AgentSession.dispose()#5029
Closed
TerminallyChilI wants to merge 1 commit into
Closed
fix(coding-agent): abort in-flight LLM call on AgentSession.dispose()#5029TerminallyChilI wants to merge 1 commit into
TerminallyChilI wants to merge 1 commit into
Conversation
AgentSession.dispose() removes the host's event listener via
_disconnectFromAgent() but does not abort any in-flight LLM HTTP
request. Callers that dispose a session mid-stream (every
switchSession / newSession / fork / clone via teardownCurrent)
leave the previous LLM call running in the background:
* Orphaned TCP socket to the LLM provider — stays ESTABLISHED
until the provider responds, often minutes for slow-stream APIs.
* Provider billing increments for the orphan call whose output
is discarded.
* External observers see the process at 0% CPU with one
ESTABLISHED HTTPS socket and no events flowing,
indistinguishable from a wedged process.
Adds a synchronous this.agent.abort() at the top of dispose(),
wrapped in try/catch so dispose remains exception-safe.
AbortController.abort() is synchronous: the signal trips
immediately even though the fetch rejection lands later. Keeps
dispose() synchronous so existing callers don't need to be updated.
Discovered by PIRA (PI Remote Access) when tab-reopen triggered
switchSession against an already-loaded session, surfacing as a
'pi appears to hang for 15-20 minutes after browser tab reopen'
bug class. PIRA shipped a host-side workaround (skip the
redundant switch_session RPC when the bridge already has the
session loaded) but the underlying dispose-without-abort
behaviour bites any host that legitimately tears down a session
mid-stream.
Tests:
- agent-session-dispose-aborts.test.ts (4 cases): signal trips
on dispose mid-stream, agent.abort called exactly once,
exception-safe when agent.abort throws, idempotent on
re-dispose.
Open question for review: dispose() only aborts the main agent.
The session also has 4 other AbortControllers (compaction,
autoCompaction, branchSummary, retry, bash) that aren't tripped
here. Recommend a follow-up PR (or expanding this one) to abort
those too in dispose() — same orphan class, same fix shape.
Collaborator
|
Closing because the PR author is not listed in .github/APPROVED_CONTRIBUTORS. |
Collaborator
|
Implemented the fix locally with the narrower lifecycle change discussed here: Covered controllers:
Kept Validation: This comment is AI-generated by |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
AgentSession.dispose()removes the host's event listener via_disconnectFromAgent()but does not abort any in-flight LLM HTTP request. Callers that dispose a session mid-stream (everyswitchSession/newSession/fork/cloneviateardownCurrent) leave the previous LLM call running in the background:Adds a synchronous
this.agent.abort()at the top ofdispose(), wrapped intry/catchso dispose remains exception-safe.AbortController.abort()is synchronous: the signal trips immediately even though the fetch rejection lands later. Keepsdispose()synchronous so existing callers don't need to be updated.Discovered by PIRA when a session change against an already-loaded session triggered the orphan path, surfacing as 15-20 minute apparent hangs after any host-driven session change while a stream was in flight. PIRA shipped a host-side workaround (skip the redundant
switch_sessionRPC when the bridge already has the session loaded) but the underlying dispose-without-abort behaviour bites any host that legitimately tears down a session mid-stream.Recent in-the-wild repro: captured today (2026-05-26) — bridge sat wedged for 25 min after
switch_sessionmid-turn. Trace taggedswitch_session-teardown+subscriberless+llm-http-hangby our analyzer. Confirms the failure mode is legitimate.Tests:
agent-session-dispose-aborts.test.ts(4 cases): signal trips on dispose mid-stream,agent.abortcalled exactly once, exception-safe whenagent.abortthrows, idempotent on re-dispose.Open question for review:
dispose()only aborts the main agent. The session has several other AbortControllers (compaction, autoCompaction, branchSummary, retry, bash) that aren't tripped here. Recommend a follow-up PR (or expanding this one) to abort those too indispose(), same orphan class, same fix pattern.