subagent timeout leaves zombie claude -p → late output emitted directly to user transport (bypasses parent agent) #76962
Open
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
- clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
- clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
- clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
- clawsweeper:needs-security-review: ClawSweeper marked this issue as needing security-sensitive review.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:security: Security boundary, credential, authz, sandbox, or sensitive-data risk.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
- issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
Environment
Bug chain (3 linked issues)
Bug A: Subagent timeout does not kill the physical claude -p process
When a subagent session expires (timeoutSeconds), the parent receives
status: timed out. However, the underlying claude -p process spawned
by the subagent remains alive as a zombie. It can continue executing tool
calls minutes after the parent declared it timed out.
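For reference, the termination step that this timeout path appears to skip can be sketched as follows. This is a minimal illustration, not the runtime's actual code: `run_with_hard_timeout` and `grace_s` are hypothetical names, and a plain Python sleeper stands in for the real `claude -p` child.

```python
import subprocess
import sys


def run_with_hard_timeout(cmd, timeout_s, grace_s=5.0):
    """Run cmd; on timeout send SIGTERM, then SIGKILL after grace_s seconds.

    Returns (status, returncode). The child is always reaped before returning,
    so no process outlives the timeout reported to the caller.
    """
    proc = subprocess.Popen(cmd)
    try:
        return "ok", proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.wait()
        return "timed out", proc.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a long-running subagent process.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    status, code = run_with_hard_timeout(sleeper, timeout_s=0.5, grace_s=1.0)
    print(status, code)
```

If the child spawns its own children (as a CLI running tool calls might), signalling only the direct child may not be enough. Starting it with `start_new_session=True` and signalling the whole process group via `os.killpg` is one option on POSIX, though whether that fits the gateway's process model is an open question.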
Observed: claude -p child of gateway (via ps --ppid <gateway_pid>)
alive for 297–306s after timeoutSeconds: 120/180 expired.
Bug C: Zombie subagent output emitted raw to user transport
When the zombie process completes its pending tool calls after the parent
has already emitted final_answer, the runtime attempts an
Automatic session resume. When that resume fails (e.g., claude-cli
errors), it emits the raw subagent output directly to the user transport
with the literal prefix:
This bypasses the parent agent entirely. The user receives raw internal
output that should never reach them.
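One way to enforce the missing boundary is a gate that every piece of subagent output must pass through, and that drops anything arriving after the run has been closed. The sketch below is an assumption about how such a guard could look; `RunOutputGate`, `close`, and `deliver` are hypothetical names and do not correspond to any known runtime API.

```python
import threading


class RunOutputGate:
    """Hypothetical guard: forwards subagent output to the parent only,
    and drops anything that arrives after the run has been closed."""

    def __init__(self, send_to_parent):
        self._send_to_parent = send_to_parent
        self._closed = set()
        self._lock = threading.Lock()
        self.dropped = []  # kept for diagnostics instead of being emitted

    def close(self, run_id):
        """Mark a run closed, e.g. on timeout or after the parent's final_answer."""
        with self._lock:
            self._closed.add(run_id)

    def deliver(self, run_id, text):
        """Return True if forwarded to the parent, False if dropped as late."""
        with self._lock:
            if run_id in self._closed:
                self.dropped.append((run_id, text))
                return False
        self._send_to_parent(run_id, text)
        return True


inbox = []
gate = RunOutputGate(lambda run_id, text: inbox.append((run_id, text)))
gate.deliver("run-1", "partial result")    # forwarded to the parent
gate.close("run-1")                        # subagent timed out
gate.deliver("run-1", "late tool output")  # dropped, never reaches the user
```

The design choice here is that the gate has no path to the user transport at all, so a failed resume has nowhere to leak to; late output is recorded for debugging rather than emitted.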
Bug D: Cross-model fallback amplifies the leak
After the claude-cli resume fails, the runtime spawns a new session using
the fallback model (gpt-5.4/Codex) to "explain" the previous error. This
session also emits its final_answer to the transport, resulting in
additional unsolicited messages to the user.
Reproduction steps
1. Spawn a subagent with timeoutSeconds: N (tested: 120, 180).
2. Have the subagent call exec with a command that hits exec.approval.waitDecision (queue wait > N seconds).
3. The parent receives status: timed out and emits its own final_answer.
4. The zombie claude -p completes, and the runtime attempts session resume.
Expected behavior
- When timeoutSeconds expires: kill the physical claude -p process (SIGTERM → SIGKILL).
- After the parent emits final_answer: discard any late output from zombie subagents, and do not attempt session resume.
- Cancel exec.approval queue items when the owning subagent has timed out.
Workaround (in place)
Restricting the subagent's exec allowlist to a small set of safe binaries
prevents the approval queue from blocking, which eliminates the reproduction
path. This does not fix the underlying zombie/resume issue.
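To illustrate the idea behind the workaround, the sketch below checks a command against a small allowlist before it would be queued for approval. It is illustrative only: the issue does not say which binaries were kept or how the allowlist is configured, so `SAFE_BINARIES` and `is_preapproved` are hypothetical, and the check assumes commands are executed as an argv list rather than through a shell.

```python
import shlex

# Hypothetical allowlist; the issue does not name the binaries actually used.
SAFE_BINARIES = frozenset({"ls", "cat", "grep"})


def is_preapproved(command, allowlist=SAFE_BINARIES):
    """Return True if the command's executable is exactly an allowlisted name.

    Paths such as /tmp/ls are rejected, as are commands that fail to parse.
    Assumes the command is later run as an argv list, not through a shell.
    """
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv:
        return False
    return argv[0] in allowlist
```

Commands that pass such a check never enter the approval queue, so they cannot block past the subagent timeout; anything else would have to be rejected outright rather than queued for the workaround to hold.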
Severity
High — zombie output reaches the user transport, bypassing the parent agent's
output control. In multi-agent architectures with strict output routing, this
breaks the invariant that only the parent speaks to the user.