Process supervisor: graceful signal escalation and drain timeout for exec tool #66399
Closed
Labels
P1: High-priority user-facing bug, regression, or broken workflow.
clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
impact:data-loss: Can lose, corrupt, or silently drop user/session/config data.
impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
Problem
When the exec tool times out (either overall-timeout or no-output-timeout), the process supervisor sends an immediate SIGKILL (supervisor.ts#L163). This means:
adapter.wait() relies solely on the close event with no independent drain timeout. The 4-second FORCE_KILL_WAIT_FALLBACK_MS fallback only activates on Windows.
Observed impact
Subagent processes (Claude Code, coding agents) that time out lose any in-progress state. Processes that spawn their own children and set up signal handlers for graceful shutdown never get the chance to use them.
Proposed fix: two-phase signal escalation + drain timeout
Phase 1 — Graceful shutdown (SIGTERM + grace period)
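A minimal sketch of what this escalation could look like, not the supervisor's actual code. The names Killable, terminateWithEscalation, exited, and graceMs are hypothetical and chosen for illustration. It assumes the caller can supply a promise that settles when the child exits.

```typescript
type TermSignal = "SIGTERM" | "SIGKILL";

// Hypothetical minimal view of a child process. Node's ChildProcess has a
// compatible kill() method, so it could be passed in directly.
interface Killable {
  kill(signal: TermSignal): boolean;
}

/**
 * Sends SIGTERM first, then escalates to SIGKILL if `exited` has not settled
 * within `graceMs`. Resolves with the last signal that was sent.
 */
function terminateWithEscalation(
  child: Killable,
  exited: Promise<unknown>,
  graceMs: number,
): Promise<TermSignal> {
  return new Promise((resolve) => {
    // Phase 1: ask politely so the child's signal handlers can run.
    child.kill("SIGTERM");

    // Escalate only if the grace period elapses without an exit.
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      resolve("SIGKILL");
    }, graceMs);

    // Exit within the grace period cancels the escalation.
    const onExit = (): void => {
      clearTimeout(timer);
      resolve("SIGTERM");
    };
    exited.then(onExit, onExit);
  });
}
```

With a Node ChildProcess, exited could be a promise that resolves on the exit event. The length of the grace period is a policy choice that this issue does not specify.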
Phase 2 — Independent drain timeout
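As a rough illustration of the idea in this phase, the sketch below races a close promise against a timer so the wait cannot block forever. The names WaitResult, waitWithDrainTimeout, closed, and drainMs are hypothetical and do not come from the existing adapter code.

```typescript
// Outcome of waiting: either the close event arrived, or the drain timer won.
type WaitResult<T> =
  | { kind: "closed"; value: T }
  | { kind: "drain-timeout" };

/**
 * Waits for `closed`, but gives up after `drainMs` so a close event that
 * never arrives (for example, a grandchild holding the pipes open) cannot
 * hang the caller indefinitely.
 */
function waitWithDrainTimeout<T>(
  closed: Promise<T>,
  drainMs: number,
): Promise<WaitResult<T>> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve({ kind: "drain-timeout" }), drainMs);
    closed.then(
      (value) => {
        clearTimeout(timer);
        resolve({ kind: "closed", value });
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

On the drain-timeout branch the caller would still need to decide what exit state to report. One option is to reuse the existing 4000 ms value as drainMs on every platform.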
After SIGKILL, add a POSIX drain timeout (not just Windows) to prevent wait() from hanging indefinitely.
The existing FORCE_KILL_WAIT_FALLBACK_MS (4000ms, Windows-only) could be unified into a cross-platform drain timeout.
Optional: pipe close watchdog
When cancellation is requested, explicitly close stdout/stderr pipes to unblock any blocked readers before sending the kill signal.
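The ordering described above could be sketched as follows. CancellableChild, Destroyable, and closePipesThenKill are hypothetical names. The sketch assumes the pipes expose a destroy() method, as Node's readable streams do.

```typescript
// Hypothetical minimal shapes. A Node ChildProcess with piped stdio has
// stdout/stderr streams that provide destroy() and a compatible kill().
interface Destroyable {
  destroy(): void;
}

interface CancellableChild {
  stdout: Destroyable | null;
  stderr: Destroyable | null;
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
}

/**
 * Closes the parent's ends of stdout/stderr first, so readers blocked on
 * them are released, then sends the kill signal.
 */
function closePipesThenKill(
  child: CancellableChild,
  signal: "SIGTERM" | "SIGKILL" = "SIGKILL",
): boolean {
  for (const pipe of [child.stdout, child.stderr]) {
    // A pipe is null when that stream was not piped to the parent.
    pipe?.destroy();
  }
  return child.kill(signal);
}
```

Destroying the streams discards any output still buffered in them, so this trades the tail of the output for a guarantee that readers do not stay blocked.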
Prior art
exec.Cmd.WaitDelay: Go 1.20+ added WaitDelay as a built-in mechanism for exactly this pattern: close pipes after process exit, force-kill after delay
Scope
src/process/supervisor/supervisor.ts: signal escalation in cancelAdapter
src/process/supervisor/adapters/child.ts: cross-platform drain timeout (extend FORCE_KILL_WAIT_FALLBACK_MS to POSIX)