fix(subagents): wire kill and steer to killSessionRun — actually terminate CLI subprocesses#2461
Merged
alexey-pelykh merged 1 commit intomainfrom Apr 22, 2026
Merged
Conversation
…inate CLI subprocesses (#2344) The subagent `kill` and `steer` actions (exposed via the `subagents` agent tool and `/subagents kill` / `/subagents steer` chat commands) lost their subprocess-termination semantics during the pi-embedded gut (`b27cecc795`, #76/#77). Kill marked the registry entry as terminated and cleared queues but the CLI subprocess kept running until natural completion; steer queued a replacement run but left the existing run to finish first (via `agent.wait` settlement) instead of interrupting it. Fix: call `killSessionRun` from `src/agents/session-run-registry.ts` before the queue-clear / settlement logic at all four affected call sites. The registry is already populated by `ChannelBridge` (register at `channel-bridge.ts:160` in a `try`, unregister at `:349` in a `finally`), and `killSessionRun` dispatches either `abortController.abort()` or `process.kill(pid, "SIGTERM")`. It is idempotent (returns `false` on unknown/already-aborted keys — safe no-op) and cannot throw (internal `try/catch` around `process.kill`). Changes: - `src/agents/tools/subagents-tool.ts`: - Import `killSessionRun` from `../session-run-registry.js`. - `killSubagentRun` (kill path): replace `const aborted = false;` dead var with `const aborted = killSessionRun(childSessionKey);`. The outer `killed` aggregate (`marked > 0 || aborted || cleared...`) now correctly reflects when a kill signal was dispatched. Previously `aborted` was a no-op disjunct; this only ADDS truthy cases, so no previously-true scenario regresses. - Steer path (line ~624): insert `killSessionRun(resolved.entry.childSessionKey);` between the `sessionId` resolution and `clearSessionQueues`, so the existing run is aborted before the `agent.wait` settle-timeout polls. The existing `agent.wait` stays as a 5-second settlement window. - `cascadeKillChildren` propagates automatically — it delegates to `killSubagentRun`. - `src/auto-reply/reply/commands-subagents/action-kill.ts`: import + `killSessionRun(childKey);` inserted between `loadSubagentSessionEntry` and `clearSessionQueues`. - `src/auto-reply/reply/commands-subagents/action-send.ts`: import + `killSessionRun(targetResolution.entry.childSessionKey);` in the `steerRequested` branch between `markSubagentRunForSteerRestart` and `clearSessionQueues`. Key-type correctness: `childSessionKey` (from `SubagentRunRecord`) is the canonical ChannelBridge sessionKey that the registry keys on. `entry.sessionId` (Pi-era UUID) is NOT used for registry lookups — preserved for `clearSessionQueues` and `loadSubagentSessionEntry` only. Out of scope (separate issues per #2344 body): - `abort.ts::stopSubagentsForRequester` carries the same `const aborted = false;` dead-var pattern — tracked under the `/abort` user command + gateway `sessions.reset` sub-issue. - Auto-reply queue mode and `abortEmbeddedPiRun` test-mock cleanup — separate sub-issues. - `pi-embedded.ts` stub barrel deletion — tracked under #2089 cascade sweep. Verification: - `pnpm check` (format + tsgo + lint + project-specific lints) → exit 0 - `pnpm test` (full parallel suite) → **800 files / 7010 passed / 3 skipped** — no regression - Rescan: `git grep "killSessionRun"` returns registry definition + registry tests + the 3 files in this PR; no stale/unwired references - Rescan: `git grep "abortEmbeddedPiRun"` returns only test-mock contexts (`reply.block-streaming.test.ts`, `reply/abort.test.ts`) — production paths have no stray references - Adversarial validation (fresh-context subclaude): CLEAN verdict on 6 AC + 11 adversarial checks — confirms `killSessionRun` is LIVE (abortController/SIGTERM), key-type match correct, no double-kill in `action-kill.ts → stopSubagentsForRequester` path (latter targets grandchildren, not the killed session), order `mark → kill → clear → wait` is race-free (all sync within a tick) Closes #2344 Refs: #2089 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexey-pelykh
added a commit
that referenced
this pull request
Apr 22, 2026
…ctually terminate CLI subprocesses (#2343) Replace three dead `const aborted = false` / `const ended = true` placeholders left behind by the Pi-embedded gut (`b27cecc795`, #76/#77) with real calls to `killSessionRun` / `waitForSessionRunEnd` from `src/agents/session-run-registry.ts`: - `abort.ts::stopSubagentsForRequester` inner loop — `killSessionRun(childKey)` now flows into the `killed` aggregate, so the loop actually counts subagent runs whose subprocesses were signaled. - `abort.ts::tryFastAbortFromMessage` — `killSessionRun(resolvedTargetKey)` returned in the `aborted` field (previously always `false`). User-facing `/abort` now terminates the running CLI subprocess for non-ACP sessions; ACP cancellation path remains separate. - `sessions.ts::ensureSessionRuntimeCleanup` — replaces the `const ended = true` placeholder with `killSessionRun(canonicalKey)` + `waitForSessionRunEnd` (15s timeout). Returns `UNAVAILABLE` on timeout, restoring the pre-gut contract that `sessions.reset` / `sessions.delete` surface the error when a subprocess refuses to end. Also removes the `if (!params.sessionId) return undefined;` early-exit in `ensureSessionRuntimeCleanup`: `sessionId` is used only to extend the queue-key set; termination operates on `canonicalKey`, which is always present. Callers with no `sessionId` now correctly attempt termination (`waitForSessionRunEnd` short-circuits to `true` when no run is registered, so the wait has no cost when there's nothing to wait for). Polish: extracted `15_000` to `SESSION_RUN_TERMINATION_TIMEOUT_MS`, grouped with the sibling `ACP_RUNTIME_CLEANUP_TIMEOUT_MS` constant. Issue body lists three regression sites; only two apply. The third (`commands-session.ts::applyAbortTarget`) does not exist in the current tree — `git grep applyAbortTarget` returns zero hits. The `/session abort` pathway was consolidated into `tryFastAbortFromMessage` at some earlier point, so no duplicate wiring is needed. AC3 marked N/A. Verification: - `pnpm check` → exit 0 - `pnpm test` (full parallel suite) → 800 files / 7010 passed / 3 skipped — no regression - Rescan: `git grep "const aborted = false"` → zero hits across the whole tree (combined sweep of #2344 subagent kill/steer + #2343 abort/subagent-cascade/fastAbort) - Rescan: `git grep "abortEmbeddedPiRun"` → only test-mock contexts; production has no remaining references (test-mock cleanup tracked under #2089 sweep) - Adversarial validation (fresh-context subclaude): CLEAN on 7 AC + 9 adversarial checks — confirms waitForSessionRunEnd is LIVE (50ms poll tick, max 15s wall-clock), `ensureSessionRuntimeCleanup` callers always pass a valid `canonicalKey`, and no test asserts `aborted === false` on the fast-abort path Closes #2343 Refs: #2089, #2344, #2460, #2461 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
alexey-pelykh
added a commit
that referenced
this pull request
Apr 22, 2026
…ctually terminate CLI subprocesses (#2343) (#2462) Replace three dead `const aborted = false` / `const ended = true` placeholders left behind by the Pi-embedded gut (`b27cecc795`, #76/#77) with real calls to `killSessionRun` / `waitForSessionRunEnd` from `src/agents/session-run-registry.ts`: - `abort.ts::stopSubagentsForRequester` inner loop — `killSessionRun(childKey)` now flows into the `killed` aggregate, so the loop actually counts subagent runs whose subprocesses were signaled. - `abort.ts::tryFastAbortFromMessage` — `killSessionRun(resolvedTargetKey)` returned in the `aborted` field (previously always `false`). User-facing `/abort` now terminates the running CLI subprocess for non-ACP sessions; ACP cancellation path remains separate. - `sessions.ts::ensureSessionRuntimeCleanup` — replaces the `const ended = true` placeholder with `killSessionRun(canonicalKey)` + `waitForSessionRunEnd` (15s timeout). Returns `UNAVAILABLE` on timeout, restoring the pre-gut contract that `sessions.reset` / `sessions.delete` surface the error when a subprocess refuses to end. Also removes the `if (!params.sessionId) return undefined;` early-exit in `ensureSessionRuntimeCleanup`: `sessionId` is used only to extend the queue-key set; termination operates on `canonicalKey`, which is always present. Callers with no `sessionId` now correctly attempt termination (`waitForSessionRunEnd` short-circuits to `true` when no run is registered, so the wait has no cost when there's nothing to wait for). Polish: extracted `15_000` to `SESSION_RUN_TERMINATION_TIMEOUT_MS`, grouped with the sibling `ACP_RUNTIME_CLEANUP_TIMEOUT_MS` constant. Issue body lists three regression sites; only two apply. The third (`commands-session.ts::applyAbortTarget`) does not exist in the current tree — `git grep applyAbortTarget` returns zero hits. The `/session abort` pathway was consolidated into `tryFastAbortFromMessage` at some earlier point, so no duplicate wiring is needed. AC3 marked N/A. Verification: - `pnpm check` → exit 0 - `pnpm test` (full parallel suite) → 800 files / 7010 passed / 3 skipped — no regression - Rescan: `git grep "const aborted = false"` → zero hits across the whole tree (combined sweep of #2344 subagent kill/steer + #2343 abort/subagent-cascade/fastAbort) - Rescan: `git grep "abortEmbeddedPiRun"` → only test-mock contexts; production has no remaining references (test-mock cleanup tracked under #2089 sweep) - Adversarial validation (fresh-context subclaude): CLEAN on 7 AC + 9 adversarial checks — confirms waitForSessionRunEnd is LIVE (50ms poll tick, max 15s wall-clock), `ensureSessionRuntimeCleanup` callers always pass a valid `canonicalKey`, and no test asserts `aborted === false` on the fast-abort path Closes #2343 Refs: #2089, #2344, #2460, #2461 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The subagent
killandsteeractions (exposed via thesubagentsagent tool and/subagents kill//subagents steerchat commands) lost their subprocess-termination semantics during the pi-embedded gut (b27cecc795, #76/#77):killed: truewas returned.agent, then waited viaagent.waitfor the existing run to finish naturally before the replacement started. Pre-gut behavior aborted the existing run first so the replacement started immediately with combined context. Post-gut, "steer" was effectively a followup with a 5-second delay.This PR wires
killSessionRun(from the already-livesrc/agents/session-run-registry.ts, populated byChannelBridge) into all four affected call sites.Changes
src/agents/tools/subagents-tool.tskillSubagentRun, line 252)const aborted = false;dead var withconst aborted = killSessionRun(childSessionKey);src/agents/tools/subagents-tool.tskillSessionRun(resolved.entry.childSessionKey);beforeclearSessionQueues/agent.waitsettlementsrc/auto-reply/reply/commands-subagents/action-kill.ts/subagents killkillSessionRun(childKey);betweenloadSubagentSessionEntryandclearSessionQueuessrc/auto-reply/reply/commands-subagents/action-send.ts/subagents steerbranchkillSessionRun(targetResolution.entry.childSessionKey);betweenmarkSubagentRunForSteerRestartandclearSessionQueuesPlus three imports of
killSessionRun.cascadeKillChildrenpropagates the fix automatically — it delegates tokillSubagentRun, which now terminates the subprocess.Why
killSessionRunis the right replacementsession-run-registry.ts:65-83dispatches eitherabortController.abort()ORprocess.kill(pid, "SIGTERM"). Module attestation is"live".ChannelBridgeregisters atchannel-bridge.ts:160(in atry) and unregisters at:349(in afinally). CLI subprocess runtime registerspidat dispatch time.falseon unknown/already-aborted keys; internaltry/catcharoundprocess.killmeans it never throws.childSessionKey(fromSubagentRunRecord) is the canonical ChannelBridge sessionKey the registry keys on. No conversion needed.entry.sessionId(Pi-era UUID) is preserved forclearSessionQueues/ session-store lookups only.Out of scope (per issue body)
abort.ts::stopSubagentsForRequestercarries the sameconst aborted = false;dead-var pattern — tracked under the/abortuser command + gatewaysessions.resetsub-issue.abortEmbeddedPiRun(reply.block-streaming.test.ts,reply/abort.test.ts) — tracked under audit: review all Pi engine stub replacements for ChannelBridge compatibility #2089 cascade sweep.pi-embedded.tsstub barrel deletion — audit: review all Pi engine stub replacements for ChannelBridge compatibility #2089 cascade sweep.Verification
pnpm check(format + tsgo + lint + lint:tmp + lint:no-remoteclaw-ai) → exit 0pnpm test(full parallel suite) → 800 files / 7010 passed / 3 skipped — no regressiongit grep "killSessionRun"returns registry definition + registry tests + the 3 PR-touched files; no stale or unwired referencesgit grep "abortEmbeddedPiRun"returns only test-mock contexts (2 files); production paths have no stray referenceskillSessionRunis LIVE (abort or SIGTERM, not a stub)killedaggregatemarked > 0 || aborted || cleared...—abortedtransitioning from constant-false to possibly-true only ADDS truthy cases; no previously-true scenario regressesaction-kill.ts(subsequentstopSubagentsForRequestertargets children ofchildKey, notchildKeyitself;killSessionRunis idempotent anyway)markSteerRestart → killSessionRun → clearQueues → agent.wait → dispatchis race-free (all sync within one event-loop tick)@ts-expect-error/@ts-ignore/as anyintroduced (stub-debt gate zero-tolerance since chore(stub-debt): simplify gate to zero-tolerance and delete baseline — Bite C #2459)Test plan
pnpm checkexit 0pnpm testfull suite green (7010 / 800 files)Context
src/agents/session-run-registry.tsb27cecc795)Closes #2344
Refs: #2089
🤖 Generated with Claude Code