Skip to content

fix(subagents): wire kill and steer to killSessionRun — actually terminate CLI subprocesses#2461

Merged
alexey-pelykh merged 1 commit intomainfrom
fix/2344-subagents-kill-session-run
Apr 22, 2026
Merged

fix(subagents): wire kill and steer to killSessionRun — actually terminate CLI subprocesses#2461
alexey-pelykh merged 1 commit intomainfrom
fix/2344-subagents-kill-session-run

Conversation

@alexey-pelykh
Copy link
Copy Markdown

Summary

The subagent kill and steer actions (exposed via the subagents agent tool and /subagents kill / /subagents steer chat commands) lost their subprocess-termination semantics during the pi-embedded gut (b27cecc795, #76/#77):

  • Kill: marked the registry entry as terminated and cleared queues, but the CLI subprocess kept running until natural completion. Tokens, API costs, and work continued even though killed: true was returned.
  • Steer: queued a replacement run via agent, then waited via agent.wait for the existing run to finish naturally before the replacement started. Pre-gut behavior aborted the existing run first so the replacement started immediately with combined context. Post-gut, "steer" was effectively a followup with a 5-second delay.

This PR wires killSessionRun (from the already-live src/agents/session-run-registry.ts, populated by ChannelBridge) into all four affected call sites.

Changes

File Site Change
src/agents/tools/subagents-tool.ts kill (killSubagentRun, line 252) Replace const aborted = false; dead var with const aborted = killSessionRun(childSessionKey);
src/agents/tools/subagents-tool.ts steer (line ~624) Insert killSessionRun(resolved.entry.childSessionKey); before clearSessionQueues / agent.wait settlement
src/auto-reply/reply/commands-subagents/action-kill.ts /subagents kill Insert killSessionRun(childKey); between loadSubagentSessionEntry and clearSessionQueues
src/auto-reply/reply/commands-subagents/action-send.ts /subagents steer branch Insert killSessionRun(targetResolution.entry.childSessionKey); between markSubagentRunForSteerRestart and clearSessionQueues

Plus three imports of killSessionRun.

cascadeKillChildren propagates the fix automatically — it delegates to killSubagentRun, which now terminates the subprocess.

Why killSessionRun is the right replacement

  • Live, not a stub: session-run-registry.ts:65-83 dispatches either abortController.abort() OR process.kill(pid, "SIGTERM"). Module attestation is "live".
  • Actively populated: ChannelBridge registers at channel-bridge.ts:160 (in a try) and unregisters at :349 (in a finally). CLI subprocess runtime registers pid at dispatch time.
  • Idempotent + safe: returns false on unknown/already-aborted keys; internal try/catch around process.kill means it never throws.
  • Key-type match: childSessionKey (from SubagentRunRecord) is the canonical ChannelBridge sessionKey the registry keys on. No conversion needed. entry.sessionId (Pi-era UUID) is preserved for clearSessionQueues / session-store lookups only.

Out of scope (per issue body)

Verification

  • pnpm check (format + tsgo + lint + lint:tmp + lint:no-remoteclaw-ai) → exit 0
  • pnpm test (full parallel suite) → 800 files / 7010 passed / 3 skipped — no regression
  • Rescan: git grep "killSessionRun" returns registry definition + registry tests + the 3 PR-touched files; no stale or unwired references
  • Rescan: git grep "abortEmbeddedPiRun" returns only test-mock contexts (2 files); production paths have no stray references
  • Adversarial validation (fresh-context subclaude, 6 AC + 11 adversarial checks): CLEAN. Key confirmations:
    • killSessionRun is LIVE (abort or SIGTERM, not a stub)
    • killed aggregate marked > 0 || aborted || cleared...aborted transitioning from constant-false to possibly-true only ADDS truthy cases; no previously-true scenario regresses
    • No double-kill in action-kill.ts (subsequent stopSubagentsForRequester targets children of childKey, not childKey itself; killSessionRun is idempotent anyway)
    • Order markSteerRestart → killSessionRun → clearQueues → agent.wait → dispatch is race-free (all sync within one event-loop tick)
    • Import sort matches oxfmt convention; zero new @ts-expect-error / @ts-ignore / as any introduced (stub-debt gate zero-tolerance since chore(stub-debt): simplify gate to zero-tolerance and delete baseline — Bite C #2459)

Test plan

  • pnpm check exit 0
  • pnpm test full suite green (7010 / 800 files)
  • Rescans clean
  • CI green on all 11 required checks

Context

Closes #2344

Refs: #2089

🤖 Generated with Claude Code

…inate CLI subprocesses (#2344)

The subagent `kill` and `steer` actions (exposed via the `subagents`
agent tool and `/subagents kill` / `/subagents steer` chat commands)
lost their subprocess-termination semantics during the pi-embedded gut
(`b27cecc795`, #76/#77). Kill marked the registry entry as terminated
and cleared queues but the CLI subprocess kept running until natural
completion; steer queued a replacement run but left the existing run
to finish first (via `agent.wait` settlement) instead of interrupting
it.

Fix: call `killSessionRun` from `src/agents/session-run-registry.ts`
before the queue-clear / settlement logic at all four affected call
sites. The registry is already populated by `ChannelBridge` (register
at `channel-bridge.ts:160` in a `try`, unregister at `:349` in a
`finally`), and `killSessionRun` dispatches either
`abortController.abort()` or `process.kill(pid, "SIGTERM")`. It is
idempotent (returns `false` on unknown/already-aborted keys — safe
no-op) and cannot throw (internal `try/catch` around `process.kill`).

Changes:
- `src/agents/tools/subagents-tool.ts`:
  - Import `killSessionRun` from `../session-run-registry.js`.
  - `killSubagentRun` (kill path): replace `const aborted = false;`
    dead var with `const aborted = killSessionRun(childSessionKey);`.
    The outer `killed` aggregate (`marked > 0 || aborted || cleared...`)
    now correctly reflects when a kill signal was dispatched. Previously
    `aborted` was a no-op disjunct; this only ADDS truthy cases, so no
    previously-true scenario regresses.
  - Steer path (line ~624): insert
    `killSessionRun(resolved.entry.childSessionKey);` between the
    `sessionId` resolution and `clearSessionQueues`, so the existing
    run is aborted before the `agent.wait` settle-timeout polls. The
    existing `agent.wait` stays as a 5-second settlement window.
  - `cascadeKillChildren` propagates automatically — it delegates to
    `killSubagentRun`.
- `src/auto-reply/reply/commands-subagents/action-kill.ts`: import +
  `killSessionRun(childKey);` inserted between `loadSubagentSessionEntry`
  and `clearSessionQueues`.
- `src/auto-reply/reply/commands-subagents/action-send.ts`: import +
  `killSessionRun(targetResolution.entry.childSessionKey);` in the
  `steerRequested` branch between `markSubagentRunForSteerRestart` and
  `clearSessionQueues`.

Key-type correctness: `childSessionKey` (from `SubagentRunRecord`) is
the canonical ChannelBridge sessionKey that the registry keys on.
`entry.sessionId` (Pi-era UUID) is NOT used for registry lookups —
preserved for `clearSessionQueues` and `loadSubagentSessionEntry` only.

Out of scope (separate issues per #2344 body):
- `abort.ts::stopSubagentsForRequester` carries the same
  `const aborted = false;` dead-var pattern — tracked under the `/abort`
  user command + gateway `sessions.reset` sub-issue.
- Auto-reply queue mode and `abortEmbeddedPiRun` test-mock cleanup —
  separate sub-issues.
- `pi-embedded.ts` stub barrel deletion — tracked under #2089 cascade
  sweep.

Verification:
- `pnpm check` (format + tsgo + lint + project-specific lints) → exit 0
- `pnpm test` (full parallel suite) → **800 files / 7010 passed /
  3 skipped** — no regression
- Rescan: `git grep "killSessionRun"` returns registry definition +
  registry tests + the 3 files in this PR; no stale/unwired references
- Rescan: `git grep "abortEmbeddedPiRun"` returns only test-mock
  contexts (`reply.block-streaming.test.ts`, `reply/abort.test.ts`) —
  production paths have no stray references
- Adversarial validation (fresh-context subclaude): CLEAN verdict on
  6 AC + 11 adversarial checks — confirms `killSessionRun` is LIVE
  (abortController/SIGTERM), key-type match correct, no double-kill in
  `action-kill.ts → stopSubagentsForRequester` path (latter targets
  grandchildren, not the killed session), order `mark → kill → clear →
  wait` is race-free (all sync within a tick)

Closes #2344

Refs: #2089

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-pelykh alexey-pelykh merged commit 8d0096c into main Apr 22, 2026
15 checks passed
@alexey-pelykh alexey-pelykh deleted the fix/2344-subagents-kill-session-run branch April 22, 2026 10:48
alexey-pelykh added a commit that referenced this pull request Apr 22, 2026
…ctually terminate CLI subprocesses (#2343)

Replace three dead `const aborted = false` / `const ended = true`
placeholders left behind by the Pi-embedded gut (`b27cecc795`, #76/#77)
with real calls to `killSessionRun` / `waitForSessionRunEnd` from
`src/agents/session-run-registry.ts`:

- `abort.ts::stopSubagentsForRequester` inner loop — `killSessionRun(childKey)`
  now flows into the `killed` aggregate, so the loop actually counts
  subagent runs whose subprocesses were signaled.
- `abort.ts::tryFastAbortFromMessage` — `killSessionRun(resolvedTargetKey)`
  returned in the `aborted` field (previously always `false`). User-facing
  `/abort` now terminates the running CLI subprocess for non-ACP sessions;
  ACP cancellation path remains separate.
- `sessions.ts::ensureSessionRuntimeCleanup` — replaces the `const ended = true`
  placeholder with `killSessionRun(canonicalKey)` + `waitForSessionRunEnd`
  (15s timeout). Returns `UNAVAILABLE` on timeout, restoring the pre-gut
  contract that `sessions.reset` / `sessions.delete` surface the error
  when a subprocess refuses to end.

Also removes the `if (!params.sessionId) return undefined;` early-exit
in `ensureSessionRuntimeCleanup`: `sessionId` is used only to extend the
queue-key set; termination operates on `canonicalKey`, which is always
present. Callers with no `sessionId` now correctly attempt termination
(`waitForSessionRunEnd` short-circuits to `true` when no run is
registered, so the wait has no cost when there's nothing to wait for).

Polish: extracted `15_000` to `SESSION_RUN_TERMINATION_TIMEOUT_MS`,
grouped with the sibling `ACP_RUNTIME_CLEANUP_TIMEOUT_MS` constant.

Issue body lists three regression sites; only two apply. The third
(`commands-session.ts::applyAbortTarget`) does not exist in the current
tree — `git grep applyAbortTarget` returns zero hits. The `/session
abort` pathway was consolidated into `tryFastAbortFromMessage` at some
earlier point, so no duplicate wiring is needed. AC3 marked N/A.

Verification:
- `pnpm check` → exit 0
- `pnpm test` (full parallel suite) → 800 files / 7010 passed /
  3 skipped — no regression
- Rescan: `git grep "const aborted = false"` → zero hits across the
  whole tree (combined sweep of #2344 subagent kill/steer + #2343
  abort/subagent-cascade/fastAbort)
- Rescan: `git grep "abortEmbeddedPiRun"` → only test-mock contexts;
  production has no remaining references (test-mock cleanup tracked
  under #2089 sweep)
- Adversarial validation (fresh-context subclaude): CLEAN on 7 AC +
  9 adversarial checks — confirms waitForSessionRunEnd is LIVE (50ms
  poll tick, max 15s wall-clock), `ensureSessionRuntimeCleanup` callers
  always pass a valid `canonicalKey`, and no test asserts `aborted ===
  false` on the fast-abort path

Closes #2343

Refs: #2089, #2344, #2460, #2461

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexey-pelykh added a commit that referenced this pull request Apr 22, 2026
…ctually terminate CLI subprocesses (#2343) (#2462)

Replace three dead `const aborted = false` / `const ended = true`
placeholders left behind by the Pi-embedded gut (`b27cecc795`, #76/#77)
with real calls to `killSessionRun` / `waitForSessionRunEnd` from
`src/agents/session-run-registry.ts`:

- `abort.ts::stopSubagentsForRequester` inner loop — `killSessionRun(childKey)`
  now flows into the `killed` aggregate, so the loop actually counts
  subagent runs whose subprocesses were signaled.
- `abort.ts::tryFastAbortFromMessage` — `killSessionRun(resolvedTargetKey)`
  returned in the `aborted` field (previously always `false`). User-facing
  `/abort` now terminates the running CLI subprocess for non-ACP sessions;
  ACP cancellation path remains separate.
- `sessions.ts::ensureSessionRuntimeCleanup` — replaces the `const ended = true`
  placeholder with `killSessionRun(canonicalKey)` + `waitForSessionRunEnd`
  (15s timeout). Returns `UNAVAILABLE` on timeout, restoring the pre-gut
  contract that `sessions.reset` / `sessions.delete` surface the error
  when a subprocess refuses to end.

Also removes the `if (!params.sessionId) return undefined;` early-exit
in `ensureSessionRuntimeCleanup`: `sessionId` is used only to extend the
queue-key set; termination operates on `canonicalKey`, which is always
present. Callers with no `sessionId` now correctly attempt termination
(`waitForSessionRunEnd` short-circuits to `true` when no run is
registered, so the wait has no cost when there's nothing to wait for).

Polish: extracted `15_000` to `SESSION_RUN_TERMINATION_TIMEOUT_MS`,
grouped with the sibling `ACP_RUNTIME_CLEANUP_TIMEOUT_MS` constant.

Issue body lists three regression sites; only two apply. The third
(`commands-session.ts::applyAbortTarget`) does not exist in the current
tree — `git grep applyAbortTarget` returns zero hits. The `/session
abort` pathway was consolidated into `tryFastAbortFromMessage` at some
earlier point, so no duplicate wiring is needed. AC3 marked N/A.

Verification:
- `pnpm check` → exit 0
- `pnpm test` (full parallel suite) → 800 files / 7010 passed /
  3 skipped — no regression
- Rescan: `git grep "const aborted = false"` → zero hits across the
  whole tree (combined sweep of #2344 subagent kill/steer + #2343
  abort/subagent-cascade/fastAbort)
- Rescan: `git grep "abortEmbeddedPiRun"` → only test-mock contexts;
  production has no remaining references (test-mock cleanup tracked
  under #2089 sweep)
- Adversarial validation (fresh-context subclaude): CLEAN on 7 AC +
  9 adversarial checks — confirms waitForSessionRunEnd is LIVE (50ms
  poll tick, max 15s wall-clock), `ensureSessionRuntimeCleanup` callers
  always pass a valid `canonicalKey`, and no test asserts `aborted ===
  false` on the fast-abort path

Closes #2343

Refs: #2089, #2344, #2460, #2461

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(subagents): wire kill and steer actions to session-run-registry killSessionRun

1 participant