Skip to content

fix(subagents): wire kill and steer actions to session-run-registry killSessionRun #2344

@alexey-pelykh

Description

@alexey-pelykh

Summary

The subagent kill and steer actions (exposed via the subagents agent tool and via /subagents chat commands) lost their subprocess-termination semantics in the pi-embedded gut (b27cecc795, #76/#77). Kill no longer kills; steer no longer interrupts — it only waits for natural completion.

Tracked from the ChannelBridge compatibility audit on #2089.

Affected call sites

All three paths implement the same kill/steer semantics across different entry points:

Entry point File Kill call site Steer call site
Agent tool subagents src/agents/tools/subagents-tool.ts killSubagentRun line 251 (const aborted = false — dead var) createSubagentsTool steer branch, pre-gut line 592-595 (abortEmbeddedPiRun(sessionId) removed)
Text command /subagents kill src/auto-reply/reply/commands-subagents/action-kill.ts handleSubagentsKillAction pre-gut line 53-56 (abortEmbeddedPiRun(sessionId) removed) N/A
Text command /subagents send --steer src/auto-reply/reply/commands-subagents/action-send.ts N/A handleSubagentsSendAction pre-gut line 68-70 (abortEmbeddedPiRun(targetSessionId) removed)

All three files are currently GUTTED (no pi-embedded imports). They clear queues, mark run records as terminated in the subagent-registry, and for steer paths wait via callGateway({ method: "agent.wait", ... }) for the existing run to finish naturally — but none of them actually terminate the CLI subprocess.

Impact

Kill: A user or parent agent calling kill on a running subagent marks the registry entry as terminated and clears queues, but the CLI subprocess continues executing until natural completion (or timeout). killed: true is returned regardless. Tokens, API costs, and work continue.

Steer: A steer message is queued as a new run via the agent gateway method, and the existing run is left to finish naturally before the replacement starts (via agent.wait at subagents-tool.ts:634 and action-send.ts:75). In pre-gut behavior, the existing run was aborted first so the replacement started immediately with the combined context. Post-gut, the user-visible effect is "steer" queuing up a followup with a 5-second delay.

Proposed fix

All three files: Import and call killSessionRun from session-run-registry before the existing agent.wait / clearSessionQueues logic.

For subagents-tool.ts kill path (line 236-276 killSubagentRun):

import { killSessionRun } from "../session-run-registry.js";

async function killSubagentRun(params: { ... }): Promise<{ killed: boolean; sessionId?: string }> {
  if (params.entry.endedAt) {
    return { killed: false };
  }
  const childSessionKey = params.entry.childSessionKey;
  const resolved = resolveSessionEntryForKey({ ... });
  const aborted = killSessionRun(childSessionKey);  // ← wire to session-run-registry
  const cleared = clearSessionQueues([childSessionKey, resolved.entry?.sessionId]);
  // ... existing markSubagentRunTerminated logic
  const killed = marked > 0 || aborted || cleared.followupCleared > 0 || cleared.laneCleared > 0;
  return { killed, sessionId: resolved.entry?.sessionId };
}

For subagents-tool.ts steer path (line 608-644):

// Before the existing callGateway({ method: "agent.wait", ... }) block at line 632-643:
if (resolved.entry.childSessionKey) {
  killSessionRun(resolved.entry.childSessionKey);
}
// Keep the existing agent.wait as a settlement step so the replacement starts
// after the aborted run has fully unwound.
try {
  await callGateway({
    method: "agent.wait",
    params: { runId: resolved.entry.runId, timeoutMs: STEER_ABORT_SETTLE_TIMEOUT_MS },
    timeoutMs: STEER_ABORT_SETTLE_TIMEOUT_MS + 2_000,
  });
} catch { /* continue */ }

For action-kill.ts (line 49-53, post-gut):

import { killSessionRun } from "../../../agents/session-run-registry.js";

// In handleSubagentsKillAction, after resolving entry:
if (childKey) {
  killSessionRun(childKey);
}
const cleared = clearSessionQueues([childKey, entry?.sessionId]);

For action-send.ts steer path (line 64-86):

import { killSessionRun } from "../../../agents/session-run-registry.js";

if (steerRequested) {
  markSubagentRunForSteerRestart(targetResolution.entry.runId);
  killSessionRun(targetResolution.entry.childSessionKey);  // ← wire here
  const cleared = clearSessionQueues([targetResolution.entry.childSessionKey, targetSessionId]);
  // ... existing agent.wait settlement
}

Key-type conversion note

session-run-registry keys on sessionKey (the canonical ChannelBridge session identifier). The subagent entry has childSessionKey (a SubagentRunRecord field) which is exactly that canonical key — no conversion needed. Contrast with entry.sessionId which is the Pi-era UUID still stored in SessionEntry and should be ignored for registry lookups.

Acceptance Criteria

  • subagents-tool.ts kill path calls killSessionRun(childSessionKey) and removes the const aborted = false dead var
  • subagents-tool.ts steer path calls killSessionRun before the agent.wait settlement step
  • action-kill.ts calls killSessionRun(childKey) before clearing queues
  • action-send.ts steer branch calls killSessionRun before the agent.wait settlement step
  • pnpm check passes
  • Existing kill/steer tests updated (see src/agents/subagent-registry.steer-restart.test.ts and src/agents/remoteclaw-tools.subagents.*.test.ts)
  • Live smoke test: spawn a subagent with a long-running task and verify kill actually terminates the subprocess (check ps or runtime log)

Out of Scope

Related

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions