ACP oneshot completion callback never fires — lifecycle event race condition
Summary
When sessions_spawn creates an ACP oneshot (mode: "run") session, the completion callback never wakes the parent agent. The spawned task completes successfully, but the parent session is never notified.
Environment
- OpenClaw 2026.3.22
- acpx 0.1.16
- ACP backend: acpx, defaultAgent: claude
- Node 22.22.1, macOS (arm64)
Steps to Reproduce
- From a main agent session, call sessions_spawn with runtime: "acp", mode: "run", and a simple task
- Call sessions_yield to wait for the completion callback
- The ACP task completes (file is created, session entry shows status: "done")
- The parent session is never woken: no completion event arrives
Root Cause
Race condition in the spawn flow between callGateway and registerSubagentRun:
In pi-embedded-CzQCqSlH.js, the spawn handler does:
- callGateway({ method: "agent", ... }): starts the child agent run
- registerSubagentRun({ ... }): registers the run in the subagent registry
For ACP oneshot sessions, callGateway("agent") resolves through agentCommandInternal → acpResolution.kind === "ready" → acpManager.runTurn(), which runs the entire ACP turn synchronously. The turn completes, and emitAgentEvent({ stream: "lifecycle", data: { phase: "end" } }) fires before callGateway returns.
The lifecycle listener in ensureListener() handles the "end" event:
    const entry = subagentRuns.get(evt.runId);
    if (!entry) {
      if (phase === "end" && typeof evt.sessionKey === "string")
        await refreshFrozenResultFromSession(evt.sessionKey);
      return; // <-- event dropped, no announce triggered
    }
Since registerSubagentRun hasn't been called yet, there's no entry in subagentRuns. The fallback refreshFrozenResultFromSession also finds nothing (no pending completion runs). The lifecycle event is silently dropped.
By the time registerSubagentRun executes (after callGateway returns), the lifecycle event has already fired and won't fire again. The run is registered but never completed — waitForSubagentCompletion waits forever (or until timeout).
For regular (non-ACP) subagent sessions, this race doesn't manifest because the initial callGateway("agent") returns quickly (the agent run is async), giving registerSubagentRun time to execute before the lifecycle "end" event fires.
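The ordering problem described above can be reproduced in a few lines. The following is a simplified model only: the names mirror the ones in this report, but the bodies are illustrative stand-ins (callGatewaySync and spawnCurrentOrder are hypothetical), not the actual OpenClaw code.

```javascript
// Simplified model of the race. Every function here is a stand-in written
// for illustration; none of it is the real OpenClaw implementation.
const subagentRuns = new Map(); // stand-in for the subagent registry
const announced = [];           // run IDs whose completion reached the parent

// Stand-in for the lifecycle listener: events for unknown runs are dropped.
function onLifecycleEvent(evt) {
  const entry = subagentRuns.get(evt.runId);
  if (!entry) return; // no entry yet, so no announce is triggered
  if (evt.phase === "end") announced.push(evt.runId);
}

// Hypothetical stand-in for callGateway on the ACP oneshot path: the whole
// turn runs, and the "end" event fires, before the call returns.
function callGatewaySync(runId) {
  onLifecycleEvent({ runId, phase: "end" });
  return { runId };
}

// Stand-in for registerSubagentRun.
function registerSubagentRun(runId) {
  subagentRuns.set(runId, { runId });
}

// Current order in the spawn handler: start the run, then register it.
function spawnCurrentOrder(runId) {
  callGatewaySync(runId);
  registerSubagentRun(runId);
}

spawnCurrentOrder("run-1");
console.log(subagentRuns.has("run-1"));   // true: the run is registered
console.log(announced.includes("run-1")); // false: the "end" event was dropped
```

The run ends up registered but never announced, which matches the observed behavior of waitForSubagentCompletion waiting until timeout.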
Expected Behavior
The parent session should receive a completion callback when the ACP oneshot task finishes, regardless of whether the turn completes synchronously.
Suggested Fix
Move registerSubagentRun to before the callGateway("agent") call (with appropriate rollback on failure), so the lifecycle listener has a registered entry when the "end" event fires. Alternatively, add a deferred completion check: after registerSubagentRun, check if the agent run already completed and trigger the announce path if so.
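Both options can be sketched against a small model of the spawn flow. This is a hedged illustration, not a patch: runs, finished, announce, onEnd, startRun, spawnRegisterFirst, and spawnWithDeferredCheck are all hypothetical names, and the real registry and announce path will differ.

```javascript
// Sketch of both suggested fixes. All names are hypothetical stand-ins.
const runs = new Map();     // stand-in for the subagent registry
const finished = new Set(); // run IDs whose "end" event has been seen
const announcedRuns = [];   // run IDs announced to the parent

// Announce at most once per registered run.
function announce(runId) {
  const entry = runs.get(runId);
  if (entry && !entry.announced) {
    entry.announced = true;
    announcedRuns.push(runId);
  }
}

// Lifecycle "end" handler: remember the completion, then try to announce.
function onEnd(runId) {
  finished.add(runId);
  announce(runId); // no-op if the run is not registered yet
}

// Stand-in for a gateway call that completes the turn before returning.
function startRun(runId, { fail = false } = {}) {
  if (fail) throw new Error("gateway call failed");
  onEnd(runId);
}

// Option 1: register first, roll back if the gateway call throws.
function spawnRegisterFirst(runId, opts) {
  runs.set(runId, { announced: false });
  try {
    startRun(runId, opts);
  } catch (err) {
    runs.delete(runId); // rollback so no orphaned entry is left waiting
    throw err;
  }
}

// Option 2: keep the current order, then check for an already-finished run.
function spawnWithDeferredCheck(runId) {
  startRun(runId);
  runs.set(runId, { announced: false });
  if (finished.has(runId)) announce(runId);
}

spawnRegisterFirst("a");
spawnWithDeferredCheck("b");
try {
  spawnRegisterFirst("c", { fail: true });
} catch {
  // expected: the failed spawn is rolled back
}
console.log(announcedRuns); // runs "a" and "b" are both announced
console.log(runs.has("c")); // false: the failed run left no registry entry
```

Option 1 keeps the listener unchanged but needs the rollback path. Option 2 leaves the spawn order alone, at the cost of remembering "end" events for runs that are not registered yet (the finished set in this sketch), which would need some cleanup policy in a real implementation.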
Additional Context
- The agents.list config is also required: without claude/codex entries, the ACP identity reconcile fails on every restart with "checked=2 resolved=0 failed=2". This is a prerequisite but separate issue.
- runs.json (subagent registry on disk) shows empty {} after the spawn, confirming the run was never properly tracked through completion.
- The "Done." output from the ACP turn IS logged in gateway.log, confirming the turn itself succeeds.