
[bug] ACP oneshot completion callback never fires — lifecycle event race condition #52967

@bluk1020


Summary

When sessions_spawn creates an ACP oneshot (mode: "run") session, the completion callback never wakes the parent agent. The spawned task completes successfully, but the parent session is never notified.

Environment

  • OpenClaw 2026.3.22
  • acpx 0.1.16
  • ACP backend: acpx, defaultAgent: claude
  • Node 22.22.1, macOS (arm64)

Steps to Reproduce

  1. From a main agent session, call sessions_spawn with runtime: "acp", mode: "run", and a simple task
  2. Call sessions_yield to wait for the completion callback
  3. The ACP task completes (file is created, session entry shows status: "done")
  4. The parent session is never woken — no completion event arrives

Root Cause

Race condition in the spawn flow between callGateway and registerSubagentRun:

In pi-embedded-CzQCqSlH.js, the spawn handler does:

  1. callGateway({ method: "agent", ... }) — starts the child agent run
  2. registerSubagentRun({ ... }) — registers the run in the subagent registry

For ACP oneshot sessions, callGateway("agent") resolves through agentCommandInternal → acpResolution.kind === "ready" → acpManager.runTurn(), which runs the entire ACP turn synchronously. The turn completes, and emitAgentEvent({ stream: "lifecycle", data: { phase: "end" } }) fires before callGateway returns.

The lifecycle listener in ensureListener() handles the "end" event:

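// Look up the registered run that this lifecycle event belongs to.
const entry = subagentRuns.get(evt.runId);
if (!entry) {
    // No registered run: only a best-effort refresh is attempted on "end".
    if (phase === "end" && typeof evt.sessionKey === "string")
        await refreshFrozenResultFromSession(evt.sessionKey);
    return;  // <-- event dropped, no announce triggered
}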

Since registerSubagentRun hasn't been called yet, there's no entry in subagentRuns. The fallback refreshFrozenResultFromSession also finds nothing (no pending completion runs). The lifecycle event is silently dropped.

By the time registerSubagentRun executes (after callGateway returns), the lifecycle event has already fired and won't fire again. The run is registered but never completed — waitForSubagentCompletion waits forever (or until timeout).

For regular (non-ACP) subagent sessions, this race doesn't manifest because the initial callGateway("agent") returns quickly (the agent run is async), giving registerSubagentRun time to execute before the lifecycle "end" event fires.
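
The ordering problem can be reproduced in isolation. The following is a minimal sketch, not OpenClaw code: the registry, the listener, and callGatewayAcp are simplified stand-ins, and the gateway call is modeled as a plain synchronous function that emits the "end" event before it returns.

```javascript
// Minimal, self-contained model of the ordering problem (not OpenClaw code).
const subagentRuns = new Map(); // stand-in for the subagent registry
const listeners = [];           // stand-in for the lifecycle event bus
const log = [];

function emitAgentEvent(evt) {
  for (const fn of listeners) fn(evt);
}

// Stand-in for the lifecycle listener: events for unknown runs are dropped.
listeners.push((evt) => {
  const entry = subagentRuns.get(evt.runId);
  if (!entry) {
    log.push(`dropped ${evt.data.phase} for ${evt.runId}`);
    return;
  }
  if (evt.data.phase === "end") {
    entry.completed = true;
    log.push(`announced ${evt.runId}`);
  }
});

// Hypothetical ACP-style gateway call: the whole turn finishes, and the
// "end" event fires, before the call returns.
function callGatewayAcp(runId) {
  emitAgentEvent({ runId, stream: "lifecycle", data: { phase: "end" } });
  return { runId };
}

function registerSubagentRun(runId) {
  subagentRuns.set(runId, { completed: false });
}

// Current order: call the gateway first, register second.
const { runId } = callGatewayAcp("run-1");
registerSubagentRun(runId);

console.log(log);                       // [ 'dropped end for run-1' ]
console.log(subagentRuns.get("run-1")); // { completed: false }
```

The run ends up registered but can never be marked complete, which matches the observed hang.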

Expected Behavior

The parent session should receive a completion callback when the ACP oneshot task finishes, regardless of whether the turn completes synchronously.

Suggested Fix

Move registerSubagentRun to before the callGateway("agent") call (with appropriate rollback on failure), so the lifecycle listener has a registered entry when the "end" event fires. Alternatively, add a deferred completion check: after registerSubagentRun, check if the agent run already completed and trigger the announce path if so.
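
Both options can be sketched against a toy registry. This is an illustration under assumptions, not a patch: runs, finished, onLifecycleEnd, and the two spawn helpers are hypothetical names, and the finished set stands in for whatever source of truth the real code would consult (for example, the session entry status).

```javascript
// Sketch of both suggested fixes against a toy registry (not OpenClaw code).
const runs = new Map();     // stand-in for subagentRuns
const finished = new Set(); // hypothetical record of runs that already ended
const announced = [];

function onLifecycleEnd(runId) {
  finished.add(runId);
  const entry = runs.get(runId);
  if (entry && !entry.completed) {
    entry.completed = true;
    announced.push(runId);
  }
}

// Hypothetical gateway call that ends the run before returning,
// or throws to simulate a failed spawn.
function callGateway(runId, { fail = false } = {}) {
  if (fail) throw new Error("gateway rejected the run");
  onLifecycleEnd(runId);
}

// Option 1: register first, roll back if the gateway call fails.
function spawnRegisterFirst(runId, opts) {
  runs.set(runId, { completed: false });
  try {
    callGateway(runId, opts);
  } catch (err) {
    runs.delete(runId); // rollback so no orphaned entry waits forever
    throw err;
  }
}

// Option 2: keep the current order, then check for an already-finished run.
function spawnWithDeferredCheck(runId) {
  callGateway(runId);
  runs.set(runId, { completed: false });
  if (finished.has(runId)) onLifecycleEnd(runId);
}

spawnRegisterFirst("run-a");
spawnWithDeferredCheck("run-b");
try {
  spawnRegisterFirst("run-c", { fail: true });
} catch (err) {
  // Expected: the failed spawn leaves no registry entry behind.
}

console.log(announced);         // [ 'run-a', 'run-b' ]
console.log(runs.has("run-c")); // false
```

Option 1 closes the window entirely but needs the rollback path. Option 2 leaves the call order untouched but depends on a reliable way to tell that a run has already ended.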

Additional Context

  • The agents.list config is also required: without claude/codex entries, the ACP identity reconcile fails on every restart with "checked=2 resolved=0 failed=2". This is a prerequisite, but a separate issue.
  • runs.json (the subagent registry on disk) shows an empty {} after the spawn, confirming the run was never properly tracked through completion.
  • The "Done." output from the ACP turn IS logged in gateway.log, confirming the turn itself succeeds.
