
[Bug]: Subagent completion delivery can be lost on direct-announce timeout, drain, or orphan prune #67777

@100yenadmin

Problem

Subagent completion delivery currently prioritizes a synchronous direct announce back into the requester session. When the requester lane is busy, the announce times out, the system is draining or restarting, or a run is restored as an orphan, the completion can fail direct delivery, then fail the conditional queue fallback, and finally be cleaned up without ever producing a durable, user-visible inbox item.

Confirmed behavior

  • completions are produced in the subagent registry, not in the plugin layer
  • completions with expectsCompletionMessage=true use direct-first delivery
  • direct delivery can fail in the gateway agent callback path under timeout or drain
  • the queue fallback is conditional and can return no queue action
  • restored runs can be pruned as missing-session-entry before durable delivery occurs
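To make the loss path concrete, here is a minimal TypeScript model of the confirmed behavior for the expectsCompletionMessage=true path. It is a sketch, not the registry code: `deliverDirectFirst`, `DirectResult`, `QueueAction`, and the injected `directAnnounce`/`queueFallback` callbacks are hypothetical names that only mirror the bullets above. It shows that once the direct announce times out and the fallback declines to enqueue, cleanup still runs and nothing user-visible remains.

```typescript
// Hypothetical model of the direct-first path for expectsCompletionMessage=true.
// Names are illustrative; they mirror the confirmed behavior, not real code.

type DirectResult = "delivered" | "timeout" | "draining";
type QueueAction = "enqueued" | null; // null models "no queue action"

interface Completion {
  runId: string;
  requesterSessionKey: string;
  text: string;
}

interface Outcome {
  runId: string;
  visibleToUser: boolean;
  cleanedUp: boolean;
}

function deliverDirectFirst(
  c: Completion,
  directAnnounce: (c: Completion) => DirectResult,
  queueFallback: (c: Completion) => QueueAction,
): Outcome {
  // 1. Synchronous direct announce into the requester session.
  let visible = directAnnounce(c) === "delivered";
  // 2. Conditional fallback: it may decline and return no queue action.
  if (!visible) {
    visible = queueFallback(c) === "enqueued";
  }
  // 3. Cleanup runs regardless, so a double failure leaves no durable record.
  return { runId: c.runId, visibleToUser: visible, cleanedUp: true };
}

// Busy lane: the announce times out and the fallback declines.
const lost = deliverDirectFirst(
  { runId: "run-1", requesterSessionKey: "session-a", text: "subagent finished" },
  () => "timeout",
  () => null,
);
console.log(lost.visibleToUser, lost.cleanedUp); // false true
```

The key property is step 3: nothing in this path records the completion before the delivery attempts, so the outcome of the attempts is the only evidence the work ever finished.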

Impact

Successful subagent work can appear lost, encouraging duplicate effort and reducing trust in completion delivery.

Requested outcome

Please use this issue as the planning thread, not a PR. We are out of PR budget right now.

Need:

  1. concrete coding plan
  2. exact patch surfaces/functions
  3. minimum survivable design for durable completion inbox/spool
  4. restart/drain short-circuit strategy
  5. restore/orphan preservation strategy
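For items 3 and 4, one minimum survivable shape is spool-first delivery: persist the completion as a pending record before any announce attempt, mark it delivered only on confirmed delivery, skip the synchronous announce entirely while draining, and replay pending records on startup. The sketch below assumes that design. `CompletionSpool`, `SpoolStore`, `MemoryStore`, and `Announce` are hypothetical, and the in-memory store only stands in for real durable storage (for example a per-session file written atomically), which a real fix would need in order to survive a restart.

```typescript
// Hypothetical spool-first completion delivery with a drain short-circuit.
// MemoryStore stands in for durable storage; a real store must survive restart.

type SpoolStatus = "pending" | "delivered";

interface SpoolRecord {
  runId: string;
  requesterSessionKey: string;
  text: string;
  status: SpoolStatus;
  attempts: number;
}

interface SpoolStore {
  get(runId: string): SpoolRecord | undefined;
  put(rec: SpoolRecord): void;
  all(): SpoolRecord[];
}

class MemoryStore implements SpoolStore {
  private recs = new Map<string, SpoolRecord>();
  get(runId: string): SpoolRecord | undefined {
    const r = this.recs.get(runId);
    return r ? { ...r } : undefined; // return copies, like reading from disk
  }
  put(rec: SpoolRecord): void {
    this.recs.set(rec.runId, { ...rec });
  }
  all(): SpoolRecord[] {
    return [...this.recs.values()].map((r) => ({ ...r }));
  }
}

// Returns true only when the requester session confirmed the announce.
type Announce = (rec: SpoolRecord) => boolean;

class CompletionSpool {
  draining = false;
  private store: SpoolStore;
  private announce: Announce;

  constructor(store: SpoolStore, announce: Announce) {
    this.store = store;
    this.announce = announce;
  }

  // Persist before any delivery attempt; idempotent per runId.
  record(runId: string, requesterSessionKey: string, text: string): void {
    if (this.store.get(runId)) return;
    this.store.put({ runId, requesterSessionKey, text, status: "pending", attempts: 0 });
  }

  // One delivery attempt; while draining, skip the synchronous announce.
  tryDeliver(runId: string): SpoolStatus | undefined {
    const rec = this.store.get(runId);
    if (!rec) return undefined;
    if (rec.status === "delivered" || this.draining) return rec.status;
    rec.attempts += 1;
    if (this.announce(rec)) rec.status = "delivered";
    this.store.put(rec);
    return rec.status;
  }

  // On startup, retry every record that is still pending.
  replayPending(): string[] {
    const delivered: string[] = [];
    for (const r of this.store.all()) {
      if (r.status === "pending" && this.tryDeliver(r.runId) === "delivered") {
        delivered.push(r.runId);
      }
    }
    return delivered;
  }

  hasPending(runId: string): boolean {
    return this.store.get(runId)?.status === "pending";
  }
}

const store = new MemoryStore();
let requesterBusy = true;
const spool = new CompletionSpool(store, () => !requesterBusy);
spool.record("run-1", "session-a", "subagent finished");
spool.tryDeliver("run-1"); // busy lane: record stays "pending"
spool.draining = true;
spool.tryDeliver("run-1"); // drain: no announce attempted, still "pending"
// Simulated restart: a fresh spool over the same store replays the record.
requesterBusy = false;
const afterRestart = new CompletionSpool(store, () => !requesterBusy);
console.log(afterRestart.replayPending()); // [ 'run-1' ]
```

Because the record is written before the first announce, a timeout, a declined queue fallback, or a drain all leave a pending record instead of nothing. The trade-off is at-least-once delivery: a crash between a successful announce and the `put` that marks the record delivered causes a replay, so the requester side would need to tolerate or de-duplicate completions by `runId`.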

I will post the code changes separately later if needed.

Context

We already traced likely fix surfaces in the subagent registry / cleanup path. An agent will add a concrete patch plan as a comment on this issue.
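For the restore/orphan side of the cleanup path, one option is to treat a missing session entry as a reason to re-home an undelivered completion rather than to prune it. The sketch below assumes a hypothetical `decideRestoredRun` hook in that path and an assumed fallback session (such as the user's main session) as the re-home target. Choosing that target is a behavior decision this sketch does not settle, and none of these names come from the codebase.

```typescript
// Hypothetical prune guard for runs restored after a restart.
// Names are illustrative; the fallback session key is an assumption.

interface RestoredRun {
  runId: string;
  requesterSessionKey: string;
  completionText?: string; // set only if the subagent finished
  completionDelivered: boolean;
}

type PruneDecision =
  | { action: "keep" }
  | { action: "rehome"; targetSessionKey: string }
  | { action: "prune" };

function decideRestoredRun(
  run: RestoredRun,
  sessionExists: (key: string) => boolean,
  fallbackSessionKey: string,
): PruneDecision {
  // Requester session is still present: leave the run to the normal path.
  if (sessionExists(run.requesterSessionKey)) return { action: "keep" };
  // Missing session entry: check whether a finished result would be lost.
  const undelivered = run.completionText !== undefined && !run.completionDelivered;
  if (undelivered) {
    // Move the result into a durable inbox the user can still reach.
    return { action: "rehome", targetSessionKey: fallbackSessionKey };
  }
  // No undelivered completion: this guard has nothing to preserve.
  return { action: "prune" };
}

const decision = decideRestoredRun(
  { runId: "run-9", requesterSessionKey: "session-gone", completionText: "report ready", completionDelivered: false },
  () => false,
  "main",
);
console.log(decision); // { action: 'rehome', targetSessionKey: 'main' }
```

Pruning then becomes safe on a later pass, once the re-homed completion is recorded as delivered.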

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
  • clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
