subagent-announce-queue: collect-mode batching skipped when any queued item has an unresolved origin #83577

@gabrielexito-stack

Summary

In subagent-announce-queue, collect mode is supposed to batch multiple queued announces addressed to the same channel into a single [Queued announce messages while agent was busy] payload via buildCollectPrompt. In practice, when a burst of ≥2 sibling subagent completions enqueues while the requester session is busy, the batching path never executes — items drain one-by-one, each re-enters the per-item defer loop, and once MAX_DEFER_WHILE_BUSY_MS=15000ms exhausts they get fired into the still-busy session where they are steered/dropped silently.

Root cause: hasCrossChannelItems treats an unkeyed item as evidence of cross-channel mixing, then drainCollectItemIfNeeded sets a sticky forceIndividualCollect=true that lasts for the rest of the drain. A single item whose origin normalizeDeliveryContext cannot resolve (it returns undefined) poisons batching for every other item in the queue.

Real-world incident: 2026-05-18 roundtable with 4 panel subagents — 1 of 4 announces reached the orchestrator's DM session, 3 silently dropped.

Related architectural context: #66638 (broader feature request about decoupling subagent notifications from session-lane batching). This issue is the bug-level companion — two specific defects inside the current architecture that, once fixed, recover the intended collect-mode behavior even before that larger redesign.

Environment

  • macOS, node ~/.local/lib/node_modules/openclaw/
  • Installed version: v2026.5.12 (current latest)
  • Service manager: launchd (ai.openclaw.gateway)
  • Mode: agents.defaults.subagents.announceQueue.mode = "collect"

Reproduction

  1. Configure announce queue in collect mode.
  2. Spawn N≥3 sibling subagents from a session whose lastTo is empty/unresolvable (steady-state DM where session is keyed by accountId).
  3. Have the orchestrator emit short turns 10–25s apart so requesterActivity.isActive flips true/false during the burst.
  4. Observe: only the first announce (the one enqueued during an idle gap) reaches the orchestrator's session jsonl as a user-role event. The remainder are reported delivered: true by sendAnnounce but never appear in the transcript.

Error / Stack

No exception thrown — this is a silent semantic bug. Evidence trail:

  • ~/.openclaw/agents/main/sessions/<id>.jsonl shows only 1 of 4 announces.
  • ~/.openclaw/logs/gateway-watchdog.log shows announce agent calls accepted.
  • All 4 child sessions show completion + announce enqueue events.

Root cause

Site 1 — src/utils/queue-helpers.ts (compiled queue-helpers-DHvHrahl.js:114-128)

function hasCrossChannelItems(items, resolveKey) {
    const keys = new Set();
    let hasUnkeyed = false;
    for (const item of items) {
        const resolved = resolveKey(item);
        if (resolved.cross) return true;
        if (!resolved.key) { hasUnkeyed = true; continue; }
        keys.add(resolved.key);
    }
    if (keys.size === 0) return false;
    if (hasUnkeyed) return true;        // ← false-positive when a SINGLE item is unkeyed
    return keys.size > 1;
}

The caller in src/agents/subagent-announce-queue.ts (compiled subagent-announce-queue-Dz5J_UzW.js:61-66) marks an item unkeyed when normalizeDeliveryContext(item.origin) returns undefined, which is normal for direct-DM sessions where the to field isn't carried explicitly. Because hasCrossChannelItems reports cross-channel whenever an unkeyed item sits alongside keyed ones, one such item poisons the entire batch.
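To make the false positive concrete, here is a minimal sketch that runs the pre-patch helper against a three-item burst. The helper body is copied from the snippet above; normalizeOriginSketch and resolveKeySketch are hypothetical stand-ins for normalizeDeliveryContext and the real caller, and the item shape (id, origin.channel, origin.to) is assumed for illustration only.

```javascript
// Pre-patch helper, copied from the compiled snippet in this issue.
function hasCrossChannelItems(items, resolveKey) {
    const keys = new Set();
    let hasUnkeyed = false;
    for (const item of items) {
        const resolved = resolveKey(item);
        if (resolved.cross) return true;
        if (!resolved.key) { hasUnkeyed = true; continue; }
        keys.add(resolved.key);
    }
    if (keys.size === 0) return false;
    if (hasUnkeyed) return true;
    return keys.size > 1;
}

// HYPOTHETICAL stand-in for normalizeDeliveryContext: yields undefined when
// the origin does not carry an explicit `to` (assumed shape, not the real API).
function normalizeOriginSketch(origin) {
    if (!origin || !origin.channel || !origin.to) return undefined;
    return { channel: origin.channel, to: origin.to };
}

// HYPOTHETICAL resolveKey: an unresolved origin produces an unkeyed result.
function resolveKeySketch(item) {
    const ctx = normalizeOriginSketch(item.origin);
    if (!ctx) return {};
    return { key: `${ctx.channel}:${ctx.to}` };
}

// Three sibling announces for the same DM; the first lacks `to`.
const burst = [
    { id: "panel-1", origin: { channel: "dm" } },
    { id: "panel-2", origin: { channel: "dm", to: "user-1" } },
    { id: "panel-3", origin: { channel: "dm", to: "user-1" } },
];

const flaggedCrossChannel = hasCrossChannelItems(burst, resolveKeySketch);
const keyedOnlyFlagged = hasCrossChannelItems(burst.slice(1), resolveKeySketch);

console.log({ flaggedCrossChannel, keyedOnlyFlagged });
```

Under these assumptions the full burst is flagged as cross-channel even though every resolvable item shares one key, while the same burst without the unkeyed item is not.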

Site 2 — src/utils/queue-helpers.ts (compiled queue-helpers-DHvHrahl.js:79-83)

async function drainCollectItemIfNeeded(params) {
    if (!params.forceIndividualCollect && !params.isCrossChannel) return "skipped";
    if (params.isCrossChannel) params.setForceIndividualCollect?.(true);
    return await drainNextQueueItem(params.items, params.run) ? "drained" : "empty";
}

forceIndividualCollect is sticky across iterations (collectState is initialized once outside the loop at subagent-announce-queue-Dz5J_UzW.js:82). One false positive permanently degrades the queue to individual-drain mode for its lifetime.

Suggested fix

Two-part patch:

Part A — treat unkeyed-but-non-cross items as joinable with any group:

function hasCrossChannelItems(items, resolveKey) {
    const keys = new Set();
    for (const item of items) {
        const resolved = resolveKey(item);
        if (resolved.cross) return true;
        if (resolved.key) keys.add(resolved.key);
    }
    return keys.size > 1;
}

Unkeyed items become "join whatever the rest decides." If everything else is single-channel, batching proceeds. If there are ≥2 distinct keys, the unkeyed item can be drained individually as part of the resulting auth-group split (splitCollectItemsByAuthorization already groups by stable key).

Part B — recompute forceIndividualCollect per iteration instead of making it sticky:

// in scheduleAnnounceDrain, move `collectState` initialization INSIDE the loop:
for (;;) {
    if (queue.items.length === 0 && queue.droppedCount === 0) break;
    const collectState = { forceIndividualCollect: false };  // ← move here
    await waitForQueueDebounce(queue);
    // ...rest unchanged
}

Stickiness was likely intended to avoid re-evaluation cost, but hasAnnounceCrossChannelItems is O(n) over a typically-tiny queue — the saving isn't worth the silent-drop blast radius.
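The effect of Part B can be sketched with a toy model of the drain loop. This is not the real scheduleAnnounceDrain: simulateDrain, its sticky option, and the per-pass isCrossChannel flag are hypothetical, and only the skip condition mirrors drainCollectItemIfNeeded as quoted above.

```javascript
// HYPOTHETICAL model of the drain loop. Each entry in `passes` stands for one
// loop iteration and says whether the cross-channel check fired on that pass.
function simulateDrain(passes, { sticky }) {
    const modes = [];
    // Sticky variant: state is created once, outside the loop (current behavior).
    let collectState = { forceIndividualCollect: false };
    for (const pass of passes) {
        // Per-iteration variant: state is recreated on every pass (Part B).
        if (!sticky) collectState = { forceIndividualCollect: false };

        // Same condition as drainCollectItemIfNeeded: batch only when neither
        // the sticky flag nor the current cross-channel check is set.
        if (!collectState.forceIndividualCollect && !pass.isCrossChannel) {
            modes.push("batched");
            continue;
        }
        if (pass.isCrossChannel) collectState.forceIndividualCollect = true;
        modes.push("individual");
    }
    return modes;
}

// One false positive on the first pass, clean passes afterwards.
const passes = [
    { isCrossChannel: true },
    { isCrossChannel: false },
    { isCrossChannel: false },
];

const stickyModes = simulateDrain(passes, { sticky: true });
const perIterationModes = simulateDrain(passes, { sticky: false });

console.log("sticky:       ", stickyModes.join(", "));
console.log("per-iteration:", perIterationModes.join(", "));
```

In this model the sticky variant stays in individual-drain mode after the first false positive, while the per-iteration variant returns to batching as soon as the check comes back clean.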

Verification

Confirmed on 2026.5.12 against both pre- and post-patch bundles via a 5-case node-level test of hasCrossChannelItems:

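| Case | Items | Pre-patch | Post-patch | Expected |
|------|-------|-----------|------------|----------|
| 1 | 2 keyed same | false | false | false |
| 2 | 1 unkeyed + 2 keyed same | true (bug) | false | false |
| 3 | 2 distinct keys | true | true | true |
| 4 | 1 keyed + 1 explicit cross | true | true | true |
| 5 | 2 unkeyed | false | false | false |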

Case 2 is the reproducible bug shape. Pre-patch returns true, falsely tagging a same-channel batch as cross-channel and poisoning collect mode.
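The five cases can be re-run with a small Node script along the following lines. This is a sketch, not the original test: both function bodies are copied from the snippets in this issue, while the Pre/Post suffixes, the identity resolveKey, and the item literals are assumptions made so the script is self-contained.

```javascript
// Pre-patch helper, copied from the compiled snippet in this issue.
function hasCrossChannelItemsPre(items, resolveKey) {
    const keys = new Set();
    let hasUnkeyed = false;
    for (const item of items) {
        const resolved = resolveKey(item);
        if (resolved.cross) return true;
        if (!resolved.key) { hasUnkeyed = true; continue; }
        keys.add(resolved.key);
    }
    if (keys.size === 0) return false;
    if (hasUnkeyed) return true;
    return keys.size > 1;
}

// Post-patch helper from Part A of the suggested fix.
function hasCrossChannelItemsPost(items, resolveKey) {
    const keys = new Set();
    for (const item of items) {
        const resolved = resolveKey(item);
        if (resolved.cross) return true;
        if (resolved.key) keys.add(resolved.key);
    }
    return keys.size > 1;
}

// ASSUMPTION: items already have the { key?, cross? } shape, so resolveKey
// is the identity function here.
const resolveKey = (item) => item;

const cases = [
    { name: "2 keyed same", items: [{ key: "a" }, { key: "a" }], expected: false },
    { name: "1 unkeyed + 2 keyed same", items: [{}, { key: "a" }, { key: "a" }], expected: false },
    { name: "2 distinct keys", items: [{ key: "a" }, { key: "b" }], expected: true },
    { name: "1 keyed + 1 explicit cross", items: [{ key: "a" }, { cross: true }], expected: true },
    { name: "2 unkeyed", items: [{}, {}], expected: false },
];

const results = cases.map(({ name, items, expected }) => ({
    name,
    pre: hasCrossChannelItemsPre(items, resolveKey),
    post: hasCrossChannelItemsPost(items, resolveKey),
    expected,
}));

console.table(results);
```

Only the second row should show pre differing from expected; the post column should match expected in every row.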

Workaround

Set agents.defaults.subagents.announceQueue.mode = "queue" (not collect). This loses the batching summary but guarantees that individual announces still reach the session. Alternatively, cap parallelism at ≤2 at the spawning skill layer.

Severity

High. Multi-panel roundtables, multi-tool concurrent jobs, and any fan-out workflow with a shared orchestrator session are affected. Silent loss (sendAnnounce returns delivered: true) makes this invisible until someone reads the session jsonl by hand.

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
