Sub-agent announcements silently dropped on gateway timeout (hardcoded 60s, no retry) #17000

@luisecab

Bug

Sub-agent task results are silently lost when the announcement delivery to the parent session exceeds the hardcoded 60-second gateway timeout. The sub-agent completes successfully, but the user never sees the result.

Evidence

Gateway logs showing repeated announce failures:

Feb 15 02:20:47: Subagent announce failed: Error: gateway timeout after 60000ms
Feb 14 18:52:22: Subagent announce failed: Error: gateway timeout after 60000ms
Feb 13 12:51:39: announce queue drain failed for agent:main:main: Error: gateway timeout after 60000ms
Feb 13 12:55:45: announce queue drain failed for agent:main:main: Error: gateway timeout after 60000ms

Root Cause

In src/agents/subagent-announce.ts, sendAnnounce() calls callGateway(... timeoutMs: 60_000 ...). The timeout is hardcoded, and there is no retry and no backoff. On the first timeout the announcement is silently dropped.

Steps to Reproduce

  1. Spawn a sub-agent via sessions_spawn during moderate gateway load (e.g., concurrent heartbeats, WhatsApp reconnects, multiple sessions active)
  2. Sub-agent completes its work successfully
  3. Announcement delivery times out at 60s
  4. User never receives the result — no error shown, no retry attempted
  5. Sub-agent session gets cleaned up, leaving no trace

Impact

  • Users think sub-agents "forgot" to report back
  • Results are lost despite successful execution
  • No indication to the user that announcement failed
  • The transcript still exists, but the user has no way of knowing they should check it

Expected Behavior

  • Announcement delivery should retry with exponential backoff (at least 3 attempts)
  • Timeout should be configurable via agents.defaults.subagents.announceTimeoutMs (or similar)
  • Failed announcements should be logged visibly and queued for retry
  • Completed sub-agent sessions should persist longer for recovery
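
A rough sketch of how the configurable timeout could be resolved. Only the key path agents.defaults.subagents.announceTimeoutMs comes from this issue; the config interface, the function name, and the validation rules are assumptions for illustration, and the 120000 fallback is the default proposed under Suggested Fix.

```typescript
// Hypothetical config shape: only the key path is taken from this issue.
interface GatewayConfig {
  agents?: {
    defaults?: {
      subagents?: { announceTimeoutMs?: number; maxConcurrent?: number };
    };
  };
}

// Default proposed in this issue (not a value the project has adopted).
const DEFAULT_ANNOUNCE_TIMEOUT_MS = 120_000;

// Hypothetical helper: returns the configured timeout, or the default when
// the value is missing, non-finite, or not positive.
function resolveAnnounceTimeoutMs(config: GatewayConfig): number {
  const value = config.agents?.defaults?.subagents?.announceTimeoutMs;
  return typeof value === "number" && Number.isFinite(value) && value > 0
    ? value
    : DEFAULT_ANNOUNCE_TIMEOUT_MS;
}
```

The result would then replace the literal 60_000 passed as timeoutMs, so existing configs keep working without the new key.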

Suggested Fix

  1. Make timeout configurable: agents.defaults.subagents.announceTimeoutMs (default: 120000)
  2. Add retry with backoff: 3 retries at 60s/120s/240s before giving up
  3. Persist on failure: If all retries fail, store the announcement payload for manual recovery via /subagents log
  4. Surface failures: Show a visible warning to the user when announce fails (e.g., "⚠️ Sub-agent completed but delivery failed — use /subagents log <id> to view results")
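
A minimal sketch of items 2 to 4, not the actual sendAnnounce() code. All names here (AnnounceSender, announceWithRetry, timeoutForAttempt) are hypothetical, and the sketch reads "60s/120s/240s" as per-attempt timeouts that double; if those figures were meant as delays between attempts, the loop would need a sleep instead.

```typescript
// Hypothetical sketch: these names are illustrative, not OpenClaw's API.
// The sender stands in for a gateway call that rejects on timeout.
type AnnounceSender = (timeoutMs: number) => Promise<void>;

interface AnnounceResult {
  delivered: boolean;
  attempts: number;
  lastError?: unknown;
}

// Timeout for a zero-based attempt index, doubling each time:
// 60s, 120s, 240s for a 60s base.
function timeoutForAttempt(baseMs: number, attempt: number): number {
  return baseMs * 2 ** attempt;
}

async function announceWithRetry(
  send: AnnounceSender,
  baseTimeoutMs = 60_000,
  maxAttempts = 3,
): Promise<AnnounceResult> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await send(timeoutForAttempt(baseTimeoutMs, attempt));
      return { delivered: true, attempts: attempt + 1 };
    } catch (err) {
      lastError = err;
      // Log every failed attempt instead of dropping it silently.
      console.warn(`Subagent announce attempt ${attempt + 1}/${maxAttempts} failed:`, err);
    }
  }
  // The caller decides how to persist the payload and warn the user.
  return { delivered: false, attempts: maxAttempts, lastError };
}
```

When delivered comes back false, the caller would store the announcement payload for /subagents log and show the warning from item 4, rather than cleaning up the session.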

Workarounds

  • Lower agents.defaults.subagents.maxConcurrent to reduce gateway contention
  • Manually check results via /subagents list + /subagents log <id>
  • Re-trigger announcement by asking the main agent to check sessions_history for the sub-agent session

Environment

  • OpenClaw v2026.2.14
  • Channel: WhatsApp
  • OS: Ubuntu Linux (AWS EC2, m7i-flex.large)
  • Config: maxConcurrent=4, subagents.maxConcurrent=8

Related

Confirmed by community member in Discord — code path identified in src/agents/subagent-announce.ts.

Labels: stale (marked as stale due to inactivity)
