
Gateway becomes unresponsive under subagent load on Windows - completion announcements timeout #64253

@L8ton-crypto

Description


Bug

The gateway becomes unresponsive when handling multiple subagent completion announcements, causing cascading timeouts and failed deliveries. This is distinct from #7042, which involved hard crashes: here the gateway process stays alive but stops processing requests effectively.

Symptoms

  1. Subagent completion timeouts - repeated warnings:

     ```
     [warn] Subagent announce completion direct announce agent call transient failure, retrying 2/4 in 5s: gateway timeout after 120000ms
     [warn] Subagent announce completion direct announce agent call transient failure, retrying 3/4 in 10s: gateway timeout after 120000ms
     ```

  2. Provider profile failover - Anthropic requests time out, triggering failover:

     ```
     Profile anthropic:manual timed out. Trying next account...
     embedded run failover decision: rotate_profile, failoverReason: timeout
     ```

  3. Lane wait exceeded - sessions queue up waiting for processing:

     ```
     lane wait exceeded: waitedMs=158509
     ```

  4. Eventually the gateway stops responding to Telegram/channels entirely until restart.

Impact

  • Overnight cron jobs (subagent builds) fail to deliver results
  • Channels go silent even though gateway process is still running
  • Watchdog wrapper doesn't help because process hasn't crashed

Environment

  • OpenClaw `2026.4.5`
  • Windows 10 (x64)
  • Node.js v25.5.0
  • Gateway as Windows Scheduled Task with watchdog wrapper
  • Workload: overnight build cron spawning Sonnet subagents (30-60 min builds)

Reproduction

  1. Configure overnight cron job that spawns subagent builds
  2. Run 2-3 builds back-to-back (e.g. overnight build + dream cycle)
  3. Gateway progressively slows, completion announcements start timing out
  4. Eventually gateway becomes unresponsive to all channels

Logs

From `openclaw-2026-04-10.log`:

  • 08:05:56 - First subagent timeout (retry 2/4)
  • 08:06:31 - Anthropic profile timeout, failover triggered
  • 08:08:01 - Subagent timeout (retry 3/4)
  • Pattern continues with increasing delays
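
To see whether the delays really escalate monotonically across the whole night, the retry warnings can be pulled out of the log mechanically. A hedged sketch; the line format is copied from the warnings quoted under Symptoms, and `extractRetries` is an illustrative name, not an OpenClaw API:

```javascript
// Extract retry attempt / delay / timeout from gateway warning lines.
const RETRY_RE = /retrying (\d+)\/(\d+) in (\d+)s: gateway timeout after (\d+)ms/;

function extractRetries(logText) {
  return logText
    .split('\n')
    .map((line) => line.match(RETRY_RE))
    .filter(Boolean)
    .map(([, attempt, max, delayS, timeoutMs]) => ({
      attempt: Number(attempt),
      max: Number(max),
      delaySeconds: Number(delayS),
      timeoutMs: Number(timeoutMs),
    }));
}

// Against the two warnings quoted under Symptoms:
const sample = [
  '[warn] ... retrying 2/4 in 5s: gateway timeout after 120000ms',
  '[warn] ... retrying 3/4 in 10s: gateway timeout after 120000ms',
].join('\n');
const retries = extractRetries(sample);
// retries[0].delaySeconds === 5, retries[1].delaySeconds === 10
```

Plotting `delaySeconds` over time should make the "increasing delays" pattern above quantitative rather than anecdotal.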

Suggested Investigation

  • Memory leak under sustained subagent load?
  • WebSocket connection pool exhaustion?
  • Anthropic API timeout handling blocking the event loop?
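
The event-loop theory in particular is cheap to test from inside the process. A minimal, generic Node.js probe (nothing here is OpenClaw-specific): if a handler blocks the event loop, the delay before an already-scheduled `setImmediate` callback fires balloons.

```javascript
// Measure how long a setImmediate callback waits before running (ms).
function measureEventLoopLag() {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    setImmediate(() => {
      resolve(Number(process.hrtime.bigint() - start) / 1e6);
    });
  });
}

async function demo() {
  const idleLag = await measureEventLoopLag(); // baseline, typically a few ms
  const pending = measureEventLoopLag();       // scheduled before the block
  const deadline = Date.now() + 50;
  while (Date.now() < deadline) {}             // simulate 50ms of sync blocking
  const blockedLag = await pending;            // reports roughly 50ms or more
  return { idleLag, blockedLag };
}
```

Sampling this on an interval and logging spikes would distinguish a blocked event loop (which would explain symptoms 1-3 hitting at once) from connection-level exhaustion, where requests stall while the loop itself stays responsive.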

Workaround

Currently none - manual gateway restart required once symptoms appear. The watchdog from #7042 only helps with crashes, not unresponsive states.
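
Until a fix lands, the watchdog wrapper could at least be taught to treat sustained unresponsiveness like a crash. A hypothetical sketch of the decision logic only; the probe itself (e.g. a deadline-bounded request against some gateway endpoint) is assumed, and `ResponsivenessWatchdog` is an illustrative name, not part of OpenClaw:

```javascript
// Restart policy: N consecutive failed liveness probes count as
// "unresponsive" even though the process never exited.
class ResponsivenessWatchdog {
  constructor({ maxFailures = 3, onUnresponsive }) {
    this.maxFailures = maxFailures;
    this.onUnresponsive = onUnresponsive;
    this.failures = 0;
  }

  // Feed in one probe result (true = gateway answered in time).
  record(ok) {
    this.failures = ok ? 0 : this.failures + 1;
    if (this.failures >= this.maxFailures) {
      this.failures = 0;     // reset so we don't restart on every later probe
      this.onUnresponsive(); // e.g. kill + respawn the scheduled task
    }
  }
}

// Example: two failures recover; three in a row trigger one restart.
let restarts = 0;
const dog = new ResponsivenessWatchdog({ onUnresponsive: () => restarts++ });
[false, false, true, false, false, false].forEach((ok) => dog.record(ok));
// restarts === 1
```

Requiring several consecutive failures keeps a single slow probe during a heavy build from triggering a spurious restart.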
