Session lane starvation: followup drain monopolizes session lane, blocks inbound dispatch for 20-30 min #54488

## Bug: Followup drain monopolizes session lane, causing indefinite inbound dispatch stall

**Version:** 2026.3.23-2 (not present in 2026.3.13)

### Symptoms
- After every agent turn, new inbound Discord DMs and WhatsApp messages are silently queued for 20-30 minutes before processing
- `sessions_send` does not fix it (queues behind same backlog)
- Only `SIGUSR1` (gateway restart / `resetAllLanes()`) resolves it immediately
- 100% reproducible on any session with active followup queue + compaction-heavy context

### Root Cause
`scheduleFollowupDrain` (in `pi-embedded-CbCYZxIb.js:94509`) starts an unbounded async loop after every turn. Each queued item (system event, subagent announce, WhatsApp reconnect) calls `runEmbeddedPiAgent` → `enqueueSession(() => enqueueGlobal(...))`, which holds the session lane (`maxConcurrent: 1`) for the full turn duration, including compaction and context engine maintenance. New user messages queue behind all pending followup turns, with no preemption.
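To illustrate why a single-slot FIFO lane produces waits of this size, here is a minimal model. It is not OpenClaw code: `simulateFifoLane`, the task names, and the 4-minute turn duration are all hypothetical, chosen only to show how a handful of long followup turns adds up.

```javascript
// Hypothetical model of a single-slot lane (maxConcurrent: 1) served FIFO.
// Names and durations are illustrative assumptions, not taken from OpenClaw.
function simulateFifoLane(tasks) {
  // tasks: [{ name, enqueuedAtMs, durationMs }], served strictly in array order
  let clockMs = 0;
  const report = [];
  for (const task of tasks) {
    // A task starts when the lane is free AND the task has been enqueued.
    const startMs = Math.max(clockMs, task.enqueuedAtMs);
    report.push({ name: task.name, waitedMs: startMs - task.enqueuedAtMs });
    clockMs = startMs + task.durationMs;
  }
  return report;
}

const MINUTE_MS = 60_000;

// Five followup turns (assumed 4 minutes each, e.g. compaction-heavy) are
// already queued when the agent turn ends.
const followupTasks = Array.from({ length: 5 }, (_, i) => ({
  name: `followup-${i + 1}`,
  enqueuedAtMs: 0,
  durationMs: 4 * MINUTE_MS,
}));

// A user message arrives 10 seconds later and lands behind all of them.
const inboundTask = { name: "user-message", enqueuedAtMs: 10_000, durationMs: MINUTE_MS };

const laneReport = simulateFifoLane([...followupTasks, inboundTask]);
const userWaitMs = laneReport.find((r) => r.name === "user-message").waitedMs;
console.log(`user message waited ${(userWaitMs / MINUTE_MS).toFixed(1)} min`);
```

In this model the user message waits for the sum of all followup turn durations ahead of it, which is the same shape as the waits reported here.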

### Observed lane wait times
From diagnostic logs on `session:agent:main:main`:

| Time (Mar 25) | Wait (ms) | Wait (min) |
|---|---|---|
| 10:28 | 1,361,250 | ~22.7 min |
| 13:42 | 1,814,033 | ~30.2 min |
| 13:49 | 1,237,693 | ~20.6 min |

Log pattern: `lane wait exceeded: waitedMs=1814033 queueAhead=1`
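For anyone wanting to pull these waits out of their own logs, a small parser along these lines may help. It assumes only the fragment shown above; the helper name `parseLaneWait` is hypothetical, and real log lines may carry prefixes (timestamps, lane names) that this sketch simply ignores.

```javascript
// Matches the fragment quoted above anywhere in a line. Any surrounding
// prefix/suffix format is unknown, so the pattern is left unanchored.
const LANE_WAIT_RE = /lane wait exceeded: waitedMs=(\d+) queueAhead=(\d+)/;

// Hypothetical helper: returns null for lines that do not match.
function parseLaneWait(line) {
  const match = LANE_WAIT_RE.exec(line);
  if (!match) return null;
  const waitedMs = Number(match[1]);
  return {
    waitedMs,
    queueAhead: Number(match[2]),
    // Minutes rounded to one decimal place.
    waitedMin: Math.round(waitedMs / 6000) / 10,
  };
}

const parsedWait = parseLaneWait("lane wait exceeded: waitedMs=1814033 queueAhead=1");
console.log(parsedWait);
```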

### Contributing factors
- **Compaction safeguard** runs summarization API calls within the lane task (context chronically at 142% of budget, compactionCount: 14)
- **Memory flush** adds a full LLM call per compaction cycle (`compaction.memoryFlush.enabled: true`)
- **Ollama heartbeat/cron timeouts** (5 min each) consume global lane slots and trigger retry chains with 404 fallback failures
- **WhatsApp reconnects** generate bursts of system events (observed: 152 disconnect/reconnect events in one day) that each get processed as individual followup turns

### Setup context
- `session.dmScope: "main"` (Discord DM + WhatsApp share main session)
- 10 agents configured, multiple WhatsApp groups, Discord guild with ~15 channels
- Heartbeat: every 55m (was on `ollama/qwen2.5-14b-agent`, timeouts blocked global lanes)
- `contextPruning.mode: "cache-ttl"` (its custom end-of-turn events correlated with the stall, but disabling it did not fix the problem; lane starvation is the real cause)

### Suggested fixes
1. **Cap consecutive followup drain turns** (e.g., max 3) before yielding to inbound queue
2. **Prioritize user messages over system events** in the session lane
3. **Run context engine maintenance (`afterTurn`/`maintain`) OUTSIDE the session lane task**: the lane should be released after `clearActiveEmbeddedRun`, not after post-turn cleanup
4. **Add a configurable aggregate timeout** for the full session lane task (not just compaction retry)
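
Fixes 1 and 2 could be combined roughly as follows. This is a sketch of the scheduling policy only, not a patch against the real lane code: the `FairLane` class, its two queues, and the `maxConsecutiveFollowups` parameter are hypothetical names introduced for illustration.

```javascript
// Hypothetical two-queue lane. Followups run in order, but once
// `maxConsecutiveFollowups` have run back to back, a waiting inbound
// message is served before any further followup.
class FairLane {
  constructor(maxConsecutiveFollowups = 3) {
    this.maxConsecutiveFollowups = maxConsecutiveFollowups;
    this.inbound = [];
    this.followups = [];
    this.consecutiveFollowups = 0;
  }

  enqueueInbound(task) {
    this.inbound.push(task);
  }

  enqueueFollowup(task) {
    this.followups.push(task);
  }

  // Returns the next task to run, or undefined if both queues are empty.
  next() {
    const capReached = this.consecutiveFollowups >= this.maxConsecutiveFollowups;
    const takeInbound =
      this.inbound.length > 0 && (capReached || this.followups.length === 0);
    if (takeInbound) {
      this.consecutiveFollowups = 0;
      return this.inbound.shift();
    }
    if (this.followups.length > 0) {
      this.consecutiveFollowups += 1;
      return this.followups.shift();
    }
    return undefined;
  }
}

// Five followups are queued, then one user message arrives.
const fairLane = new FairLane(3);
for (const name of ["f1", "f2", "f3", "f4", "f5"]) fairLane.enqueueFollowup(name);
fairLane.enqueueInbound("u1");

const servedOrder = [];
let nextTask;
while ((nextTask = fairLane.next()) !== undefined) servedOrder.push(nextTask);
console.log(servedOrder.join(","));
```

With a cap of 3, the user message runs after at most three followup turns instead of after the whole backlog. Setting the cap to 0 in this sketch turns it into strict user-message priority (fix 2), since a waiting inbound message is then always served first.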

### Workarounds (config-level mitigations)
These reduce lane occupation time but do not fix the root cause:
```json5
{
  agents: {
    defaults: {
      compaction: { reserveTokensFloor: 20000, memoryFlush: { enabled: false } },
      heartbeat: { model: "anthropic/claude-haiku-4-5" },  // was ollama (5min timeouts)
    },
  },
  messages: { queue: { debounceMs: 5000 } },  // batch system events
}
```
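
To show why a debounce window can reduce the number of followup turns during a reconnect burst, here is a generic trailing-edge grouping sketch. It is an assumption about how batching of this kind typically behaves, not a description of how `messages.queue.debounceMs` is implemented; `coalesceEvents` and the sample timestamps are hypothetical.

```javascript
// Groups sorted event timestamps into batches: an event joins the current
// batch if it arrives within `debounceMs` of the previous event, otherwise
// it starts a new batch. Assumes `timestampsMs` is sorted ascending.
function coalesceEvents(timestampsMs, debounceMs) {
  const batches = [];
  for (const t of timestampsMs) {
    const current = batches[batches.length - 1];
    if (current && t - current[current.length - 1] <= debounceMs) {
      current.push(t);
    } else {
      batches.push([t]);
    }
  }
  return batches;
}

// Six reconnect-style events; with a 5000 ms window they collapse into
// fewer batches, so fewer followup turns would be needed.
const eventBatches = coalesceEvents([0, 1000, 2500, 9000, 9500, 30000], 5000);
console.log(`${eventBatches.length} batches instead of 6 individual turns`);
```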

### Reproduction
1. Configure a session with `dmScope: "main"` and multiple active channels (WhatsApp + Discord)
2. Enable compaction safeguard with memory flush
3. Send a message → agent responds
4. Wait 3-5 minutes, send again
5. The message sits in the lane queue for 20-30 minutes (or indefinitely with Ollama timeouts)

### Environment
- macOS 12.7 (x64), Node v24.13.1
- OpenClaw 2026.3.23-2 (npm global install)
- Anthropic Claude Sonnet 4.6 (primary), Ollama Qwen 2.5 14B (heartbeat/crons)
- Discord + WhatsApp + Telegram channels active
