Session management performance degrades severely with subagent usage (100%+ CPU at ~400 sessions)

## Problem

OpenClaw natively supports and encourages subagent spawning for task parallelism, but session accumulation severely impacts host performance, forcing users to sacrifice data retention for operational stability.

## Real-World Metrics (from our deployment)

### Before (447 sessions accumulated over ~3 days)
| Metric | Value |
|--------|-------|
| `sessions.list` response time | **6.5 seconds** |
| Gateway CPU | **100-115%** sustained |
| Gateway RAM | **1+ GB** |
| Dashboard usability | Effectively frozen |
| Gateway restart | Did not help (sessions persist on disk) |
| Host reboot | Did not help |

### After (aggressive manual pruning to 23 sessions)
| Metric | Value |
|--------|-------|
| `sessions.list` response time | **760ms** |
| Gateway CPU | **1%** |
| Gateway RAM | **~400 MB** |
| Dashboard usability | Responsive |

### Session Breakdown (typical)
- **2-4** long-lived main/channel sessions (webchat, Slack)
- **~400+** ephemeral sessions (subagents, spawn-dispatch, cron jobs)

With the dispatcher running every 10 minutes and spawning 2 sessions per cycle, sessions accumulate at ~12 sessions/hour × 24h = ~288 sessions/day before any archiving.
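For anyone sizing their own deployment, the arithmetic is easy to reproduce. This is only a sketch: `sessions_per_day` is a hypothetical helper name, and it ignores archiving entirely, so it gives the gross creation rate rather than the number of sessions on disk at any moment.

```python
def sessions_per_day(interval_minutes: float, sessions_per_cycle: int) -> float:
    """Ephemeral sessions created per day, ignoring any archiving."""
    cycles_per_hour = 60 / interval_minutes
    return cycles_per_hour * sessions_per_cycle * 24


# Our dispatcher: one cycle every 10 minutes, 2 sessions per cycle.
print(sessions_per_day(10, 2))  # 288.0
```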

## Root Cause

1. `sessions.list` serializes all sessions on every call — O(n) with expensive per-session work
2. Control UI/Dashboard polls `sessions.list` frequently (~every 7 seconds), even when the Sessions tab is not open
3. `archiveAfterMinutes` (default 60) doesn't keep pace with spawn rate
4. No pagination, caching, or incremental sync
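A rough way to see why points 1 and 2 combine so badly is to compare response time to poll interval. The sketch below assumes a single polling client, non-overlapping requests, and a response time dominated by gateway work; none of that is verified, so treat the result as an upper bound on the time spent serving the poll, not as measured CPU usage. `poll_duty_cycle` is a hypothetical name.

```python
def poll_duty_cycle(response_seconds: float, poll_interval_seconds: float) -> float:
    """Fraction of wall time spent answering one client's polls (capped at 1.0)."""
    return min(1.0, response_seconds / poll_interval_seconds)


before = poll_duty_cycle(6.5, 7.0)   # 447 sessions
after = poll_duty_cycle(0.76, 7.0)   # 23 sessions
print(f"before: {before:.0%}, after: {after:.0%}")  # before: 93%, after: 11%
```

At 447 sessions the gateway would barely finish one response before the next poll arrives, which fits the sustained CPU we saw.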

## Current Workaround

We had to:
1. Manually archive old `.jsonl` transcripts
2. Run `openclaw sessions cleanup --enforce --fix-missing`
3. Set `archiveAfterMinutes: 120` (still aggressive)
4. **Accept losing troubleshooting history** to maintain stability

This is a poor tradeoff — we lose the ability to debug issues from hours ago.
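Step 1 can be scripted so that old transcripts are moved aside rather than deleted, which preserves at least some history. This is a minimal sketch, not OpenClaw tooling: the function name is hypothetical, both directories are parameters because the on-disk layout is not documented here, and it selects files purely by modification time without distinguishing main sessions from subagent sessions.

```python
import shutil
import time
from pathlib import Path


def archive_old_transcripts(
    sessions_dir: Path, archive_dir: Path, max_age_hours: float
) -> list[Path]:
    """Move *.jsonl files not modified within max_age_hours into archive_dir."""
    cutoff = time.time() - max_age_hours * 3600
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved: list[Path] = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(sessions_dir.glob("*.jsonl")):
        if path.stat().st_mtime < cutoff:
            target = archive_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Called as `archive_old_transcripts(Path(...), Path(...), max_age_hours=24)` with your own paths, it returns the new locations of the moved files. We still ran the cleanup command from step 2 afterwards.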

## Suggested Improvements

1. **Pagination for `sessions.list`** — don't serialize 400+ sessions per request
2. **Reduce Control UI poll frequency** — or skip `sessions.list` when not on Sessions view
3. **Incremental sync** — send deltas instead of full list
4. **Tiered retention config** — separate policies for:
   - Main/channel sessions (keep days/weeks)
   - Subagent sessions (keep hours)
5. **Hard session count cap with LRU eviction** — auto-archive oldest ephemeral sessions when limit reached
6. **Background proactive cleanup** — don't wait for user to hit performance cliff
7. **Lazy loading** — defer full session metadata until requested
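To make suggestions 1, 4, and 5 concrete, here is a small sketch of cursor-based paging over lightweight session summaries, plus a retention pass that applies per-kind age limits and a cap on ephemeral sessions. Everything in it is an assumption for illustration: `SessionSummary`, the `"main"`/`"ephemeral"` labels, `list_page`, and `select_for_archive` are hypothetical names, not OpenClaw's data model or API. A real implementation would likely keep a sorted index instead of re-sorting on every call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionSummary:
    session_id: str
    kind: str           # "main" or "ephemeral" (hypothetical labels)
    last_active: float  # Unix timestamp

Cursor = tuple[float, str]


def list_page(
    sessions: list[SessionSummary], limit: int = 50, cursor: Optional[Cursor] = None
) -> tuple[list[SessionSummary], Optional[Cursor]]:
    """Return one page, newest first, and a cursor for the next page (or None)."""
    ordered = sorted(sessions, key=lambda s: (s.last_active, s.session_id), reverse=True)
    if cursor is not None:
        # Keep only sessions strictly older than the last item already sent.
        ordered = [s for s in ordered if (s.last_active, s.session_id) < cursor]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (page[-1].last_active, page[-1].session_id) if has_more else None
    return page, next_cursor


def select_for_archive(
    sessions: list[SessionSummary],
    now: float,
    max_age_seconds: dict[str, float],
    max_ephemeral: int,
) -> list[SessionSummary]:
    """Pick expired sessions, then the least recently active ephemeral overflow."""
    expired = [s for s in sessions if now - s.last_active > max_age_seconds[s.kind]]
    expired_ids = {s.session_id for s in expired}
    live_ephemeral = sorted(
        (s for s in sessions if s.kind == "ephemeral" and s.session_id not in expired_ids),
        key=lambda s: s.last_active,
    )
    overflow = max(0, len(live_ephemeral) - max_ephemeral)
    return expired + live_ephemeral[:overflow]


now = 1_000_000.0
sessions = [SessionSummary("main-1", "main", now - 3 * 86400)] + [
    SessionSummary(f"sub-{i}", "ephemeral", now - i * 600) for i in range(1, 6)
]

page, cursor = list_page(sessions, limit=2)
print([s.session_id for s in page])  # ['sub-1', 'sub-2']

limits = {"main": 7 * 86400, "ephemeral": 2400}
doomed = select_for_archive(sessions, now, limits, max_ephemeral=2)
print([s.session_id for s in doomed])  # ['sub-5', 'sub-4', 'sub-3']
```

The main session survives because its limit is measured in days, while the ephemeral sessions are trimmed by both age and count. The cursor is a `(last_active, session_id)` pair so that paging stays stable when new sessions are created between requests.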

## Environment

- OpenClaw v2026.3.28
- macOS 15.7.2 (arm64), Apple Silicon
- Workload: Automated dispatcher spawning subagents for issue processing

## Impact

This is a significant reliability issue for anyone running automated subagent workflows. The system encourages subagent spawning but doesn't scale session management to match, creating a hidden performance cliff that's hard to diagnose and forces users to choose between history retention and system stability.

Session management performance degrades severely with subagent usage (100%+ CPU at ~400 sessions) #58534