Session management performance degrades severely with subagent usage (100%+ CPU at ~400 sessions) #58534

@lucca-alma

Problem

OpenClaw natively supports and encourages subagent spawning for task parallelism, but session accumulation severely impacts host performance, forcing users to sacrifice data retention for operational stability.

Real-World Metrics (from our deployment)

Before (447 sessions accumulated over ~3 days)

| Metric | Value |
| --- | --- |
| sessions.list response time | 6.5 seconds |
| Gateway CPU | 100-115% sustained |
| Gateway RAM | 1+ GB |
| Dashboard usability | Effectively frozen |
| Gateway restart | Did not help (sessions persist on disk) |
| Host reboot | Did not help |

After (aggressive manual pruning to 23 sessions)

| Metric | Value |
| --- | --- |
| sessions.list response time | 760ms |
| Gateway CPU | 1% |
| Gateway RAM | ~400 MB |
| Dashboard usability | Responsive |

Session Breakdown (typical)

  • 2-4 long-lived main/channel sessions (webchat, Slack)
  • ~400+ ephemeral sessions (subagents, spawn-dispatch, cron jobs)

With the dispatcher running every 10 minutes and spawning 2 sessions per cycle, sessions accumulate at 6 cycles/hour × 2 = ~12 sessions/hour, which over 24h is ~288 sessions/day.
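For anyone wanting to plug in their own dispatcher settings, here is a minimal sketch of that arithmetic. It assumes nothing is archived in the meantime, and the function names are ours for illustration, not part of OpenClaw.

```python
# Back-of-the-envelope accumulation estimate using the figures from this issue.
# Assumption: no session is archived or pruned during the window.

DISPATCH_INTERVAL_MIN = 10  # dispatcher runs every 10 minutes
SESSIONS_PER_CYCLE = 2      # each cycle spawns 2 sessions


def sessions_per_hour(interval_min: float, per_cycle: int) -> float:
    """Sessions created per hour for a given dispatch interval."""
    return (60 / interval_min) * per_cycle


def sessions_after(hours: float,
                   interval_min: float = DISPATCH_INTERVAL_MIN,
                   per_cycle: int = SESSIONS_PER_CYCLE) -> float:
    """Sessions accumulated after `hours` with no cleanup."""
    return sessions_per_hour(interval_min, per_cycle) * hours


print(sessions_per_hour(10, 2))  # 12.0
print(sessions_after(24))        # 288.0
```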

Root Cause

  1. sessions.list serializes all sessions on every call — O(n) with expensive per-session work
  2. Control UI/Dashboard polls sessions.list roughly every 7 seconds, even when the Sessions tab is not in view
  3. archiveAfterMinutes (default 60) doesn't keep pace with spawn rate
  4. No pagination, caching, or incremental sync
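To illustrate why points 1 and 2 compound, here is a rough cost model built only from the numbers reported above. It assumes a single polling client and that the sessions.list response time is dominated by gateway-side work. We did not measure either, so treat it as a sketch rather than a profile.

```python
# Rough cost model: one client polling sessions.list at a fixed interval.
# Assumptions (not measured): single poller, response time ~= gateway work.


def poll_duty_cycle(response_s: float, poll_interval_s: float) -> float:
    """Fraction of wall-clock time spent answering sessions.list (capped at 1)."""
    return min(response_s / poll_interval_s, 1.0)


def serializations_per_minute(sessions: int, poll_interval_s: float) -> float:
    """Per-session serializations per minute when every poll walks all sessions."""
    return sessions * (60 / poll_interval_s)


# 447 sessions, 6.5 s per call, polled every ~7 s
print(round(poll_duty_cycle(6.5, 7.0), 2))          # 0.93
print(round(serializations_per_minute(447, 7.0)))   # 3831

# 23 sessions after pruning, same poll interval
print(round(serializations_per_minute(23, 7.0)))    # 197
```

Under these assumptions the gateway would spend about 93% of its time answering one poller before pruning, which is at least consistent with the sustained 100%+ CPU we saw.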

Current Workaround

We had to:

  1. Manually archive old .jsonl transcripts
  2. Run openclaw sessions cleanup --enforce --fix-missing
  3. Set archiveAfterMinutes: 120 (still aggressive)
  4. Accept losing troubleshooting history to maintain stability

This is a poor tradeoff — we lose the ability to debug issues from hours ago.

Suggested Improvements

  1. Pagination for sessions.list — don't serialize 400+ sessions per request
  2. Reduce Control UI poll frequency — or skip sessions.list when not on Sessions view
  3. Incremental sync — send deltas instead of full list
  4. Tiered retention config — separate policies for:
    • Main/channel sessions (keep days/weeks)
    • Subagent sessions (keep hours)
  5. Hard session count cap with LRU eviction — auto-archive oldest ephemeral sessions when limit reached
  6. Background proactive cleanup — don't wait for user to hit performance cliff
  7. Lazy loading — defer full session metadata until requested
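As a rough illustration of items 4 and 5 combined, the sketch below picks sessions to archive using a per-tier maximum age plus a hard cap on ephemeral sessions, evicting the least recently active first. The `Session` shape, the "main"/"ephemeral" labels, and the limits are all hypothetical and do not reflect OpenClaw's actual data model or config keys.

```python
from dataclasses import dataclass


@dataclass
class Session:
    id: str
    kind: str           # "main" or "ephemeral" (hypothetical labels)
    last_active: float  # seconds since epoch


# Hypothetical tiered retention: maximum idle age per session kind, in seconds.
MAX_AGE_S = {"main": 14 * 24 * 3600, "ephemeral": 6 * 3600}
MAX_EPHEMERAL = 100  # hypothetical hard cap on ephemeral sessions


def select_for_archive(sessions, now, max_age_s=MAX_AGE_S,
                       max_ephemeral=MAX_EPHEMERAL):
    """Return the ids of sessions that should be archived.

    1. Anything idle longer than its tier's max age is archived.
    2. If the remaining ephemeral sessions still exceed the cap, the least
       recently active ones are archived until the cap is met.
    """
    expired = {s.id for s in sessions
               if now - s.last_active > max_age_s[s.kind]}
    live_ephemeral = sorted(
        (s for s in sessions if s.kind == "ephemeral" and s.id not in expired),
        key=lambda s: s.last_active,  # oldest activity first
    )
    overflow = max(len(live_ephemeral) - max_ephemeral, 0)
    evicted = {s.id for s in live_ephemeral[:overflow]}
    return expired | evicted


now = 1_000_000.0
sessions = [
    Session("main-1", "main", now - 3 * 24 * 3600),  # 3 days idle, kept
    Session("old", "ephemeral", now - 7 * 3600),     # past the 6 h tier limit
    Session("a", "ephemeral", now - 300),
    Session("b", "ephemeral", now - 200),
    Session("c", "ephemeral", now - 100),
]
print(sorted(select_for_archive(sessions, now, max_ephemeral=2)))  # ['a', 'old']
```

Main/channel sessions are never touched by the cap in this sketch, so debugging history for long-lived sessions survives even under heavy subagent churn.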

Environment

  • OpenClaw v2026.3.28
  • macOS 15.7.2 (arm64), Apple Silicon
  • Workload: Automated dispatcher spawning subagents for issue processing

Impact

This is a significant reliability issue for anyone running automated subagent workflows. The system encourages subagent spawning but doesn't scale session management to match, creating a hidden performance cliff that's hard to diagnose and forces users to choose between history retention and system stability.
