Bug
When the gateway crashes or restarts, sessions.json loses track of session files on disk. These orphaned .jsonl and .trajectory.jsonl files accumulate indefinitely, contributing to event loop delay (ELD) bloat and creating a self-reinforcing death spiral.
Environment
- OpenClaw: 2026.5.3-1 (2eae30e)
- OS: Linux 6.8.0-110-generic (x64) · node 22.22.2
- Agent: main with heavy subagent usage (8-10 spawns/day)
Observed behavior
After a gateway crash → restart cycle (PID 2667517 → 339007 → 343385, 3 restarts in 4 minutes):
Indexed sessions in sessions.json: 53 entries
UUID-named .jsonl files on disk: 3,266 files
Orphaned (on disk, not in index): 3,214 files ← 98.4% untracked
Trajectory files on disk: 5,942 files
Dated orphan files (2026-*): 75 files
Total orphan size: 2.3GB
Of which >7d old: 2,946 files (773MB)
After the crash cycle, the message "agent cleanup timed out" appeared for pi-trajectory-flush (10s timeout), meaning files written during the failing gateway process were never registered in the index.
Root cause chain
Gateway crash/restart
→ New gateway PID starts, sessions.json may be in inconsistent state
→ Session files from old PID are on disk but not in the index
→ openclaw sessions cleanup only operates on indexed entries
→ Orphaned files accumulate forever
→ ELD grows (scanning 9K+ files → 4,272ms max observed)
→ More gateway timeouts → more crashes → more orphans
Expected behavior
Session files that exist on disk but are not referenced by sessions.json should be cleaned up, either:
- During startup (reconcile disk vs index)
- During periodic maintenance (sessions cleanup scans physical files too)
- With a configurable TTL for untracked files
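A minimal sketch of the disk-vs-index reconciliation, for illustration only. It assumes session files are named `<session-id>.jsonl` and `<session-id>.trajectory.jsonl` and live in one directory. The helper names `session_id_of` and `find_orphans` are hypothetical, and the set of indexed IDs is passed in by the caller because the sessions.json schema is not described here.

```python
from pathlib import Path
from typing import List, Optional, Set

# Longest suffix first, so "x.trajectory.jsonl" is not read as id "x.trajectory".
SUFFIXES = (".trajectory.jsonl", ".jsonl")


def session_id_of(path: Path) -> Optional[str]:
    """Return the assumed session ID for a session file, or None for other files."""
    for suffix in SUFFIXES:
        if path.name.endswith(suffix):
            return path.name[: -len(suffix)]
    return None


def find_orphans(session_dir: Path, indexed_ids: Set[str]) -> List[Path]:
    """List session files on disk whose ID is not present in the index."""
    orphans = []
    for path in sorted(session_dir.iterdir()):
        if not path.is_file():
            continue
        sid = session_id_of(path)
        if sid is not None and sid not in indexed_ids:
            orphans.append(path)
    return orphans
```

This only reports orphans and deletes nothing, so it could run at startup or during periodic maintenance as a dry run before any removal policy is applied.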
Workaround
A manual cleanup script that compares the sessions.json index against the filesystem and removes orphans older than 7 days, run as a daily cron job.
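The age-based removal step of such a script might look like the sketch below. This is not the actual script from this report. The function name `remove_stale` and its parameters are hypothetical, file age is taken from the modification time, and the list of orphan paths is assumed to have been computed separately by comparing the index against the directory.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional


def remove_stale(
    paths: Iterable[Path],
    max_age_days: float = 7.0,
    now: Optional[float] = None,
    dry_run: bool = True,
) -> List[Path]:
    """Delete files older than max_age_days (by mtime); return the files selected."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    selected = []
    for path in paths:
        try:
            if path.stat().st_mtime < cutoff:
                if not dry_run:
                    path.unlink()
                selected.append(path)
        except FileNotFoundError:
            # The file vanished between listing and stat/unlink; skip it.
            continue
    return selected
```

Defaulting to `dry_run=True` means a cron job has to opt in to deletion explicitly, which seems safer when the index itself may be in an inconsistent state after a crash.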
Related