Summary
We are observing consistent performance issues in OpenClaw related to session handling.
There are two related symptoms:
- sessions.list consistently takes about 10 to 16 seconds under moderate session load.
- pi-trajectory-flush regularly times out at exactly 10000 ms.
Local cleanup and pruning improve disk usage and general stability, but do not resolve the core latency.
Environment
- OpenClaw: current local Docker deployment
- Image: openclaw:local
- Platform: Debian on ARM64 / Raspberry Pi
- Storage: local filesystem
- Deployment mode: Docker
- Session store: agents/main/sessions/sessions.json
Observed state before local cleanup and pruning:
- sessions.json: about 4.1 MB
- Session entries: about 153
- Active session directory had many trajectory and session artifact files
- Trajectory files totaled several hundred MB
After cleanup and limiting active trajectories, sessions.list still remained around 10 seconds.
Problem 1: sessions.list latency
The sessions.list command consistently takes around 10 to 16 seconds.
Observed log examples:
⇄ res ✓ sessions.list 10186ms
⇄ res ✓ sessions.list 10230ms
⇄ res ✓ sessions.list 15978ms
⇄ res ✓ sessions.list 16379ms
This continued even after:
- removing stale plugin-runtime-deps
- deleting stale SQLite temporary files
- deleting stale session temporary files
- archiving large trajectory files
- limiting active trajectory files to about 200
- archiving session artifacts such as .reset, .bak, .deleted, .checkpoint
The current evidence suggests the issue is not only raw disk usage, but the session store loading path itself.
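To compare latencies before and after each mitigation, the log excerpts above can be summarized with a small script. This is a minimal sketch that assumes the log line format shown in the examples; the function names are illustrative and not part of OpenClaw.

```python
import re
import statistics

# Matches the duration in lines such as "⇄ res ✓ sessions.list 10186ms"
# (format copied from the log excerpts in this report).
LATENCY_RE = re.compile(r"sessions\.list\s+(\d+)ms")


def sessions_list_latencies(log_lines):
    """Return all sessions.list durations (in ms) found in the given lines."""
    latencies = []
    for line in log_lines:
        match = LATENCY_RE.search(line)
        if match:
            latencies.append(int(match.group(1)))
    return latencies


def summarize(latencies):
    """Return count, min, median and max of a non-empty list of durations."""
    return {
        "count": len(latencies),
        "min_ms": min(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }


if __name__ == "__main__":
    sample = [
        "⇄ res ✓ sessions.list 10186ms",
        "⇄ res ✓ sessions.list 10230ms",
        "⇄ res ✓ sessions.list 15978ms",
        "⇄ res ✓ sessions.list 16379ms",
    ]
    print(summarize(sessions_list_latencies(sample)))
```

Running it against log excerpts captured before and after a cleanup step makes it easier to see whether that step changed the latency at all.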
Problem 2: pi-trajectory-flush timeout
The agent cleanup step pi-trajectory-flush regularly times out at exactly 10000 ms.
Observed log examples:
agent cleanup timed out: runId=... sessionId=... step=pi-trajectory-flush timeoutMs=10000
Local code inspection suggests the timeout is currently hardcoded, or defaults, to 10000 ms and cannot be configured externally.
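For illustration, an externally configurable timeout could look roughly like the sketch below. It is written in Python only to show the idea and is not OpenClaw's implementation; the environment variable OPENCLAW_TRAJECTORY_FLUSH_TIMEOUT_MS is a proposed name that does not exist today, and both function names are hypothetical.

```python
import asyncio
import os

DEFAULT_FLUSH_TIMEOUT_MS = 10_000  # matches the timeoutMs value seen in the logs


def flush_timeout_ms(env=os.environ):
    """Read the timeout from a hypothetical env var, falling back to the default."""
    raw = env.get("OPENCLAW_TRAJECTORY_FLUSH_TIMEOUT_MS")  # proposed, not an existing option
    if raw is None:
        return DEFAULT_FLUSH_TIMEOUT_MS
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_FLUSH_TIMEOUT_MS
    # Reject zero and negative values instead of disabling the timeout silently.
    return value if value > 0 else DEFAULT_FLUSH_TIMEOUT_MS


async def run_cleanup_step(step, timeout_ms):
    """Await a cleanup coroutine; return True if it finished, False on timeout."""
    try:
        await asyncio.wait_for(step, timeout=timeout_ms / 1000)
        return True
    except asyncio.TimeoutError:
        return False
```

On slow storage such as an SD card, an operator could then raise the limit for a single deployment without patching the code.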
Local workarounds applied
The following local mitigations were applied successfully to improve disk usage and general stability:
- cleanup of stale plugin-runtime-deps
- cleanup of main.sqlite.tmp-*
- cleanup of sessions.json.*.tmp
- archiving old and large *.trajectory.jsonl
- limiting active trajectory files to 200
- archiving session artifacts:
  - *.jsonl.reset.*
  - *.jsonl.bak-*
  - *.trajectory.jsonl.deleted.*
  - *.checkpoint.*.jsonl
- local maintenance job for nightly cleanup
- planned local configuration workaround:
  - session.maintenance.maxEntries
  - session.maintenance.pruneDays
  - OPENCLAW_SESSION_CACHE_TTL_MS=120000
These workarounds reduce pressure and improve stability, but they do not address the root cause of the observed sessions.list latency.
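The archiving steps above can be sketched as follows. This is an approximation of the local maintenance job, not the exact script that was run; the archive directory, the function names, and the assumption that file names are unique within the archive are all illustrative.

```python
import shutil
from pathlib import Path

# Glob patterns taken from the artifact list in this report.
ARTIFACT_PATTERNS = [
    "*.jsonl.reset.*",
    "*.jsonl.bak-*",
    "*.trajectory.jsonl.deleted.*",
    "*.checkpoint.*.jsonl",
]


def archive_artifacts(sessions_dir, archive_dir):
    """Move session artifact files out of the active session directory."""
    sessions_dir, archive_dir = Path(sessions_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for pattern in ARTIFACT_PATTERNS:
        # Materialize the matches first so moving files does not disturb iteration.
        for path in list(sessions_dir.glob(pattern)):
            if path.is_file():
                shutil.move(str(path), str(archive_dir / path.name))
                moved.append(path.name)
    return sorted(moved)


def limit_trajectories(sessions_dir, archive_dir, keep=200):
    """Keep the `keep` most recently modified *.trajectory.jsonl files, archive the rest."""
    sessions_dir, archive_dir = Path(sessions_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(
        (p for p in sessions_dir.glob("*.trajectory.jsonl") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    moved = []
    for path in files[keep:]:
        shutil.move(str(path), str(archive_dir / path.name))
        moved.append(path.name)
    return sorted(moved)
```

Moving files to an archive directory rather than deleting them keeps the cleanup reversible, which matters when the job runs unattended every night.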
Expected behavior
- sessions.list should remain responsive with about 100 to 200 sessions.
- A few MB of sessions.json should not lead to consistent 10 to 16 second latency.
- Session store cache misses should not cause long UI stalls.
- pi-trajectory-flush timeout should be configurable or adaptive.
Suggested improvements
- Improve session store loading and indexing
  - avoid full parse/clone for each cache miss where possible
  - consider indexed or incremental session metadata loading
  - reduce synchronous filesystem work in the sessions.list path
- Improve session cache behavior
  - smarter invalidation
  - avoid unnecessary full deep clone via JSON serialization if possible
  - allow better tuning for dashboard polling patterns
- Make trajectory flush timeout configurable
  - for example via environment variable such as OPENCLAW_AGENT_CLEANUP_TIMEOUT_MS
  - or allow a specific OPENCLAW_TRAJECTORY_FLUSH_TIMEOUT_MS
- Consider a lightweight sessions.list mode
  - no heavy per-session processing
  - explicit metadata depth flags
  - dashboard-oriented fast path
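To make the cache and lightweight-listing suggestions more concrete, here is a minimal sketch of a cache that re-parses the store only when the file changes and returns small summaries instead of deep-cloned entries. It is an illustration in Python, not OpenClaw code. It assumes sessions.json is a JSON object mapping a session key to an entry object, which has not been verified against the real file, and the class, method, and updatedAt field names are hypothetical.

```python
import json
import os


class SessionStoreCache:
    """Re-parse the store only when the file's mtime or size changes."""

    def __init__(self, path):
        self.path = path
        self._signature = None  # (mtime_ns, size) of the last parsed file
        self._store = None      # parsed JSON, kept private to the cache
        self.parse_count = 0    # exposed only to make the behavior observable

    def _load(self):
        stat = os.stat(self.path)
        signature = (stat.st_mtime_ns, stat.st_size)
        if signature != self._signature:
            with open(self.path, "r", encoding="utf-8") as handle:
                self._store = json.load(handle)
            self._signature = signature
            self.parse_count += 1
        return self._store

    def list_summaries(self, fields=("updatedAt",)):
        """Return small per-session dicts instead of cloning whole entries."""
        store = self._load()
        return [
            {"key": key, **{field: entry.get(field) for field in fields}}
            for key, entry in store.items()
        ]
```

Because callers only receive freshly built summary dicts, the cached store cannot be mutated through them, so a full deep clone per call is not needed for the listing path. A single stat call per request replaces the TTL-based expiry as the invalidation signal in this sketch.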
Reproduction outline
- Run OpenClaw with around 100 to 200 sessions.
- Let agents/main/sessions/sessions.json grow to a few MB.
- Call sessions.list from the dashboard or API.
- Observe latency around 10 seconds, especially after cache expiry or cache invalidation.
- Run agent interactions and observe recurring pi-trajectory-flush timeout warnings at 10000 ms.
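To check on the same hardware whether JSON parsing and cloning alone could plausibly account for latency of this size, the cost can be measured in isolation with a synthetic file. The sketch below is a rough aid only: the entry layout is invented, real session entries are probably more deeply nested (which costs more to parse than one long string), and Python timings are not directly comparable to those of the OpenClaw runtime.

```python
import json
import time


def make_synthetic_store(entries=153, target_bytes=4_100_000):
    """Build a dict whose JSON encoding is roughly `target_bytes` long."""
    payload = "x" * max(1, target_bytes // entries)
    return {f"session-{i}": {"id": i, "payload": payload} for i in range(entries)}


def time_parse_and_clone(text):
    """Return (parse_seconds, clone_seconds) for one JSON document."""
    start = time.perf_counter()
    store = json.loads(text)
    parsed = time.perf_counter()
    json.loads(json.dumps(store))  # deep clone via a serialization round trip
    cloned = time.perf_counter()
    return parsed - start, cloned - parsed


if __name__ == "__main__":
    text = json.dumps(make_synthetic_store())
    parse_s, clone_s = time_parse_and_clone(text)
    print(f"size={len(text)} bytes parse={parse_s:.3f}s clone={clone_s:.3f}s")
```

If these numbers come out far below 10 seconds on the Raspberry Pi, the remaining time is more likely spent in per-session filesystem work than in parsing the store itself.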
Notes
This issue was identified during operational maintenance on a Raspberry Pi-based OpenClaw installation. Filesystem cleanup and session artifact archiving reduced disk usage significantly and lowered general system pressure, but sessions.list remained slow. This suggests the remaining issue lies in the session loading and cleanup implementation rather than only in accumulated local artifacts.