Control UI still repeatedly calls slow sessions.list on 2026.4.29, causing Gateway latency/GC pressure #76166

@richardmqq

Summary

On OpenClaw 2026.4.29 (a448042), the local Control UI/webchat connection still repeatedly called sessions.list, and each call typically took 10–11s (one took 19.8s) on a large session store. This caused Gateway event-loop/GC pressure and user-visible RPC timeouts.

This looks like a regression or release-build mismatch with the earlier fix mentioned in #59317, which says Control UI should subscribe to sessions.changed instead of polling.

Environment

  • OpenClaw: 2026.4.29 (a448042)
  • Host: macOS / Darwin arm64, Node 25.9.0
  • Gateway: local loopback 127.0.0.1:18789
  • Session directories are large:
    • ~/.openclaw/agents/main/sessions: ~395 MB
    • ~/.openclaw/agents/system-architect/sessions: ~383 MB
    • ~1012 session/jsonl-like files under ~/.openclaw

Observed behavior

A Chrome tab was connected to the local Gateway as:

client=openclaw-control-ui webchat vcontrol-ui
conn=1a598069-706d-475b-837a-41e3e2a68c21

From 2026-05-02T22:06+08:00 onward, Gateway logs showed repeated sessions.list calls from this connection. Examples:

2026-05-02T22:06:42.868+08:00 [ws] ⇄ res ✓ sessions.list 19829ms conn=1a598069…8c21
2026-05-02T22:07:25.232+08:00 [ws] ⇄ res ✓ sessions.list 11286ms conn=1a598069…8c21
2026-05-02T23:13:55.129+08:00 [ws] ⇄ res ✓ sessions.list 10651ms conn=1a598069…8c21
2026-05-02T23:18:08.344+08:00 [ws] ⇄ res ✓ sessions.list 10712ms conn=1a598069…8c21

Between 23:00 and 23:18, there were 51 sessions.list calls and 6 node.list calls from the same UI connection. sessions.list was consistently around 10–11s.
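For anyone who wants to reproduce these counts from their own Gateway log, here is a minimal sketch of a parser for the `[ws] ⇄ res` lines quoted above. The regular expression is inferred only from the four example lines in this report, and `parseResLine` and `summarize` are hypothetical helper names, not part of OpenClaw.

```typescript
// Hypothetical helper: parses Gateway "[ws] ⇄ res ✓" lines in the shape quoted
// in this report. The pattern is inferred from those examples only.
interface RpcSample {
  timestamp: string;
  method: string;
  durationMs: number;
  conn: string;
}

interface Summary {
  count: number;
  minMs: number;
  maxMs: number;
  meanMs: number;
}

const RES_LINE = /^(\S+) \[ws\] ⇄ res ✓ (\S+) (\d+)ms conn=(\S+)$/;

function parseResLine(line: string): RpcSample | null {
  const m = RES_LINE.exec(line.trim());
  if (m === null) return null;
  return {
    timestamp: m[1] ?? "",
    method: m[2] ?? "",
    durationMs: Number(m[3] ?? "0"),
    conn: m[4] ?? "",
  };
}

// Collects durations for one RPC method and reports count/min/max/mean.
function summarize(lines: string[], method: string): Summary {
  const durations: number[] = [];
  for (const line of lines) {
    const sample = parseResLine(line);
    if (sample !== null && sample.method === method) {
      durations.push(sample.durationMs);
    }
  }
  if (durations.length === 0) {
    return { count: 0, minMs: 0, maxMs: 0, meanMs: 0 };
  }
  const total = durations.reduce((a, b) => a + b, 0);
  return {
    count: durations.length,
    minMs: Math.min(...durations),
    maxMs: Math.max(...durations),
    meanMs: total / durations.length,
  };
}

// The four example lines from this report, plus one line that should be ignored.
const sampleLines: string[] = [
  "2026-05-02T22:06:42.868+08:00 [ws] ⇄ res ✓ sessions.list 19829ms conn=1a598069…8c21",
  "2026-05-02T22:07:25.232+08:00 [ws] ⇄ res ✓ sessions.list 11286ms conn=1a598069…8c21",
  "2026-05-02T23:13:55.129+08:00 [ws] ⇄ res ✓ sessions.list 10651ms conn=1a598069…8c21",
  "2026-05-02T23:18:08.344+08:00 [ws] ⇄ res ✓ sessions.list 10712ms conn=1a598069…8c21",
  "[fetch-timeout] fetch timeout reached; aborting operation",
];

console.log(summarize(sampleLines, "sessions.list"));
// count: 4, minMs: 10651, maxMs: 19829, meanMs: 13119.5
```

As a rough cross-check, 51 calls in the 18-minute window is about one call every 21s. At roughly 10–11s per call, the Gateway would have spent around half of that window servicing sessions.list for this one tab.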

During the same window, Gateway showed liveness/timeout symptoms:

[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu ... eventLoopDelayMaxMs=10888.4 eventLoopUtilization=1 cpuCoreRatio=1.005
[fetch-timeout] fetch timeout reached; aborting operation
[agent/embedded] agent cleanup timed out ... step=pi-trajectory-flush timeoutMs=10000
[tools] sessions_send failed: gateway timeout after 10000ms

A sample of the Gateway process showed significant time in V8 GC / heap marking. Gateway RSS/footprint reached roughly 1.0–1.7 GB during the incident.

Mitigation result

Closing the single Chrome tab for 127.0.0.1:18789 / localhost:18789 disconnected the UI TCP client:

2026-05-02T23:21:15.944+08:00 [ws] webchat disconnected code=1001 reason=n/a conn=1a598069-706d-475b-837a-41e3e2a68c21

After that:

  • no new sessions.list calls from that connection
  • no new liveness warning lines in the immediate post-close check
  • Gateway CPU dropped from high/near-100% bursts to low single digits / sub-1% most of the time
  • non-deep commands improved:
    • openclaw --version: ~104ms
    • openclaw config validate: ~1.6s
    • openclaw tasks flow show ... --json: ~1.2s

Expected behavior

Control UI should not continuously poll expensive sessions.list on large session stores. If #59317's subscribe/push behavior is intended, the release build should not keep issuing repeated sessions.list calls every ~10–30s while idle.
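To illustrate the expected subscribe/push behavior, here is a minimal sketch of a UI-side model that refetches sessions.list only when a sessions.changed event arrives, and coalesces bursts of events into at most one follow-up request. `GatewayClient`, `SessionListModel`, and `FakeGateway` are hypothetical names invented for this sketch; the real Control UI client API may look quite different.

```typescript
type Listener = () => void;

// Hypothetical minimal client surface; not the actual Control UI API.
interface GatewayClient {
  request(method: string): Promise<unknown>;
  on(event: string, listener: Listener): void;
}

class SessionListModel {
  rows: unknown = null;
  private running = false;
  private dirty = false;
  private inFlight: Promise<void> = Promise.resolve();
  private readonly client: GatewayClient;

  constructor(client: GatewayClient) {
    this.client = client;
    // No timer: a refetch happens only when the Gateway pushes a change event.
    client.on("sessions.changed", () => {
      this.refresh().catch((err) => console.error("sessions.list refresh failed", err));
    });
  }

  // Starts a fetch, or marks the running fetch as stale so it reruns once.
  refresh(): Promise<void> {
    if (this.running) {
      this.dirty = true;
      return this.inFlight;
    }
    this.running = true;
    this.inFlight = this.fetchUntilClean();
    return this.inFlight;
  }

  private async fetchUntilClean(): Promise<void> {
    try {
      do {
        this.dirty = false;
        this.rows = await this.client.request("sessions.list");
      } while (this.dirty);
    } finally {
      this.running = false;
    }
  }
}

// In-memory fake used only to demonstrate the coalescing behavior.
class FakeGateway implements GatewayClient {
  listCalls = 0;
  private listeners: Listener[] = [];

  request(method: string): Promise<unknown> {
    if (method === "sessions.list") this.listCalls += 1;
    return Promise.resolve([]);
  }

  on(event: string, listener: Listener): void {
    if (event === "sessions.changed") this.listeners.push(listener);
  }

  emitSessionsChanged(): void {
    for (const listener of this.listeners) listener();
  }
}

const fake = new FakeGateway();
const model = new SessionListModel(fake);
fake.emitSessionsChanged();
fake.emitSessionsChanged();
fake.emitSessionsChanged();
console.log(fake.listCalls); // 1 so far; one coalesced follow-up runs after the first resolves
```

With this shape, an idle tab issues no sessions.list calls at all, and a burst of change events costs at most two requests rather than one per event.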

Additionally, sessions.list should probably avoid expensive transcript/session-derived computation unless explicitly requested, and/or cache rows so one UI tab cannot create sustained 10s RPC work.
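As one possible shape for the caching suggestion, here is a minimal sketch of a time-bounded cache with explicit invalidation. `TtlCache` is a hypothetical class written for this report, not existing OpenClaw code, and the 5000ms TTL in the usage is an arbitrary illustrative value.

```typescript
// Hypothetical sketch: caches the result of an expensive loader for ttlMs,
// with explicit invalidation for when the underlying store changes.
class TtlCache<T> {
  private value: T | undefined = undefined;
  private hasValue = false;
  private loadedAt = 0;
  private readonly load: () => T;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so the expiry logic can be exercised without real waiting.
  constructor(load: () => T, ttlMs: number, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, reloading only if it is missing or expired.
  get(): T {
    const t = this.now();
    if (!this.hasValue || t - this.loadedAt >= this.ttlMs) {
      this.value = this.load();
      this.hasValue = true;
      this.loadedAt = t;
    }
    return this.value as T;
  }

  // Drops the cached value, e.g. when a session file is written.
  invalidate(): void {
    this.hasValue = false;
    this.value = undefined;
  }
}

// Usage with a counting loader and a fake clock.
let loads = 0;
let clockMs = 0;
const rowsCache = new TtlCache<number>(() => ++loads, 5000, () => clockMs);

rowsCache.get();      // first call runs the loader
rowsCache.get();      // served from cache
clockMs = 5000;
rowsCache.get();      // TTL elapsed, loader runs again
rowsCache.invalidate();
rowsCache.get();      // invalidated, loader runs again
console.log(loads);   // 3
```

If the loader returns a Promise, the cached value is that Promise, so concurrent callers would share one in-flight computation instead of each starting their own 10s scan. A real implementation would probably also want to evict a rejected Promise rather than cache the failure.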

Related issues

This report is for the currently released 2026.4.29, where the behavior is still reproducible in a real large-session-store environment.
