
[Bug]: sessions.list is extremely slow (4+ seconds) causing event loop saturation #77373

@najef1979-code

Description


Bug type

Regression (worked before, now fails)

Beta release blocker

Yes

Summary

The sessions.list WebSocket operation takes roughly 1.8-4.3 seconds to complete (see the logs below), saturating the gateway event loop. This appears to be a performance regression.

Version: 2026.5.3-1 (2eae30e)

Environment:

  • Linux 6.17.0-23-generic, Node v24.14.1 via NVM
  • Gateway running via systemd user service
  • 17 agents configured, 104 sessions in the neon agent's session store
  • System: 32GB RAM, NVMe disk

Steps to reproduce

  1. Set up environment:

    • Install OpenClaw gateway (any recent version)
    • Configure 17+ agents with active session stores
    • Have 100+ sessions in an agent's session.json (the neon agent has 104)
  2. Connect control UI:

    • Connect the OpenClaw control UI dashboard to the gateway
    • Having multiple clients poll simultaneously increases the load
  3. Observe symptoms:

    • Run openclaw logs --follow to see sessions.list taking 2-5 seconds
    • Run openclaw health to see "Gateway event loop: degraded"
    • Run top to see gateway CPU at 500-800%
    • Run openclaw gateway stability --json to check event loop delays

Expected behavior

  • sessions.list completes in <100ms
  • Event loop utilization stays near its ~0.1 baseline

Actual behavior

  • sessions.list takes 1800-4300ms
  • Event loop utilization jumps from ~0.1 to 0.9-1.0

OpenClaw version

2026.5.3-1 (2eae30e)

Operating system

Linux 6.17.0-23-generic (ubuntu)

Install method

Node v24.14.1 via NVM

Model

Minimax/Minimax.m2.7

Provider / routing chain

Minimax/Minimax.m2.7

Additional provider/model setup details

No response

Logs, screenshots, and evidence

**Log Evidence:**

### sessions.list timing (showing slow responses)

2026-05-04T12:50:53.609Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3244ms conn=99b02c46…bf85 id=3c271020…172f
2026-05-04T12:50:57.864Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 4234ms conn=99b02c46…bf85 id=1c9b2f72…5bb2
2026-05-04T12:51:12.478Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2308ms conn=99b02c46…bf85 id=37a4e542…f959
2026-05-04T12:51:24.276Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1852ms conn=99b02c46…bf85 id=37cd3730…1cf7
2026-05-04T12:51:32.188Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2946ms conn=99b02c46…bf85 id=57c061d0…fd2a
2026-05-04T12:51:48.132Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2932ms conn=99b02c46…bf85 id=c8c0468a…e9ec
2026-05-04T12:51:50.788Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2440ms conn=99b02c46…bf85 id=78b75b25…c0b6
2026-05-04T12:55:15.993Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2797ms conn=99b02c46…bf85 id=fbd3e198…08c0
2026-05-04T12:55:20.320Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 4292ms conn=99b02c46…bf85 id=7a7e4994…bf5a
2026-05-04T12:55:26.355Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2261ms conn=99b02c46…bf85 id=24e823e…87a5
2026-05-04T12:55:31.584Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2566ms conn=99b02c46…bf85 id=c0b6ca40…fc15
2026-05-04T12:55:36.923Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3791ms conn=99b02c46…bf85 id=bbb582bd…d34c
2026-05-04T12:55:44.919Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3295ms conn=99b02c46…bf85 id=1be7033d…b0e7
2026-05-04T12:55:49.070Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2823ms conn=99b02c46…bf85 id=505cbc23…a684
2026-05-04T12:55:52.556Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1830ms conn=99b02c46…bf85 id=b1c53436…9963
2026-05-04T12:55:57.160Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 2638ms conn=99b02c46…bf85 id=fa765188…ce9b
2026-05-04T12:56:01.930Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3291ms conn=99b02c46…bf85 id=46ddcc14…ecdf
2026-05-04T12:56:04.907Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1855ms conn=99b02c46…bf85 id=7f91f31b…e8a6
2026-05-04T12:56:09.190Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3053ms conn=99b02c46…bf85 id=e7966a96…4fa0
2026-05-04T12:56:13.660Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3410ms conn=99b02c46…bf85 id=055694c3…a1da
2026-05-04T12:56:18.158Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3321ms conn=99b02c46…bf85 id=4a3fdffe…8b31
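To quantify the timings above, a small script can pull the durations out of the log lines. This is a minimal sketch that only assumes the `sessions.list <N>ms` fragment shown in these lines; `summarizeSessionsList` is a hypothetical helper, not part of the OpenClaw CLI, and the 100 ms budget comes from the expected behavior stated above.

```typescript
// Matches the "sessions.list <N>ms" fragment of a gateway/ws response log line.
const SESSIONS_LIST_RE = /sessions\.list (\d+)ms/;

interface LatencySummary {
  count: number;
  minMs: number;
  maxMs: number;
  meanMs: number;
  overBudget: number; // calls slower than budgetMs
}

// Summarizes sessions.list durations; returns null if no line matches.
function summarizeSessionsList(lines: string[], budgetMs = 100): LatencySummary | null {
  const durations: number[] = [];
  for (const line of lines) {
    const match = SESSIONS_LIST_RE.exec(line);
    if (match) durations.push(Number(match[1]));
  }
  if (durations.length === 0) return null;
  const total = durations.reduce((sum, d) => sum + d, 0);
  return {
    count: durations.length,
    minMs: Math.min(...durations),
    maxMs: Math.max(...durations),
    meanMs: total / durations.length,
    overBudget: durations.filter((d) => d > budgetMs).length,
  };
}

// Three lines copied from the log excerpt above.
const sample: string[] = [
  '2026-05-04T12:50:53.609Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 3244ms conn=99b02c46…bf85 id=3c271020…172f',
  '2026-05-04T12:55:52.556Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1830ms conn=99b02c46…bf85 id=b1c53436…9963',
  '2026-05-04T12:55:20.320Z info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 4292ms conn=99b02c46…bf85 id=7a7e4994…bf5a',
];
console.log(summarizeSessionsList(sample));
```

Feeding it the full `openclaw logs --follow` output (split into lines) gives the same summary over every captured call.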


### Event loop degradation (correlating with sessions.list slowness)

2026-05-04T12:47:44.984Z warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay,cpu interval=31s eventLoopDelayP99Ms=1743.8 eventLoopDelayMaxMs=3116.4 eventLoopUtilization=0.75 cpuCoreRatio=4.151 active=1 waiting=0 queued=0
2026-05-04T12:50:16.158Z warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=31s eventLoopDelayP99Ms=5272.2 eventLoopDelayMaxMs=5289 eventLoopUtilization=0.962 cpuCoreRatio=3.847 active=1 waiting=0 queued=0
2026-05-04T12:50:46.752Z warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay,cpu interval=31s eventLoopDelayP99Ms=3294.6 eventLoopDelayMaxMs=4680.8 eventLoopUtilization=0.923 cpuCoreRatio=4.475 active=1 waiting=0 queued=0
2026-05-04T12:55:20.323Z warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay,cpu interval=32s eventLoopDelayP99Ms=1486.9 eventLoopDelayMaxMs=4299.2 eventLoopUtilization=0.68 cpuCoreRatio=7.171 active=1 waiting=0 queued=0
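The field names eventLoopDelayP99Ms and eventLoopUtilization suggest Node's built-in `perf_hooks` metrics, but that mapping is an assumption; the report does not say how the diagnostic computes them. The sketch below uses `monitorEventLoopDelay` and `performance.eventLoopUtilization` to show how one synchronous 300 ms task (the hypothetical `blockFor`, standing in for a slow handler) drives both metrics up at once, which is the pattern these warnings show.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

interface LoopStats {
  utilization: number; // fraction of wall time the loop was busy (0..1)
  p99DelayMs: number;
  maxDelayMs: number;
}

// Spins the main thread for `ms` milliseconds, standing in for a slow
// synchronous request handler (illustrative, not OpenClaw code).
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: nothing else on the event loop can run meanwhile
  }
}

// Measures event loop delay and utilization over `windowMs` while one
// blocking task of `blockMs` runs inside that window.
async function measureWhileBlocking(blockMs: number, windowMs: number): Promise<LoopStats> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const before = performance.eventLoopUtilization();

  setTimeout(() => blockFor(blockMs), 10);
  await new Promise<void>((resolve) => setTimeout(resolve, windowMs));

  histogram.disable();
  const elu = performance.eventLoopUtilization(before);
  return {
    utilization: elu.utilization,
    p99DelayMs: histogram.percentile(99) / 1e6, // histogram reports nanoseconds
    maxDelayMs: histogram.max / 1e6,
  };
}

measureWhileBlocking(300, 400).then((stats) => console.log(stats));
```

Because Node runs JavaScript on a single main thread, any timer or WebSocket message that arrives during the block waits for it to finish, which would also explain unrelated operations (such as the cleanup step below) timing out.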


### Agent cleanup timeouts (consequence of event loop saturation)

2026-05-04T12:50:57.867Z warn agent/embedded {"subsystem":"agent/embedded"} agent cleanup timed out: runId=dd0fa787-bcb0-411d-94fa-32285ac646d2 sessionId=8d026f54-297c-44b3-a4e8-b64f55a473cf step=pi-trajectory-flush timeoutMs=10000


### System load (from top)

load average: 11.41, 11.77, 10.70
%Cpu(s): 98.9 us, 1.1 sy

PID USER      COMMAND                     %CPU
346084 najef   openclaw/dist/index.js      754.5

Impact and severity

Impact:

  • Gateway event loop degraded (eventLoopUtilization reaching 0.962-1.0)
  • High CPU load (load average consistently 10-11, CPU at 99%)
  • Slow responsiveness across all WebSocket operations
  • Agent cleanup timeouts (pi-trajectory-flush timing out after 10s)
  • Gateway CPU usage spikes to 754% (multi-core utilization)

Additional information

Question:
Is there something that can be optimized in the sessions.list operation? Could session listing be cached or made incremental rather than loading all sessions on every poll?
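One possible shape for the caching idea, sketched under assumptions: OpenClaw's real session types and loader are not shown in this report, so `SessionSummary`, `SessionLoader`, and `CachedSessionList` are hypothetical names. The sketch combines a short TTL with request coalescing ("single-flight"), so overlapping polls share one load instead of each rescanning every agent's session store.

```typescript
// Hypothetical summary shape; real OpenClaw session fields are not shown in this report.
interface SessionSummary {
  agentId: string;
  sessionId: string;
  updatedAt: number;
}

type SessionLoader = () => Promise<SessionSummary[]>;

// Caches an expensive session listing for ttlMs and lets concurrent callers
// share one in-flight load instead of each rescanning every session store.
class CachedSessionList {
  private readonly load: SessionLoader;
  private readonly ttlMs: number;
  private readonly now: () => number;
  private value: SessionSummary[] | null = null;
  private loadedAt = 0;
  private generation = 0; // bumped by invalidate() so stale loads are not cached
  private inFlight: Promise<SessionSummary[]> | null = null;

  constructor(load: SessionLoader, ttlMs: number, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<SessionSummary[]> {
    if (this.value !== null && this.now() - this.loadedAt < this.ttlMs) {
      return Promise.resolve(this.value); // fresh enough: no disk work
    }
    if (this.inFlight === null) {
      const gen = this.generation;
      this.inFlight = this.load()
        .then((sessions) => {
          if (gen === this.generation) {
            this.value = sessions;
            this.loadedAt = this.now();
          }
          return sessions;
        })
        .finally(() => {
          this.inFlight = null;
        });
    }
    return this.inFlight; // coalesce onto the load already running
  }

  // Call when a session is created, updated, or deleted.
  invalidate(): void {
    this.generation++;
    this.value = null;
  }
}

let loads = 0;
const sessions = new CachedSessionList(async () => {
  loads++; // stand-in for scanning every agent's session store
  return [{ agentId: "neon", sessionId: "example", updatedAt: Date.now() }];
}, 2000);

// Three overlapping polls trigger a single load.
Promise.all([sessions.get(), sessions.get(), sessions.get()]).then(() => {
  console.log(`loads: ${loads}`); // prints "loads: 1"
});
```

A TTL cache still does a full rescan on every miss; an incremental variant (for example, re-reading only stores whose modification time changed) would cut the cost of each miss, at the price of more bookkeeping.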

Workaround: None found. The constant polling from the control UI creates a backlog that cannot clear while the operation itself is slow.
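If the control UI fires sessions.list on a fixed timer regardless of whether the previous call has returned (an assumption; the report does not show the UI's polling code), a backlog like this is expected once each call takes longer than the interval. A common client-side mitigation is to schedule the next poll only after the previous one settles. `startPolling` and `requestSessionsList` below are hypothetical names.

```typescript
// Schedules the next request only after the previous one settles, so slow
// responses stretch the effective interval instead of piling up requests
// (unlike a fixed setInterval, which keeps firing regardless).
function startPolling(fetchOnce: () => Promise<void>, intervalMs: number): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    try {
      await fetchOnce();
    } catch (err) {
      console.error("poll failed:", err);
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick();
  // Returned function stops future polls; an in-flight request is left to finish.
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}

// Usage (requestSessionsList is a hypothetical wrapper around the WebSocket call):
// const stop = startPolling(() => requestSessionsList(), 5000);
```

This only caps the load each client adds; it does not make an individual sessions.list call faster.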

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working), regression (Behavior that previously worked and now fails)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
