fix(sessions): improve list query performance and minimal checkpoints… by rolandrscheel · Pull Request #76090 · openclaw/openclaw

rolandrscheel · 2026-05-02T13:40:29Z

Summary

Problem: sessions.list can become CPU- and memory-heavy on large session stores because it repeatedly deep-clones large cached session data, returns oversized compaction checkpoint summaries, repeatedly scans transcripts for usage fallback data, and performs repeated subagent child/link lookups per row.
Why it matters: The Control UI and local list-sessions paths poll sessions.list; large transcripts/checkpoints can make polling expensive enough to spike CPU or exhaust memory.
What changed: Added a sessions.list-specific session-store loader that can reuse a validated cached store object without full deep cloning, returns a minimal latest compaction checkpoint preview, caches transcript usage snapshots by transcript stat, and builds request-local subagent/store indexes for list row construction.
What did NOT change (scope boundary): No session mutation behavior, transcript format, compaction behavior, model routing, or external API behavior was changed.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: sessions.list used general-purpose session-store and subagent lookup paths optimized for correctness and defensive isolation, but not for repeated high-volume list rendering. Large checkpoint summaries and transcript usage fallback scans amplified the cost.
Missing detection / guardrail: No perf/regression coverage around large session stores with compaction checkpoints, many subagent links, and transcript usage fallback reads.
Contributing context (if known): Control UI polling repeatedly exercises this path, so even moderate per-call overhead becomes noticeable.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- src/gateway/session-utils.subagent.test.ts
- src/gateway/session-utils.fs.test.ts
- sessions.list handler tests under src/gateway/server-methods/
Scenario the test should lock in:
- sessions.list builds subagent child relationships using request-local indexes without changing stale/live child-link semantics.
- transcript usage fallback returns cached snapshots when transcript size/mtime has not changed.
- latest compaction checkpoint in list rows omits the heavyweight summary payload.
Why this is the smallest reliable guardrail: These are the narrow gateway paths that caused the observed CPU/memory pressure.
Existing test that already covers this (if any): Existing subagent metadata tests cover most child-link semantics; targeted tests were run against the changed files.
If no new test is added, why not: Existing coverage already exercises the critical behavior; this PR primarily changes implementation/performance characteristics.
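
The stat-keyed caching scenario above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `TranscriptUsageCache`, `TranscriptStat`, `UsageSnapshot`, and the example path are all hypothetical names, and whether or how the merged patch implements this cache is not shown here. The idea is only that a snapshot is reused while the transcript's size and mtime are unchanged.

```typescript
// Hypothetical shapes for illustration; not the repository's actual types.
interface TranscriptStat {
  size: number; // file size in bytes
  mtimeMs: number; // last-modified time in milliseconds
}

interface UsageSnapshot {
  inputTokens: number;
  outputTokens: number;
}

interface UsageCacheEntry {
  size: number;
  mtimeMs: number;
  snapshot: UsageSnapshot;
}

class TranscriptUsageCache {
  private readonly entries = new Map<string, UsageCacheEntry>();

  // Returns the cached snapshot when size and mtime are unchanged;
  // otherwise runs `scan` once and stores the fresh result.
  get(path: string, stat: TranscriptStat, scan: () => UsageSnapshot): UsageSnapshot {
    const hit = this.entries.get(path);
    if (hit && hit.size === stat.size && hit.mtimeMs === stat.mtimeMs) {
      return hit.snapshot;
    }
    const snapshot = scan();
    this.entries.set(path, { size: stat.size, mtimeMs: stat.mtimeMs, snapshot });
    return snapshot;
  }
}

// Usage: count how often the (expensive) transcript scan actually runs.
let scanCount = 0;
const scanTranscript = (): UsageSnapshot => {
  scanCount += 1;
  return { inputTokens: 120, outputTokens: 45 };
};

const usageCache = new TranscriptUsageCache();
const transcriptPath = "/tmp/example-transcript.jsonl"; // hypothetical path
usageCache.get(transcriptPath, { size: 1024, mtimeMs: 1000 }, scanTranscript); // miss: scans
usageCache.get(transcriptPath, { size: 1024, mtimeMs: 1000 }, scanTranscript); // hit: no scan
usageCache.get(transcriptPath, { size: 2048, mtimeMs: 2000 }, scanTranscript); // changed: scans again
console.log(scanCount); // prints 2
```

A test along these lines would assert on the scan count rather than on timing, which keeps it deterministic.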

User-visible / Behavior Changes

sessions.list responses now include only a minimal latestCompactionCheckpoint preview (checkpointId, createdAt, reason) instead of returning the full checkpoint object with large summary data.
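
To make the narrowed payload concrete, here is a small sketch of projecting a full checkpoint down to the preview fields. Only the field names `checkpointId`, `createdAt`, `reason`, and `summary` come from this description; the field types, the `FullCheckpoint` and `CheckpointPreview` names, and the `toCheckpointPreview` helper are assumptions made for illustration.

```typescript
// Hypothetical full checkpoint shape; field types are assumptions.
interface FullCheckpoint {
  checkpointId: string;
  createdAt: number;
  reason: string;
  summary: string; // potentially very large
}

// The preview keeps only the three lightweight fields.
type CheckpointPreview = Pick<FullCheckpoint, "checkpointId" | "createdAt" | "reason">;

// Copy the preview fields explicitly so the summary never reaches list rows.
function toCheckpointPreview(checkpoint: FullCheckpoint): CheckpointPreview {
  const { checkpointId, createdAt, reason } = checkpoint;
  return { checkpointId, createdAt, reason };
}

const fullCheckpoint: FullCheckpoint = {
  checkpointId: "cp-1",
  createdAt: 1_700_000_000_000,
  reason: "manual",
  summary: "x".repeat(100_000), // stand-in for a heavyweight summary
};

const checkpointPreview = toCheckpointPreview(fullCheckpoint);
console.log(Object.keys(checkpointPreview)); // [ 'checkpointId', 'createdAt', 'reason' ]
console.log(JSON.stringify(checkpointPreview).length < 200); // true
```

Callers that previously read the summary from list rows would need to fetch the full checkpoint through another path.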

Diagram (if applicable)

Before:
Control UI poll -> sessions.list -> deep clone store + per-row subagent scans + transcript scans + full checkpoint summary

After:
Control UI poll -> sessions.list -> validated cache reuse + request-local indexes + stat-keyed transcript usage cache + checkpoint preview
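
The "request-local indexes" step in the After path can be illustrated with a sketch like the one below. The `SubagentRun` shape, `buildReadIndex`, and `countLiveChildren` are hypothetical names chosen for this example, and the real stale/live child-link semantics are more involved than a single `endedAt` check. The point is only that the index is built once per request so each row does a map lookup instead of rescanning every run.

```typescript
// Hypothetical subagent run record; the real registry shape is not shown here.
interface SubagentRun {
  childSessionKey: string;
  parentSessionKey: string;
  endedAt?: number; // undefined while the run is still live
}

interface SubagentReadIndex {
  childrenByParent: Map<string, SubagentRun[]>;
}

// Build the index once per request: a single pass over all runs.
function buildReadIndex(runs: readonly SubagentRun[]): SubagentReadIndex {
  const childrenByParent = new Map<string, SubagentRun[]>();
  for (const run of runs) {
    const bucket = childrenByParent.get(run.parentSessionKey);
    if (bucket) {
      bucket.push(run);
    } else {
      childrenByParent.set(run.parentSessionKey, [run]);
    }
  }
  return { childrenByParent };
}

// Per-row lookup is one Map read instead of a scan over every run.
function countLiveChildren(index: SubagentReadIndex, parentKey: string): number {
  const children = index.childrenByParent.get(parentKey) ?? [];
  return children.filter((run) => run.endedAt === undefined).length;
}

const exampleRuns: SubagentRun[] = [
  { childSessionKey: "c1", parentSessionKey: "p1" },
  { childSessionKey: "c2", parentSessionKey: "p1", endedAt: 1000 },
  { childSessionKey: "c3", parentSessionKey: "p2" },
];

const readIndex = buildReadIndex(exampleRuns);
console.log(countLiveChildren(readIndex, "p1")); // prints 1
console.log(countLiveChildren(readIndex, "missing")); // prints 0
```

Because the index lives only for the duration of one request, it cannot go stale across polls, at the cost of rebuilding it on every call.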

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux
Runtime/container: OpenClaw source checkout based on release/2026.4.29
Model/provider: N/A
Integration/channel (if any): Control UI / gateway sessions.list path
Relevant config (redacted): N/A

Steps

Create or use a session store with many sessions/subagent relationships and compaction checkpoints.
Trigger sessions.list via gateway/Control UI polling.
Observe CPU/memory pressure before the fix; compare with the request-local cache/index behavior after the fix.

Expected

sessions.list remains responsive and avoids repeated heavyweight cloning/scanning work.
Existing subagent child-link semantics are preserved.
Large checkpoint summary payloads are not returned in list rows.

Actual

Before: repeated list calls can cause high CPU/memory pressure.
After: sessions.list uses list-specific fast paths and preserves tested behavior.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Targeted verification run:

OPENCLAW_VITEST_MAX_WORKERS=1 corepack pnpm exec vitest run \
  --config test/vitest/vitest.gateway.config.ts \
  src/gateway/session-utils.subagent.test.ts \
  src/gateway/session-utils.fs.test.ts \
  src/gateway/server-methods/sessions.send-followup-status.test.ts \
  src/gateway/server-methods/sessions.send-deleted-agent.test.ts

Test Files  4 passed (4)
Tests       80 passed (80)

Also verified:

git diff --check origin/release/2026.4.29...HEAD

No whitespace errors.

corepack pnpm tsgo:core was attempted locally but this checkout is missing UI dependencies (@noble/ed25519, dompurify, @vitest/browser-playwright), so it fails before providing a clean repo-wide type signal. The failures are unrelated to the changed session/gateway files.

Human Verification (required)

Verified scenarios:
- Existing subagent metadata tests pass with the request-local index path.
- Transcript usage tests pass with stat-keyed caching.
- sessions.list-related gateway handler smoke tests pass.
Edge cases checked:
- stale active subagent snapshots
- ended parent with live descendant
- moved child sessions
- store-only child links
- transcript usage fallback reads
What you did not verify:
- Full CI matrix
- Real-world perf numbers from a production-sized session store

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes, except latestCompactionCheckpoint list payload is intentionally narrowed to preview fields)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: The sessions.list-specific cache path returns the cached store object without cloning, so accidental mutation by list code would affect the cached object.
- Mitigation: The sessions.list row construction path is read-only; tests exercise the relevant gateway behavior. Mutation paths continue to use the existing general loader.
Risk: Narrowing latestCompactionCheckpoint could affect callers that depended on full checkpoint summaries from list rows.
- Mitigation: The list endpoint should only expose a lightweight preview; full checkpoint data remains in the store/compaction paths.
Risk: Request-local subagent indexes could diverge from existing lookup semantics.
- Mitigation: Existing subagent metadata tests pass, including stale/moved child edge cases.
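
As an illustration of the first risk above, one possible guardrail (not something this PR is stated to add) is to hand the cached store to list code behind a read-only type and, in tests, freeze it at runtime. `DeepReadonly`, `deepFreeze`, and the `ExampleStore` shape below are hypothetical and exist only for this sketch.

```typescript
// Compile-time guard: every nested property becomes readonly.
type DeepReadonly<T> = T extends object
  ? { readonly [K in keyof T]: DeepReadonly<T[K]> }
  : T;

// Runtime guard: recursively freeze plain objects and arrays.
function freezeRecursively(value: unknown): void {
  if (typeof value !== "object" || value === null || Object.isFrozen(value)) {
    return;
  }
  Object.freeze(value);
  for (const key of Object.keys(value)) {
    freezeRecursively((value as Record<string, unknown>)[key]);
  }
}

function deepFreeze<T>(value: T): DeepReadonly<T> {
  freezeRecursively(value);
  return value as unknown as DeepReadonly<T>;
}

// Hypothetical store shape for the demonstration.
interface ExampleStore {
  main: { title: string; tags: string[] };
}

const frozenStore = deepFreeze<ExampleStore>({ main: { title: "First", tags: ["a"] } });

// frozenStore.main.title = "oops"; // rejected by the compiler (readonly)

// Bypassing the types still cannot change the frozen object:
// strict mode throws a TypeError, sloppy mode silently ignores the write.
try {
  (frozenStore as unknown as ExampleStore).main.title = "oops";
} catch {
  // expected in strict mode
}
console.log(frozenStore.main.title); // prints "First"
```

Freezing has its own cost on large stores, so a check like this is better suited to tests or debug builds than to the hot polling path.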

AI/Vibe-Coded PR Transparency

AI-assisted
Testing degree: targeted gateway tests passed; full local typecheck blocked by missing UI dependencies in this checkout.
I understand what the code does: yes — this PR converts installed monkey-patch performance fixes into source-level changes and centralizes subagent read indexes for sessions.list.

clawsweeper · 2026-05-02T13:41:55Z

Codex review: needs maintainer review before merge.

Summary
The PR adds request-scoped subagent read indexes for session row construction, narrows latestCompactionCheckpoint list rows to a preview shape, updates UI types and compaction coverage, and records the user-facing fix in the changelog.

Reproducibility: unclear. The PR body gives a plausible large-store reproduction path with compaction checkpoints, subagent links, transcript fallback reads, and Control UI polling, but there is no checked-in perf fixture or trace for independent reproduction.

Next step before merge
No automated repair lane is needed; the prior changelog blocker is fixed, and remaining work is normal exact-head CI completion plus maintainer merge review.

Security
Cleared: the diff stays within session listing, subagent read-index helpers, UI types, tests, and changelog text, with no new dependencies, scripts, permissions, network calls, artifact execution, or secret handling.

Review details

Best possible solution:

Land the narrowed PR after exact-head checks complete, preserving async sessions.list yielding, read-only list-row fast paths, and the lightweight checkpoint preview contract.

Do we have a high-confidence way to reproduce the issue?

Unclear. The PR body gives a plausible large-store reproduction path with compaction checkpoints, subagent links, transcript fallback reads, and Control UI polling, but there is no checked-in perf fixture or trace for independent reproduction.

Is this the best way to solve the issue?

Yes for the diff as reviewed: the changes are scoped to Gateway/session listing and UI types, keep mutation paths on existing store loaders, and add a focused compaction assertion plus changelog coverage. The stale broader PR-body claims should be treated as scope notes rather than implemented behavior.

What I checked:

Current main sessions.list path: Current main still handles sessions.list by loading the combined session store and calling listSessionsFromStoreAsync; the PR’s request-scoped row context is not present on main. (src/gateway/server-methods/sessions.ts:660, dda2db97d43b)
Current main checkpoint payload: Current main still types GatewaySessionRow.latestCompactionCheckpoint as the full SessionCompactionCheckpoint, which includes heavyweight fields that the PR narrows for list responses. (src/gateway/session-utils.types.ts:87, dda2db97d43b)
Current main subagent lookup cost: Current main resolves runtime and store child-session links by repeatedly calling subagent lookup/count helpers while building list rows; the PR replaces this path with a request-local read index. (src/gateway/session-utils.ts:344, dda2db97d43b)
PR diff adds read index and preview contract: The latest PR patch adds buildSubagentRunReadIndexFromRuns, buildSubagentRunReadIndex, buildCompactionCheckpointPreview, row-context plumbing, and preview types in Gateway/UI surfaces. (src/agents/subagent-registry-queries.ts:58, 773b6093452d)
Changelog blocker addressed: The latest PR patch adds a Gateway/sessions changelog entry, resolving the earlier ClawSweeper P3 finding for this user-facing sessions.list performance/API change. (CHANGELOG.md:15, 773b6093452d)
Exact-head check status: GitHub check-runs for head 773b6093452db426e8fcbbc1331921471439022c showed the main check/build/lint/type/security-fast jobs successful, with Security High (actions) still in progress at review time. (773b6093452d)

Likely related people:

steipete: Recent current-main work in the same sessions hot path covers exact session lookup speedups, async transcript/history performance, sync-reader removal, and session-store writer/cache routing; the PR head commit is also authored by this maintainer. (role: recent maintainer; confidence: high; commits: 0ea28ddb165d, 4d9c658f4058, ee8371d31317; files: src/gateway/session-utils.ts, src/gateway/server-methods/sessions.ts, src/config/sessions/store-cache.ts)
vincentkoc: Recent merged history includes sessions.list transcript-usage bounding, child-link indexing, preview hydration caps, and session-store clone memory reduction adjacent to this PR’s performance target. (role: adjacent performance owner; confidence: high; commits: ecf6cbf75d3d, 37f8c3806ac9, 694598822f19; files: src/gateway/session-utils.ts, src/config/sessions/store-cache.ts)
Takhoffman: Merged subagent history includes the moved-child, restarted-descendant, and active-child count fixes whose semantics the new request-local read index must preserve. (role: subagent semantics maintainer; confidence: medium; commits: e48a0b80a81b, c541cde0f66e, e24704d5eb8a; files: src/agents/subagent-registry-queries.ts, src/gateway/session-utils.ts)

Remaining risk / open question:

No independent production-sized perf fixture or trace was available in the PR context; the performance judgment relies on code inspection, existing focused tests, and exact-head CI.
The PR body still describes broader store-loader and transcript-usage caching work than the final patch visibly adds, so maintainers should confirm the narrower final scope is intentional before merge.
The exact-head Security High (actions) check was still in progress at review time.

Codex review notes: model gpt-5.5, reasoning high; reviewed against dda2db97d43b.

Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>

steipete · 2026-05-02T14:46:15Z

Landed via rebase onto main.

Local gate: pnpm test src/gateway/session-utils.subagent.test.ts src/gateway/session-utils.fs.test.ts src/gateway/server.sessions.compaction.test.ts src/agents/subagent-registry-queries.test.ts (113 tests)
GitHub exact-head checks: passed for ebafda418624c2141615f86297d40c2e0dc5ece7
Source commit: ebafda4
Land commit: 2b37b38

Thanks @rolandrscheel!

Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>

openclaw-barnacle Bot added gateway, agents, size: M labels May 2, 2026

rolandrscheel force-pushed the fix/sessions-list-performance branch from 99d7b19 to 0853673 Compare May 2, 2026 13:48

openclaw-barnacle Bot added size: L and removed size: M labels May 2, 2026

rolandrscheel force-pushed the fix/sessions-list-performance branch from 0853673 to 9988a2c Compare May 2, 2026 13:55

steipete force-pushed the fix/sessions-list-performance branch from 9988a2c to 46775a0 Compare May 2, 2026 14:13

steipete requested review from a team as code owners May 2, 2026 14:13

steipete changed the base branch from release/2026.4.29 to main May 2, 2026 14:13

openclaw-barnacle Bot added app: web-ui, size: M and removed size: L labels May 2, 2026

rolandrscheel force-pushed the fix/sessions-list-performance branch from 46775a0 to 6c68941 Compare May 2, 2026 14:16

openclaw-barnacle Bot added size: L and removed app: web-ui, size: M labels May 2, 2026

rolandrscheel force-pushed the fix/sessions-list-performance branch from 6c68941 to d2cc138 Compare May 2, 2026 14:19

steipete force-pushed the fix/sessions-list-performance branch from d2cc138 to 96f7a87 Compare May 2, 2026 14:21

openclaw-barnacle Bot added app: web-ui, size: M and removed size: L labels May 2, 2026

steipete force-pushed the fix/sessions-list-performance branch from 96f7a87 to 222ee74 Compare May 2, 2026 14:22

rolandrscheel force-pushed the fix/sessions-list-performance branch from f81f989 to 33a65ee Compare May 2, 2026 14:26

steipete force-pushed the fix/sessions-list-performance branch 2 times, most recently from 8c47981 to 773b609 Compare May 2, 2026 14:31

fix(sessions): keep list polling lightweight (openclaw#76090)

ebafda4

Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>

steipete force-pushed the fix/sessions-list-performance branch from 773b609 to ebafda4 Compare May 2, 2026 14:42

steipete merged commit 2b37b38 into openclaw:main May 2, 2026
86 checks passed

steipete mentioned this pull request May 2, 2026

fix(sessions): persist normalization and maintenance results to disk + fix(plugins): defer lazy runtime-dep installs #75831

Closed

github-actions Bot mentioned this pull request May 2, 2026

📡 Upstream Digest — 2026-05-02 16:37 UTC curtismercier/openclaw-mods#747

Open

BunsDev mentioned this pull request May 2, 2026

Web UI blank screen after login on 2026.4.30-beta.1 #76170

Closed

lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026

fix(sessions): keep list polling lightweight (openclaw#76090)

003b9ac

Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>

github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026

fix(sessions): keep list polling lightweight (openclaw#76090)

5def6b3

Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>

steipete merged 1 commit into openclaw:main from rolandrscheel:fix/sessions-list-performance
