
fix(sessions): improve list query performance and minimal checkpoints… #76090

Merged
steipete merged 1 commit into openclaw:main from rolandrscheel:fix/sessions-list-performance
May 2, 2026


@rolandrscheel
Contributor

Summary

  • Problem: sessions.list can become CPU- and memory-heavy on large session stores because it repeatedly deep-clones large cached session data, returns oversized compaction checkpoint summaries, repeatedly scans transcripts for usage fallback data, and performs repeated subagent child/link lookups per row.
  • Why it matters: The Control UI and local list-sessions paths poll sessions.list; large transcripts/checkpoints can make polling expensive enough to spike CPU or exhaust memory.
  • What changed: Added a sessions.list-specific session-store loader that can reuse a validated cached store object without full deep cloning, returns a minimal latest compaction checkpoint preview, caches transcript usage snapshots by transcript stat, and builds request-local subagent/store indexes for list row construction.
  • What did NOT change (scope boundary): No session mutation behavior, transcript format, compaction behavior, model routing, or external API behavior was changed.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: sessions.list used general-purpose session-store and subagent lookup paths optimized for correctness and defensive isolation, but not for repeated high-volume list rendering. Large checkpoint summaries and transcript usage fallback scans amplified the cost.
  • Missing detection / guardrail: No perf/regression coverage around large session stores with compaction checkpoints, many subagent links, and transcript usage fallback reads.
  • Contributing context (if known): Control UI polling repeatedly exercises this path, so even moderate per-call overhead becomes noticeable.
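
The per-row lookup cost described above can be illustrated with a minimal sketch of a request-local index. The `SubagentRun` shape and the `buildChildIndex` and `describeRow` helpers are hypothetical names chosen for illustration; they are not the PR's actual types or functions, and they ignore the stale and moved-child semantics that the real code must preserve.

```typescript
// Hypothetical shapes for illustration only; the real OpenClaw types are not shown on this page.
interface SubagentRun {
  childSessionKey: string;
  parentSessionKey: string;
  endedAt?: number; // undefined while the run is still active
}

interface ChildIndex {
  childrenByParent: Map<string, string[]>;
  activeCountByParent: Map<string, number>;
}

// Build the index once per request: one pass over the runs
// instead of one scan of the runs for every list row.
function buildChildIndex(runs: readonly SubagentRun[]): ChildIndex {
  const childrenByParent = new Map<string, string[]>();
  const activeCountByParent = new Map<string, number>();
  for (const run of runs) {
    const children = childrenByParent.get(run.parentSessionKey) ?? [];
    children.push(run.childSessionKey);
    childrenByParent.set(run.parentSessionKey, children);
    if (run.endedAt === undefined) {
      const active = activeCountByParent.get(run.parentSessionKey) ?? 0;
      activeCountByParent.set(run.parentSessionKey, active + 1);
    }
  }
  return { childrenByParent, activeCountByParent };
}

// Each row is then resolved with constant-time map lookups.
function describeRow(sessionKey: string, index: ChildIndex) {
  return {
    sessionKey,
    childSessions: index.childrenByParent.get(sessionKey) ?? [],
    activeChildren: index.activeCountByParent.get(sessionKey) ?? 0,
  };
}
```

Because the index lives only for the duration of one request, it needs no invalidation logic; the next poll rebuilds it from current data.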

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/gateway/session-utils.subagent.test.ts
    • src/gateway/session-utils.fs.test.ts
    • sessions.list handler tests under src/gateway/server-methods/
  • Scenario the test should lock in:
    • sessions.list builds subagent child relationships using request-local indexes without changing stale/live child-link semantics.
    • transcript usage fallback returns cached snapshots when transcript size/mtime has not changed.
    • latest compaction checkpoint in list rows omits the heavyweight summary payload.
  • Why this is the smallest reliable guardrail: These are the narrow gateway paths that caused the observed CPU/memory pressure.
  • Existing test that already covers this (if any): Existing subagent metadata tests cover most child-link semantics; targeted tests were run against the changed files.
  • If no new test is added, why not: Existing coverage already exercises the critical behavior; this PR primarily changes implementation/performance characteristics.
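
The transcript scenario above (return the cached snapshot while size and mtime are unchanged) can be sketched as follows. This illustrates the general stat-keyed caching technique under assumed names (`UsageSnapshot`, `FileStat`, `TranscriptUsageCache`); it is not the merged implementation, and the snapshot fields are placeholders.

```typescript
// Hypothetical snapshot shape; the real usage fields are not shown on this page.
interface UsageSnapshot {
  inputTokens: number;
  outputTokens: number;
}

// The two stat fields the cache key is derived from.
interface FileStat {
  size: number;
  mtimeMs: number;
}

interface CacheEntry extends FileStat {
  snapshot: UsageSnapshot;
}

class TranscriptUsageCache {
  private readonly entries = new Map<string, CacheEntry>();

  // `scan` stands in for the expensive transcript read; it runs only
  // when the file's size or mtime differs from the cached entry.
  get(path: string, stat: FileStat, scan: () => UsageSnapshot): UsageSnapshot {
    const hit = this.entries.get(path);
    if (hit && hit.size === stat.size && hit.mtimeMs === stat.mtimeMs) {
      return hit.snapshot;
    }
    const snapshot = scan();
    this.entries.set(path, { size: stat.size, mtimeMs: stat.mtimeMs, snapshot });
    return snapshot;
  }
}
```

In Node.js, the `size` and `mtimeMs` values would typically come from `fs.stat` on the transcript path. A regression test for this scenario can count how often `scan` runs across repeated calls with an unchanged stat.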

User-visible / Behavior Changes

sessions.list responses now include only a minimal latestCompactionCheckpoint preview (checkpointId, createdAt, reason) instead of returning the full checkpoint object with large summary data.
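
A minimal sketch of the narrowed payload, for readers updating callers. Only the three preview field names come from this PR; the `FullCheckpoint` type, the `summary` field name, the field types, and the `toCheckpointPreview` helper are assumptions made for illustration.

```typescript
// Hypothetical full checkpoint shape. Only checkpointId, createdAt and reason
// are named in the PR text; `summary` and all field types are assumed.
interface FullCheckpoint {
  checkpointId: string;
  createdAt: number;
  reason: string;
  summary: string; // stands in for the heavyweight payload
}

type CheckpointPreview = Pick<FullCheckpoint, "checkpointId" | "createdAt" | "reason">;

function toCheckpointPreview(checkpoint: FullCheckpoint): CheckpointPreview {
  // Copy the fields explicitly rather than spreading and deleting,
  // so fields added to the full checkpoint later never leak into list rows.
  return {
    checkpointId: checkpoint.checkpointId,
    createdAt: checkpoint.createdAt,
    reason: checkpoint.reason,
  };
}
```

Callers that previously read the summary from a list row would need to fetch the full checkpoint through another path.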

Diagram (if applicable)

Before:
Control UI poll -> sessions.list -> deep clone store + per-row subagent scans + transcript scans + full checkpoint summary

After:
Control UI poll -> sessions.list -> validated cache reuse + request-local indexes + stat-keyed transcript usage cache + checkpoint preview

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: OpenClaw source checkout based on release/2026.4.29
  • Model/provider: N/A
  • Integration/channel (if any): Control UI / gateway sessions.list path
  • Relevant config (redacted): N/A

Steps

  1. Create or use a session store with many sessions/subagent relationships and compaction checkpoints.
  2. Trigger sessions.list via gateway/Control UI polling.
  3. Observe CPU/memory pressure before the fix; compare with the request-local cache/index behavior after the fix.

Expected

  • sessions.list remains responsive and avoids repeated heavyweight cloning/scanning work.
  • Existing subagent child-link semantics are preserved.
  • Large checkpoint summary payloads are not returned in list rows.

Actual

  • Before: repeated list calls can cause high CPU/memory pressure.
  • After: sessions.list uses list-specific fast paths and preserves tested behavior.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Targeted verification run:

OPENCLAW_VITEST_MAX_WORKERS=1 corepack pnpm exec vitest run \
  --config test/vitest/vitest.gateway.config.ts \
  src/gateway/session-utils.subagent.test.ts \
  src/gateway/session-utils.fs.test.ts \
  src/gateway/server-methods/sessions.send-followup-status.test.ts \
  src/gateway/server-methods/sessions.send-deleted-agent.test.ts

Test Files  4 passed (4)
Tests       80 passed (80)

Also verified:

git diff --check origin/release/2026.4.29...HEAD

No whitespace errors.

corepack pnpm tsgo:core was attempted locally but this checkout is missing UI dependencies (@noble/ed25519, dompurify, @vitest/browser-playwright), so it fails before providing a clean repo-wide type signal. The failures are unrelated to the changed session/gateway files.

Human Verification (required)

  • Verified scenarios:
    • Existing subagent metadata tests pass with the request-local index path.
    • Transcript usage tests pass with stat-keyed caching.
    • sessions.list-related gateway handler smoke tests pass.
  • Edge cases checked:
    • stale active subagent snapshots
    • ended parent with live descendant
    • moved child sessions
    • store-only child links
    • transcript usage fallback reads
  • What you did not verify:
    • Full CI matrix
    • Real-world perf numbers from a production-sized session store

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes, except latestCompactionCheckpoint list payload is intentionally narrowed to preview fields)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: The sessions.list-specific cache path returns the cached store object without cloning, so accidental mutation by list code would affect the cached object.
    • Mitigation: The sessions.list row construction path is read-only; tests exercise the relevant gateway behavior. Mutation paths continue to use the existing general loader.
  • Risk: Narrowing latestCompactionCheckpoint could affect callers that depended on full checkpoint summaries from list rows.
    • Mitigation: The list endpoint should only expose a lightweight preview; full checkpoint data remains in the store/compaction paths.
  • Risk: Request-local subagent indexes could diverge from existing lookup semantics.
    • Mitigation: Existing subagent metadata tests pass, including stale/moved child edge cases.
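
One way to guard the first risk in tests, offered as a sketch rather than something this PR does: freeze the store object handed to the list path so that an accidental write cannot alter shared state. The `deepFreeze` helper and the sample store below are hypothetical.

```typescript
// Recursively freeze plain objects and arrays. A write to a frozen object
// throws a TypeError in strict-mode code and is silently ignored otherwise;
// either way the shared object stays unchanged.
function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === "object" && !Object.isFrozen(value)) {
    Object.freeze(value);
    for (const child of Object.values(value)) {
      deepFreeze(child);
    }
  }
  return value;
}

// Hypothetical store shape used only to demonstrate the guard.
const sampleStore = deepFreeze({
  "agent:main": { title: "original", tags: ["a"] },
});
```

Passing a frozen fixture through the list-row builder in a unit test would surface any mutation in the read-only path without adding cloning cost to production code.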

AI/Vibe-Coded PR Transparency

  • AI-assisted
  • Testing degree: targeted gateway tests passed; full local typecheck blocked by missing UI dependencies in this checkout.
  • I understand what the code does: yes — this PR converts installed monkey-patch performance fixes into source-level changes and centralizes subagent read indexes for sessions.list.

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime agents Agent runtime and tooling size: M labels May 2, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 2, 2026

Codex review: needs maintainer review before merge.

Summary
The PR adds request-scoped subagent read indexes for session row construction, narrows latestCompactionCheckpoint list rows to a preview shape, updates UI types and compaction coverage, and records the user-facing fix in the changelog.

Reproducibility: unclear. The PR body gives a plausible large-store reproduction path with compaction checkpoints, subagent links, transcript fallback reads, and Control UI polling, but there is no checked-in perf fixture or trace for independent reproduction.

Next step before merge
No automated repair lane is needed; the prior changelog blocker is fixed, and remaining work is normal exact-head CI completion plus maintainer merge review.

Security
Cleared: the diff stays within session listing, subagent read-index helpers, UI types, tests, and changelog text, with no new dependencies, scripts, permissions, network calls, artifact execution, or secret handling.

Review details

Best possible solution:

Land the narrowed PR after exact-head checks complete, preserving async sessions.list yielding, read-only list-row fast paths, and the lightweight checkpoint preview contract.

Do we have a high-confidence way to reproduce the issue?

Unclear. The PR body gives a plausible large-store reproduction path with compaction checkpoints, subagent links, transcript fallback reads, and Control UI polling, but there is no checked-in perf fixture or trace for independent reproduction.

Is this the best way to solve the issue?

Yes for the diff as reviewed: the changes are scoped to Gateway/session listing and UI types, keep mutation paths on existing store loaders, and add a focused compaction assertion plus changelog coverage. The stale broader PR-body claims should be treated as scope notes rather than implemented behavior.

What I checked:

  • Current main sessions.list path: Current main still handles sessions.list by loading the combined session store and calling listSessionsFromStoreAsync; the PR’s request-scoped row context is not present on main. (src/gateway/server-methods/sessions.ts:660, dda2db97d43b)
  • Current main checkpoint payload: Current main still types GatewaySessionRow.latestCompactionCheckpoint as the full SessionCompactionCheckpoint, which includes heavyweight fields that the PR narrows for list responses. (src/gateway/session-utils.types.ts:87, dda2db97d43b)
  • Current main subagent lookup cost: Current main resolves runtime and store child-session links by repeatedly calling subagent lookup/count helpers while building list rows; the PR replaces this path with a request-local read index. (src/gateway/session-utils.ts:344, dda2db97d43b)
  • PR diff adds read index and preview contract: The latest PR patch adds buildSubagentRunReadIndexFromRuns, buildSubagentRunReadIndex, buildCompactionCheckpointPreview, row-context plumbing, and preview types in Gateway/UI surfaces. (src/agents/subagent-registry-queries.ts:58, 773b6093452d)
  • Changelog blocker addressed: The latest PR patch adds a Gateway/sessions changelog entry, resolving the earlier ClawSweeper P3 finding for this user-facing sessions.list performance/API change. (CHANGELOG.md:15, 773b6093452d)
  • Exact-head check status: GitHub check-runs for head 773b6093452db426e8fcbbc1331921471439022c showed the main check/build/lint/type/security-fast jobs successful, with Security High (actions) still in progress at review time. (773b6093452d)

Likely related people:

  • steipete: Recent current-main work in the same sessions hot path covers exact session lookup speedups, async transcript/history performance, sync-reader removal, and session-store writer/cache routing; the PR head commit is also authored by this maintainer. (role: recent maintainer; confidence: high; commits: 0ea28ddb165d, 4d9c658f4058, ee8371d31317; files: src/gateway/session-utils.ts, src/gateway/server-methods/sessions.ts, src/config/sessions/store-cache.ts)
  • vincentkoc: Recent merged history includes sessions.list transcript-usage bounding, child-link indexing, preview hydration caps, and session-store clone memory reduction adjacent to this PR’s performance target. (role: adjacent performance owner; confidence: high; commits: ecf6cbf75d3d, 37f8c3806ac9, 694598822f19; files: src/gateway/session-utils.ts, src/config/sessions/store-cache.ts)
  • Takhoffman: Merged subagent history includes the moved-child, restarted-descendant, and active-child count fixes whose semantics the new request-local read index must preserve. (role: subagent semantics maintainer; confidence: medium; commits: e48a0b80a81b, c541cde0f66e, e24704d5eb8a; files: src/agents/subagent-registry-queries.ts, src/gateway/session-utils.ts)

Remaining risk / open question:

  • No independent production-sized perf fixture or trace was available in the PR context; the performance judgment relies on code inspection, existing focused tests, and exact-head CI.
  • The PR body still describes broader store-loader and transcript-usage caching work than the final patch visibly adds, so maintainers should confirm the narrower final scope is intentional before merge.
  • The exact-head Security High (actions) check was still in progress at review time.

Codex review notes: model gpt-5.5, reasoning high; reviewed against dda2db97d43b.

@rolandrscheel rolandrscheel force-pushed the fix/sessions-list-performance branch from 99d7b19 to 0853673 Compare May 2, 2026 13:48
@rolandrscheel rolandrscheel force-pushed the fix/sessions-list-performance branch from 0853673 to 9988a2c Compare May 2, 2026 13:55
@steipete steipete force-pushed the fix/sessions-list-performance branch from 9988a2c to 46775a0 Compare May 2, 2026 14:13
@steipete steipete requested review from a team as code owners May 2, 2026 14:13
@steipete steipete changed the base branch from release/2026.4.29 to main May 2, 2026 14:13
@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui size: M and removed size: L labels May 2, 2026
@rolandrscheel rolandrscheel force-pushed the fix/sessions-list-performance branch from 46775a0 to 6c68941 Compare May 2, 2026 14:16
@openclaw-barnacle openclaw-barnacle Bot added size: L and removed app: web-ui App: web-ui size: M labels May 2, 2026
@rolandrscheel rolandrscheel force-pushed the fix/sessions-list-performance branch from 6c68941 to d2cc138 Compare May 2, 2026 14:19
@steipete steipete force-pushed the fix/sessions-list-performance branch from d2cc138 to 96f7a87 Compare May 2, 2026 14:21
@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui size: M and removed size: L labels May 2, 2026
@steipete steipete force-pushed the fix/sessions-list-performance branch from 96f7a87 to 222ee74 Compare May 2, 2026 14:22
@rolandrscheel rolandrscheel force-pushed the fix/sessions-list-performance branch from f81f989 to 33a65ee Compare May 2, 2026 14:26
@steipete steipete force-pushed the fix/sessions-list-performance branch 2 times, most recently from 8c47981 to 773b609 Compare May 2, 2026 14:31
Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>
@steipete steipete force-pushed the fix/sessions-list-performance branch from 773b609 to ebafda4 Compare May 2, 2026 14:42
@steipete steipete merged commit 2b37b38 into openclaw:main May 2, 2026
86 checks passed
@steipete
Contributor

steipete commented May 2, 2026

Landed via rebase onto main.

  • Local gate: pnpm test src/gateway/session-utils.subagent.test.ts src/gateway/session-utils.fs.test.ts src/gateway/server.sessions.compaction.test.ts src/agents/subagent-registry-queries.test.ts (113 tests)
  • GitHub exact-head checks: passed for ebafda418624c2141615f86297d40c2e0dc5ece7
  • Source commit: ebafda4
  • Land commit: 2b37b38

Thanks @rolandrscheel!

lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Co-authored-by: rolandrscheel <20336324+rolandrscheel@users.noreply.github.com>

Labels

agents Agent runtime and tooling app: web-ui App: web-ui gateway Gateway runtime size: M


2 participants