Skip to content

fix(memory): yield during session indexing#76978

Merged
steipete merged 3 commits into
openclaw:mainfrom
bitloi:fix/issue-76890-memory-session-yield
May 10, 2026
Merged

fix(memory): yield during session indexing#76978
steipete merged 3 commits into
openclaw:mainfrom
bitloi:fix/issue-76890-memory-session-yield

Conversation

@bitloi

@bitloi bitloi commented May 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: memory-core session transcript indexing can run during startup/session-start and starve the gateway event loop for large session corpora.
  • Why it matters: gateway WebSocket requests can time out while the process is alive but busy indexing sessions.
  • What changed: session transcript sync now yields to the event loop between indexing batches and stale-row pruning batches.
  • What did NOT change (scope boundary): this does not change memory provider selection, session source semantics, config behavior, or move indexing to a worker thread.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count.

  • Behavior or issue addressed: memory-core session sync yields back to the Node event loop during large session transcript indexing batches instead of monopolizing startup/session-start work until the full corpus finishes.
  • Real environment tested: local OpenClaw checkout at PR head, Linux container, Node v24.13.0, memory-core session sync path.
  • Exact steps or command run after this patch:
    1. Rebased the branch onto current upstream/main so the PR includes the current test-type fixes.
    2. Ran the focused memory-core session-yield verification command against the real PR checkout.
    3. Ran the failing check:test-types shard locally.
  • Evidence after fix: copied terminal output from the local PR checkout:
$ PATH=/root/.nvm/versions/node/v24.13.0/bin:$PATH corepack pnpm test extensions/memory-core/src/memory/manager-sync-yield.test.ts

> openclaw@2026.5.5 test /root/74/openclaw
> node scripts/test-projects.mjs extensions/memory-core/src/memory/manager-sync-yield.test.ts

[test] starting test/vitest/vitest.extension-memory.config.ts
[test] passed 1 Vitest shard in 58.83s

$ PATH=/root/.nvm/versions/node/v24.13.0/bin:$PATH corepack pnpm check:test-types

> openclaw@2026.5.5 check:test-types /root/74/openclaw
> pnpm tsgo:test

> openclaw@2026.5.5 tsgo:core:test /root/74/openclaw
> node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo

> openclaw@2026.5.5 tsgo:extensions:test /root/74/openclaw
> node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo
  • Observed result after fix: the session-yield verification completed successfully on the PR checkout, and the previously failing check:test-types command completed without the model-fallback.test.ts env type error after rebasing onto current upstream/main.
  • What was not tested: live gateway startup with a real multi-thousand-session corpus and end-to-end WebSocket timeout reproduction.
  • Before evidence (optional but encouraged): before this fix, the new regression observed the 11th session file indexing before the scheduled setImmediate ran, failing with expected [ false ] to deeply equal [ true ].

Root Cause (if applicable)

  • Root cause: memory-core session sync processed session transcript batches without a macrotask yield, so large startup/full-reindex work could monopolize the gateway event loop.
  • Missing detection / guardrail: there was no regression test proving session sync yields during large transcript batches.
  • Contributing context (if known): sessions.list already has a similar batching/yield pattern, but memory-core session indexing did not.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/memory-core/src/memory/manager-sync-yield.test.ts
  • Scenario the test should lock in: a scheduled setImmediate runs before later session files continue indexing.
  • Why this is the smallest reliable guardrail: it exercises the real syncSessionFiles() loop while mocking transcript parsing and providers out of scope.
  • Existing test that already covers this (if any): none.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

Large session corpora should no longer make gateway startup/session-start reindexing starve WebSocket handling for long stretches.

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux container
  • Runtime/container: Node v24.13.0 for verification
  • Model/provider: N/A
  • Integration/channel (if any): memory-core session indexing
  • Relevant config (redacted): issue repro config with memory-core sessions source enabled

Steps

  1. Trigger memory-core session sync with more than one session-indexing batch.
  2. Schedule a setImmediate before sync runs.
  3. Verify the event loop gets a chance to run before later files continue indexing.

Expected

  • Session sync yields between batches so gateway work is not starved indefinitely.

Actual

  • After this patch, the scheduled immediate runs before the 11th session file is indexed.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Before the fix, the new regression failed with expected [ false ] to deeply equal [ true ]. After the fix, it passes.

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • Targeted memory-core regression test.
    • Failing check:test-types shard after rebasing onto current upstream/main.
    • Changelog attribution check.
  • Edge cases checked:
    • targeted session files
    • skipped/unchanged session files
    • missing session entries
    • stale session-row pruning
  • What you did not verify:
    • live gateway/TUI reproduction with a real 30+ second corpus.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: very small scheduling overhead during large session syncs.
    • Mitigation: yielding is batched every 10 processed session items and preserves existing indexing order, source behavior, and error propagation.

@clawsweeper

clawsweeper Bot commented May 3, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR adds event-loop yields while memory-core indexes session transcript files, prunes stale session rows, and parses large transcript files, with focused tests and a changelog entry.

Reproducibility: yes. at source level: current main has a clear session-start path into session transcript indexing without macrotask yields, and #76890 supplies matching event-loop-delay logs. I did not run a live gateway/TUI reproduction in this read-only review.

Real behavior proof
Needs real behavior proof before merge: The PR body provides copied terminal output for mocked Vitest checks and typecheck, but no live gateway/TUI or real large-corpus after-fix proof. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Needs contributor-supplied real runtime proof or a maintainer override; there is no narrow code defect for a ClawSweeper repair PR to fix.

Security
Cleared: The diff only changes memory scheduling/parsing, focused tests, and changelog text; it adds no dependency, workflow, permission, secret, network, or command-execution surface.

Review details

Best possible solution:

Land the focused memory-core yield fix after redacted real runtime proof or an explicit maintainer override, keeping the scope to responsiveness in session indexing and transcript parsing.

Do we have a high-confidence way to reproduce the issue?

Yes at source level: current main has a clear session-start path into session transcript indexing without macrotask yields, and #76890 supplies matching event-loop-delay logs. I did not run a live gateway/TUI reproduction in this read-only review.

Is this the best way to solve the issue?

Yes for the code direction: batched setImmediate yields match the existing sessions.list responsiveness pattern, and the maintainer head also yields inside single-file transcript parsing. The merge path still needs real runtime proof or a maintainer override.

Acceptance criteria:

  • pnpm test extensions/memory-core/src/memory/manager-sync-yield.test.ts packages/memory-host-sdk/src/host/session-files-yield.test.ts
  • pnpm test extensions/memory-core
  • pnpm check:changed
  • Live gateway/TUI proof with a real large session corpus and private details redacted

What I checked:

Likely related people:

  • steipete: GitHub path history shows repeated recent work on memory-core sync and memory-host transcript code, and the current PR head includes their parser-yield commit. (role: recent area contributor; confidence: high; commits: 53a97fe0a76a, a38c2c233a44, 983fd775e2ca; files: extensions/memory-core/src/memory/manager-sync-ops.ts, extensions/memory-core/src/memory/manager.ts, packages/memory-host-sdk/src/host/session-files.ts)
  • buyitsydney: Recent commits changed archived session transcript visibility in both the memory sync path and memory-host session-file parser. (role: adjacent session transcript contributor; confidence: medium; commits: aba97a4c7cff, 2ffdb5d248a1; files: extensions/memory-core/src/memory/manager-sync-ops.ts, packages/memory-host-sdk/src/host/session-files.ts)
  • zqchris: Recent session-corpus parsing work touched the parser that builds transcript entries consumed by this indexing path. (role: adjacent session corpus contributor; confidence: medium; commits: 82e349a48ad9; files: packages/memory-host-sdk/src/host/session-files.ts)
  • Peetiegonzalez: Authored the recent sessions.list event-loop-blocking reduction that this PR mirrors as a scheduling precedent. (role: adjacent gateway responsiveness contributor; confidence: low; commits: 7fe4ba013ff0; files: src/gateway/session-utils.ts)

Remaining risk / open question:

  • The supplied proof does not exercise a live gateway/TUI or real large session corpus, so the end-to-end timeout improvement remains unproven.

Codex review notes: model gpt-5.5, reasoning high; reviewed against b2bc1e9a5661.

@bitloi bitloi force-pushed the fix/issue-76890-memory-session-yield branch 2 times, most recently from 8072468 to c1168f9 Compare May 3, 2026 22:45
@bitloi

bitloi commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

Codex review: needs maintainer review before merge.

Summary The PR adds batched setImmediate yields to memory-core session transcript indexing and stale session-row pruning, with a new Vitest regression and changelog entry for #76890.

Reproducibility: no. high-confidence live reproduction was run. Source inspection shows current main processes full session-indexing batches and stale-row pruning without a macrotask yield, and the PR’s unit scenario captures the scheduling gap at the source level.

Next step before merge No repair lane is needed because the PR is mergeable with no actionable review finding; remaining work is final CI completion and normal maintainer merge review.

Security Cleared: The diff only changes memory sync scheduling, a focused test, and changelog text; it adds no dependencies, permissions, network calls, secret handling, or command execution surface.

Review details

Thanks. This is addressed on the current head c1168f9a3b.

  • GitHub now reports the PR as mergeable: MERGEABLE with mergeStateStatus: CLEAN.
  • I verified a clean local merge against latest upstream/main; CHANGELOG.md auto-merges cleanly.
  • Fixed the extension lint failure from the empty undici mock classes.
  • Ran the requested checks:
    • pnpm test extensions/memory-core/src/memory/manager-sync-yield.test.ts extensions/memory-core/src/memory/manager.session-reindex.test.ts extensions/memory-core/src/memory/manager-targeted-sync.test.ts → PASS
    • pnpm test extensions/memory-core → PASS
    • pnpm check:changed → PASS
    • pnpm lint → PASS

The earlier dirty-branch finding was for the old head 50971fb52d; the current branch is clean and validated.

@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 6, 2026
@bitloi bitloi force-pushed the fix/issue-76890-memory-session-yield branch from c40a3d4 to 7f56002 Compare May 6, 2026 05:47
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 6, 2026
@steipete steipete force-pushed the fix/issue-76890-memory-session-yield branch from 4a3e1f2 to 58d60ba Compare May 10, 2026 06:26
@steipete steipete merged commit 482af6d into openclaw:main May 10, 2026
110 checks passed
@steipete

Copy link
Copy Markdown
Contributor

Landed via rebase merge onto main.

  • Source head: 58d60ba
  • Landed head: 482af6d
  • Gate: focused local tests for session parsing/session-sync yield; Blacksmith Testbox pnpm check:changed passed; PR CI green on source head.

Thanks @bitloi!

@bitloi

bitloi commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Landed via rebase merge onto main.

  • Source head: 58d60ba
  • Landed head: 482af6d
  • Gate: focused local tests for session parsing/session-sync yield; Blacksmith Testbox pnpm check:changed passed; PR CI green on source head.

Thanks @bitloi!

thank you sir

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: memory-core Extension: memory-core proof: supplied External PR includes structured after-fix real behavior proof. size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

memory-core: synchronous indexing on session start blocks gateway event loop for 30+ seconds

2 participants