fix #88009: [Feature]: batched memory embedding should batch over files #89138

Merged
jalehman merged 18 commits into openclaw:main from mushuiyu886:feat/issue-88009 on Jun 9, 2026

@mushuiyu886

@mushuiyu886 mushuiyu886 commented Jun 1, 2026

Contributor

Summary

Fixes #88009.

  • Batch memory embedding work across dirty files when the provider explicitly opts into source-wide batch submission.
  • Let OpenAI memory embedding batch jobs cover memory and session sources in one provider job while preserving per-file behavior for other batch runtimes.
  • Make OpenAI batch polling more resilient with retryable status errors, progress counts, and exponential backoff.
  • Document the source-wide batch runtime opt-in and cover its provider contract semantics.
  • Split OpenAI-compatible batch uploads by serialized JSONL byte cap before file upload while preserving request order.
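The byte-cap split in the last bullet can be pictured with a minimal sketch. This is not the PR's TypeScript implementation: the function name `split_jsonl_by_bytes` and the request shape are hypothetical, and only the idea (measure each serialized JSONL line in UTF-8 bytes, start a new upload group before the cap is exceeded, never reorder) is taken from the summary.

```python
import json


def split_jsonl_by_bytes(requests, max_bytes):
    """Group requests so each group's JSONL payload stays within max_bytes.

    Hypothetical helper: sizes are measured on the serialized line
    (including the trailing newline) encoded as UTF-8, and the original
    request order is kept across and within groups.
    """
    groups, current, current_bytes = [], [], 0
    for request in requests:
        line = json.dumps(request, ensure_ascii=False) + "\n"
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            # A single line over the cap cannot be fixed by splitting.
            raise ValueError("single request exceeds the byte cap")
        if current and current_bytes + size > max_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(request)
        current_bytes += size
    if current:
        groups.append(current)
    return groups
```

Measuring encoded bytes rather than characters matters for non-ASCII input, where one character can occupy several bytes in the uploaded file.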

Rebase / CI unblocker status

  • Latest main now carries the five hard Knip allowlist entries in scripts/deadcode-unused-files.allowlist.mjs with an explanatory SQLite scaffold comment.
  • After merging latest main, this PR no longer changes scripts/deadcode-unused-files.allowlist.mjs relative to the base branch.
  • The prior rebase/current-main CI unblocker is now base-branch state, not a PR-specific allowlist change.

Real behavior proof

  • Behavior or issue addressed: Memory indexing with OpenAI-compatible batch embeddings no longer submits one provider batch per memory or session file. Dirty memory files and session transcripts are prepared together, then submitted as one source-wide provider batch until file or request limits require a split.
  • Real environment tested: Linux source checkout at /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009, isolated OPENCLAW_HOME and OPENCLAW_STATE_DIR under /media/vdc/code/ai/aispace/openclaw-issue-88009-evidence/real-behavior-proof, local OpenAI-compatible batch endpoint on 127.0.0.1, provider openai, model text-embedding-3-small, remote.batch.enabled=true, remote.batch.wait=true, remote.batch.timeoutMinutes=1440, fallback=none.
  • Exact steps or command run after this patch:
node /media/vdc/code/ai/aispace/openclaw-issue-88009-evidence/run-real-behavior-proof.mjs
cd /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 && node scripts/run-vitest.mjs run extensions/memory-core/src/memory/index.test.ts extensions/openai/memory-embedding-adapter.test.ts extensions/openai/embedding-batch.test.ts
corepack pnpm@11.2.2 --dir /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 tsgo:core
corepack pnpm@11.2.2 --dir /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 tsgo:extensions
corepack pnpm@11.2.2 --dir /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 exec oxlint extensions/memory-core/src/memory/index.test.ts extensions/memory-core/src/memory/manager-embedding-ops.ts extensions/memory-core/src/memory/manager-sync-ops.ts extensions/openai/embedding-batch.ts extensions/openai/memory-embedding-adapter.ts src/plugins/memory-embedding-providers.ts
cd /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 && node scripts/run-vitest.mjs run src/plugins/contracts/memory-embedding-provider.contract.test.ts
corepack pnpm@11.2.2 --dir /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 tsgo:core
corepack pnpm@11.2.2 --dir /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 exec oxlint src/plugins/contracts/memory-embedding-provider.contract.test.ts
git -C /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 diff --check
cd /media/vdc/code/ai/aispace/openclaw-worktrees/issue-88009 && HOME=/media/vdc/code/ai/aispace/openclaw-issue-88009-evidence/gateway-watch-proof/home OPENCLAW_HOME=/media/vdc/code/ai/aispace/openclaw-issue-88009-evidence/gateway-watch-proof/home OPENCLAW_STATE_DIR=/media/vdc/code/ai/aispace/openclaw-issue-88009-evidence/gateway-watch-proof/state CI=1 timeout 90s corepack pnpm@11.2.2 gateway:watch:raw
# Byte-cap follow-up after merging latest main
git diff --check origin/main...HEAD
node scripts/run-vitest.mjs run packages/memory-host-sdk/src/host/batch-runner.test.ts extensions/openai/embedding-batch.test.ts test/scripts/check-deadcode-unused-files.test.ts
# Follow-up for CI lint/type failures in the byte-cap test
node scripts/run-vitest.mjs run extensions/openai/embedding-batch.test.ts packages/memory-host-sdk/src/host/batch-runner.test.ts
corepack pnpm@11.2.2 tsgo:extensions:test
corepack pnpm@11.2.2 exec oxlint extensions/openai/embedding-batch.test.ts
# Current-head safe force reindex metadata follow-up after commit dd4a3ff558e1dbd27ae73fcb19375262095f20c8
HOME=/tmp/tmp.pqGxWH6wH3/home XDG_CONFIG_HOME=/tmp/tmp.pqGxWH6wH3/xdg-config XDG_STATE_HOME=/tmp/tmp.pqGxWH6wH3/xdg-state OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_DISABLE_AUTO_UPDATE=1 OPENCLAW_TEST_MEMORY_UNSAFE_REINDEX=0 corepack pnpm@11.2.2 openclaw memory index --force --agent main
HOME=/tmp/tmp.pqGxWH6wH3/home XDG_CONFIG_HOME=/tmp/tmp.pqGxWH6wH3/xdg-config XDG_STATE_HOME=/tmp/tmp.pqGxWH6wH3/xdg-state OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_DISABLE_AUTO_UPDATE=1 OPENCLAW_TEST_MEMORY_UNSAFE_REINDEX=0 corepack pnpm@11.2.2 openclaw memory index --force --agent main
HOME=/tmp/tmp.pqGxWH6wH3/home XDG_CONFIG_HOME=/tmp/tmp.pqGxWH6wH3/xdg-config XDG_STATE_HOME=/tmp/tmp.pqGxWH6wH3/xdg-state OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_DISABLE_AUTO_UPDATE=1 OPENCLAW_TEST_MEMORY_UNSAFE_REINDEX=0 corepack pnpm@11.2.2 openclaw memory status --agent main
python3 - <<'ASSERT'
from pathlib import Path
base = Path('/tmp/tmp.pqGxWH6wH3')
index1 = (base / 'source-index1.out').read_text() + (base / 'source-index1.err').read_text()
index2 = (base / 'source-index2.out').read_text() + (base / 'source-index2.err').read_text()
status = (base / 'source-status.out').read_text() + (base / 'source-status.err').read_text()
checks = {
    'first force index updated': 'Memory index updated (main).' in index1,
    'second force index updated': 'Memory index updated (main).' in index2,
    'status dirty no': 'Dirty: no' in status,
    'metadata missing absent': 'index metadata is missing' not in status,
    'vector paused absent': 'paused until memory is rebuilt' not in status,
}
for name, passed in checks.items():
    print(f'{name}: {"PASS" if passed else "FAIL"}')
if not all(checks.values()):
    raise SystemExit(1)
ASSERT
  • Evidence after fix:
Real behavior proof for issue 88009
Exit code: 0
Fixture:
- memory files: 833
- session files: 495
- configured sources: memory,sessions
- remote.batch.enabled: true
- remote.batch.wait: true
- remote.batch.timeoutMinutes: 1440
- fallback: none

Mock provider observations:
- uploaded batch input files: 1
- provider batch jobs: 1
- requests submitted through provider batch jobs: 1328
- inline embedding requests: 0
- status poll sequence: 429/retryable, 200/in_progress, 200/completed

Relevant CLI output:
  [memory] embeddings: source-wide batch prepare files=1328 sources=memory=833,sessions=495 maxFiles=2048 maxRequests=50000
  [memory] embeddings: source-wide batch submit group=1 source=memory+sessions files=1328 chunks=1328 sources=memory=833,sessions=495 reason=end
  [memory] embeddings: openai batch submit
  [memory] embeddings: openai batch created
  [memory] openai batch batch-1 in_progress; progress 0/1328 failed=0; waiting 20ms
  [memory] openai batch batch-1 status check failed: openai batch status failed: 429 {"error":{"message":"local retryable status probe"}}; waiting 40ms
  [memory] openai batch batch-1 in_progress; progress 0/1328 failed=0; waiting 80ms
  Memory index updated (main).

Focused Vitest:
[test] passed 2 Vitest shards in 46.53s
extension-memory: 1 file passed, 28 tests passed
extension-provider-openai: 2 files passed, 4 tests passed

Type checks:
tsgo:core passed
tsgo:extensions passed

Targeted lint:
oxlint passed with no findings for the changed files

Follow-up contract coverage for provider API review:
[test] passed 1 Vitest shard in 18.76s
memory embedding provider contract: 1 file passed, 5 tests passed
tsgo:core passed
follow-up oxlint passed with no findings
git diff --check passed

Byte-cap follow-up after latest main merge:
[test] passed 3 Vitest shards in 51.58s
memory-host-sdk batch runner: 1 file passed, 4 tests passed
openai embedding batch: 1 file passed, 2 tests passed
deadcode allowlist policy: 1 file passed, 12 tests passed
git diff --check origin/main...HEAD passed

CI lint/type follow-up for byte-cap test:
[test] passed 2 Vitest shards in 34.87s
corepack pnpm@11.2.2 tsgo:extensions:test passed
corepack pnpm@11.2.2 exec oxlint extensions/openai/embedding-batch.test.ts passed
PR head f64fa271b8fed78e428e5772ca14cc847e727451 is mergeable with base f8fcb350649b07baa63145d691cfa870fb36a984

Current-head safe force reindex metadata follow-up after commit dd4a3ff558e1dbd27ae73fcb19375262095f20c8:

First forced reindex:
Memory index updated (main).

Second forced reindex:
Memory index updated (main).

Final status:
Memory Search (main)
Provider: none (requested: none)
Model: fts-only
Sources: memory
Indexed: 1/1 files · 1 chunks
Dirty: no
Store: /tmp/tmp.pqGxWH6wH3/state/memory-main.sqlite
Workspace: /tmp/tmp.pqGxWH6wH3/workspace
Vector store: disabled
FTS: ready
Batch: disabled (failures 0/2)

Output assertions:
first force index updated: PASS
second force index updated: PASS
status dirty no: PASS
metadata missing absent: PASS
vector paused absent: PASS

Gateway watch startup path:
[gateway] loading configuration…
[gateway] force: no listeners on port 19891
[gateway] resolving authentication…
[gateway] auth mode=none explicitly configured; all gateway connections are unauthenticated.
[gateway] starting...
[gateway] starting HTTP server...
[gateway] http server listening (8 plugins: acpx, browser, canvas, device-pair, file-transfer, memory-core, phone-control, talk-voice; 3.5s)
[gateway] ready
[gateway] signal SIGTERM received
[shutdown] completed cleanly in 470ms

Gateway watch failure-marker scan:
No INEFFECTIVE_DYNAMIC_IMPORT, Doctor repair failed, gateway:watch exiting, Build emitted, ELIFECYCLE, Error, or failed markers were present in gateway-watch-raw-configured.log.
  • Observed result after fix: The real openclaw memory index --force --verbose path submitted one OpenAI-compatible provider batch job for 1328 total chunks from 833 memory files and 495 session transcripts. The CLI log shows source=memory+sessions, reason=end, no inline embedding requests, progress counts during polling, retry of a 429 status check, and increasing waits from 20ms to 40ms to 80ms. The reporter's gateway:watch:raw startup path also reached http server listening and gateway ready under an isolated local gateway config, then shut down cleanly after the bounded watch proof timeout. The byte-cap follow-up confirms shared embedding batch grouping splits by serialized JSONL UTF-8 bytes, the OpenAI-compatible upload path sends multiple under-cap JSONL files through the configured fetch implementation, and custom-id embedding order is preserved. The current-head safe-force follow-up also verifies the reporter's latest failure mode directly: two consecutive memory index --force runs on the same memory source keep the rebuilt SQLite index clean, preserve index metadata, and do not leave vector recall paused pending a rebuild.
  • What was not tested: Hosted OpenAI production batch service, non-Linux platforms, fixture sizes above the 50,000-request provider batch cap, authenticated gateway/channel traffic, and long-running gateway operation beyond the bounded startup proof window. The local endpoint implemented the OpenAI-compatible /v1/files, /v1/batches, /v1/batches/:id, and /v1/files/:id/content behavior needed for this indexing path. The byte-cap follow-up did not hit a hosted provider's real payload cap; it exercised the production upload runner with a local fake fetch implementation and small maxJsonlBytes override. The current-head safe-force follow-up used the local source CLI with a provider-none/FTS-only memory index to isolate the metadata identity path; it did not exercise hosted embeddings for that specific metadata regression.
  • Target test file: extensions/memory-core/src/memory/index.test.ts, extensions/openai/embedding-batch.test.ts, extensions/openai/memory-embedding-adapter.test.ts, and src/plugins/contracts/memory-embedding-provider.contract.test.ts.
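As a rough illustration of the grouping shown in the CLI log above, the following sketch packs dirty files from all sources into groups bounded by a file cap and a request cap. The defaults 2048 and 50,000 come from the log line (`maxFiles=2048 maxRequests=50000`); the `DirtyFile` type and `group_source_wide` function are hypothetical, and the sketch simplifies by never splitting a single file across groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirtyFile:
    source: str   # e.g. "memory" or "sessions"
    path: str
    chunks: int   # number of embedding requests this file contributes


def group_source_wide(files, max_files=2048, max_requests=50_000):
    """Pack files from every source into as few groups as the caps allow."""
    groups, current, requests = [], [], 0
    for f in files:
        over_files = len(current) >= max_files
        over_requests = requests + f.chunks > max_requests
        if current and (over_files or over_requests):
            groups.append(current)
            current, requests = [], 0
        current.append(f)
        requests += f.chunks
    if current:
        groups.append(current)
    return groups
```

With the fixture above (833 memory files and 495 session files at one chunk each), everything fits in a single group, which matches the single provider job reported in the evidence.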

Regression Test Plan

  • Focused memory manager tests cover source-wide batching across dirty memory files, oversized-file chunk grouping, forced memory+session indexing, max-file/request bounds, and non-opt-in batch runtimes staying per-file.
  • OpenAI embedding batch tests cover existing output parsing and batch behavior while the new polling path adds progress, retryable status failures, and backoff.
  • Shared batch runner and OpenAI batch tests cover serialized JSONL UTF-8 byte-cap splitting before file upload while preserving request order.
  • Memory embedding provider contract tests cover the explicit source-wide batch runtime opt-in and returned embedding order semantics.
  • Type checks, targeted lint, and git diff --check were run for the changed memory-core, OpenAI, provider runtime, and provider contract files.
  • gateway:watch:raw was run with isolated local gateway state to verify the startup path reaches gateway ready without the prior dynamic-import/doctor-repair launch failure.
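The returned-embedding order semantics mentioned above can be sketched as follows. Batch output lines may arrive in any order, so results are matched back to requests by custom id. The flat `{"custom_id", "embedding"}` line shape and the function name `order_embeddings` are simplifying assumptions for illustration, not the provider contract itself.

```python
def order_embeddings(request_ids, output_lines):
    """Return embeddings in request order, regardless of output order.

    request_ids: custom ids in the order the requests were submitted.
    output_lines: parsed output records, assumed here to carry a
    "custom_id" and an "embedding" field (a simplified shape).
    """
    by_id = {}
    for line in output_lines:
        by_id[line["custom_id"]] = line["embedding"]
    missing = [rid for rid in request_ids if rid not in by_id]
    if missing:
        # Fail loudly instead of silently shifting embeddings.
        raise KeyError(f"missing embeddings for: {missing}")
    return [by_id[rid] for rid in request_ids]
```

Raising on a missing id is a deliberate choice in this sketch: a silent gap would shift every later embedding onto the wrong chunk.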

Root Cause

  • Root cause: The sync layer indexed dirty memory files and session transcript files as separate file-level operations. Even when a provider exposed an async batch endpoint, each file reached batchEmbed independently, so a workload with many small files produced many small provider batch jobs. The OpenAI batch poll loop also treated transient status failures as terminal and used a fixed poll interval, which made long-running provider jobs noisy and brittle.
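To make the polling half of the root cause concrete, here is a minimal sketch of a poll loop that treats transient status failures as retryable and doubles its wait each round. All names (`poll_batch`, `StatusError`, the injected `sleep_ms` and `clock`) are hypothetical, the retryable status set is an assumption, and the 20 ms starting wait simply mirrors the local proof log rather than any production default.

```python
import time

# Assumed set of transient HTTP statuses worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}
TERMINAL = {"completed", "failed", "expired", "cancelled"}


class StatusError(Exception):
    def __init__(self, status):
        super().__init__(f"batch status failed: {status}")
        self.status = status


def poll_batch(fetch_status, *, initial_wait_ms=20, max_wait_ms=60_000,
               timeout_ms=1440 * 60_000,
               sleep_ms=lambda ms: time.sleep(ms / 1000),
               clock=time.monotonic):
    """Poll until a terminal state, backing off exponentially between checks."""
    wait = initial_wait_ms
    deadline = clock() + timeout_ms / 1000
    while True:
        try:
            state = fetch_status()
            if state in TERMINAL:
                return state
        except StatusError as err:
            if err.status not in RETRYABLE_STATUS:
                raise  # non-transient failures stay terminal
        if clock() + wait / 1000 > deadline:
            raise TimeoutError("batch did not finish before the timeout")
        sleep_ms(wait)
        wait = min(wait * 2, max_wait_ms)
```

Injecting `sleep_ms` and `clock` keeps the loop testable without real delays; a fake status sequence of in_progress, 429, in_progress, completed produces waits of 20, 40, and 80 ms.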

@openclaw-barnacle (Bot) added the "extensions: memory-core", "extensions: openai", "size: XL", and "proof: supplied" labels on Jun 1, 2026
@clawsweeper

clawsweeper Bot commented Jun 1, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 9, 2026, 6:38 AM ET / 10:38 UTC.

Summary
This PR adds source-wide memory embedding batching behind a provider runtime opt-in, enables it for OpenAI-compatible memory embeddings, adapts OpenAI batch polling/upload splitting, updates memory sync/index tests, and documents the new provider contract.

PR surface: Source +658, Tests +658, Docs +10. Total +1326 across 13 files.

Reproducibility: Do we have a high-confidence way to reproduce the issue? Source inspection and reporter logs show current main batches provider work at file boundaries, but I did not run the CLI path in this read-only review.

Review metrics: 2 noteworthy metrics.

  • Plugin runtime API surface: 1 optional flag added (sourceWideBatchEmbed). External memory providers can treat runtime fields as compatibility contracts once the PR ships.
  • Provider batch limits encoded: 50,000 requests, 190 MiB JSONL cap, plus 128-session staging flush. These limits determine when OpenClaw creates additional provider jobs and need to be visible before merge.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🦐 gold shrimp
Patch quality: 🦐 gold shrimp
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Refresh real behavior proof on head 66d362a56dfb1b40585f927e95971f31e4989c89, including a large memory+session run that shows the current staging behavior.
  • Update the provider-contract docs or implementation so the 128-session staging limit is visible and intentional.
  • Have a maintainer confirm the sourceWideBatchEmbed API and OpenAI-compatible upload-cap behavior before landing.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR has useful CLI logs and reporter validation, but the strongest proof predates the latest source-wide session staging and OpenAI upload-sizing commits, so current-head real behavior proof is still needed before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] The latest head still needs refreshed real-behavior proof because source-wide session staging and OpenAI-compatible upload sizing changed after the strongest CLI/log proof and reporter validation.
  • [P1] The new sourceWideBatchEmbed runtime flag becomes a plugin-facing provider contract once merged, so docs and tests need to describe the actual grouping and ordering guarantees precisely.
  • [P1] Large memory+session runs now have a hidden 128-session staging flush, which is probably a reasonable memory cap but currently conflicts with the one-call-until-host-limits wording and older proof examples.
  • [P1] OpenAI-compatible providers with smaller upload caps may still first see a large default JSONL upload before adaptive splitting handles the rejection, so maintainers should confirm that behavior is acceptable without a user-tunable cap.

Maintainer options:

  1. Align contract and proof before merge (recommended)
    Update the provider-contract docs and current-head proof so they show the 128-session staging behavior, OpenAI upload splitting, and the expected large memory+session CLI output from the reviewed head.
  2. Accept staged batching as a maintainer choice
    Maintainers can explicitly accept the 128-session flush as the source-wide host limit, but the PR body and docs should still say that this is intentional before merge.
  3. Pause if the API direction is unsettled
    If the runtime flag or upload-cap behavior is not the desired long-term plugin contract, pause this branch and settle the provider API shape before more implementation churn.

Next step before merge

  • [P1] The remaining blocker is maintainer review of the plugin API/session staging contract plus contributor-supplied current-head proof, not a safe automated repair lane.

Security
Cleared: The diff changes guarded remote batch HTTP usage but keeps uploads under withRemoteHttpResponse with SSRF policy/fetch plumbing, and it does not add dependencies, workflows, secrets, or package scripts.

Review findings

  • [P2] Align source-wide docs with staged session flushes — docs/plugins/sdk-overview.md:118-124
Review details

Best possible solution:

Land the source-wide opt-in approach only after the provider contract, docs, and current-head proof all describe the real staging and upload-cap behavior that maintainers want to support.

Do we have a high-confidence way to reproduce the issue?

Source inspection and reporter logs show current main batches provider work at file boundaries, but I did not run the CLI path in this read-only review.

Is this the best way to solve the issue?

The explicit provider opt-in is a plausible and well-scoped architecture, but the merge-ready solution needs the source-wide contract and current-head proof aligned with the new session staging and upload-cap behavior.

Full review comments:

  • [P2] Align source-wide docs with staged session flushes — docs/plugins/sdk-overview.md:118-124
    The new SDK text says a sourceWideBatchEmbed provider can receive chunks from multiple dirty files and enabled sources in one batchEmbed(...) call up to the host batch limits, but the current PR code also flushes deferred session work every 128 session files. For the reporter-sized memory+session case, that means current head can create multiple provider jobs before the documented file/request caps, so plugin authors and maintainers do not have an accurate contract unless the docs/proof name that staging limit or the implementation folds it into the stated host limits.
    Confidence: 0.83
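    The arithmetic behind this finding can be sketched briefly. Assuming only what the review states (deferred session work is flushed every 128 session files), the hypothetical helper below counts how many flushes a given number of session files would trigger; how memory files are combined with those flushes is not modeled.

```python
def staged_flush_sizes(session_files, flush_every=128):
    """Sizes of successive session flushes under a fixed staging limit."""
    sizes = []
    remaining = session_files
    while remaining > 0:
        take = min(flush_every, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


# Fixture from the PR body: 495 session files.
print(staged_flush_sizes(495))  # [128, 128, 128, 111]
# Reporter's status output: 336 session files.
print(staged_flush_sizes(336))  # [128, 128, 80]
```

    Under this assumption both workloads would produce more than one flush well below the documented 2048-file and 50,000-request caps, which is the gap the finding asks the docs or proof to name.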

Overall correctness: patch is incorrect
Overall confidence: 0.78

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 2f02bbcbb3ef.

Label changes

Label justifications:

  • P2: This is a normal-priority performance and provider-contract improvement for memory indexing with limited blast radius but meaningful review surface.
  • merge-risk: 🚨 compatibility: The PR adds a plugin-facing runtime flag and changes OpenAI-compatible batch upload behavior that provider authors may depend on.
  • merge-risk: 🚨 session-state: The PR changes memory/session indexing staging, safe reindex behavior, and persisted memory index metadata paths.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🦐 gold shrimp and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR has useful CLI logs and reporter validation, but the strongest proof predates the latest source-wide session staging and OpenAI upload-sizing commits, so current-head real behavior proof is still needed before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +658, Tests +658, Docs +10. Total +1326 across 13 files.

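PR surface stats:

| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 8 | 855 | 197 | +658 |
| Tests | 4 | 695 | 37 | +658 |
| Docs | 1 | 10 | 0 | +10 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 13 | 1560 | 234 | +1326 |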

What I checked:

Likely related people:

  • steipete: Recent GitHub history shows multiple memory embedding and OpenAI batch changes, including retry/timeout and batch poll interval work in the same files. (role: recent area contributor; confidence: high; commits: 899dc5f2486b, b1958256fdab, 33c44626d211; files: extensions/memory-core/src/memory/manager-embedding-ops.ts, extensions/openai/embedding-batch.ts)
  • vincentkoc: Recent current-main history includes memory sync/index metadata work and the current main SHA, which sits directly on the session/index identity path touched by this PR. (role: recent area contributor; confidence: high; commits: 2f02bbcbb3ef, d46dc39b18ec, 5a4f868de0db; files: extensions/memory-core/src/memory/manager-sync-ops.ts, extensions/openai/embedding-batch.ts)
  • osolmaz: Recent memory index/provider availability work overlaps the upgrade-sensitive memory sync and index identity behavior that this PR changes. (role: adjacent owner; confidence: medium; commits: 0aea58ab66d4, a4b4fed41287, 7ff29a9e6df6; files: extensions/memory-core/src/memory/manager-sync-ops.ts, extensions/memory-core/src/memory/manager-embedding-ops.ts)
  • shakkernerd: The latest branch updates changed source-wide fallback behavior, OpenAI batch file cap headroom, session batch staging, and upload sizing after the original contributor proof. (role: recent PR branch maintainer; confidence: medium; commits: a021efb7deb3, 468c040facc2, 9a39ab9b5d65; files: extensions/memory-core/src/memory/manager-sync-ops.ts, extensions/openai/embedding-batch.ts, packages/memory-host-sdk/src/host/batch-runner.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper (Bot) added the "proof: sufficient", "rating: 🦐 gold shrimp", "status: ⏳ waiting on author", "P2", and "merge-risk: 🚨 compatibility" labels on Jun 1, 2026
@openclaw-barnacle (Bot) added the "docs" label and removed the "proof: sufficient" label on Jun 1, 2026
@clawsweeper (Bot) added the "proof: sufficient", "rating: 🐚 platinum hermit", and "status: 👀 ready for maintainer look" labels and removed the "rating: 🦐 gold shrimp" and "status: ⏳ waiting on author" labels on Jun 1, 2026
@openclaw-barnacle (Bot) removed the "proof: sufficient" label on Jun 1, 2026
@clawsweeper (Bot) added the "proof: sufficient" label on Jun 1, 2026
@openclaw-barnacle (Bot) removed the "proof: sufficient" label on Jun 1, 2026
@mushuiyu886

Contributor Author

@hartmark This is the focused respin for #88009. Could you try this branch against your OpenAI-compatible batch provider when you have a chance?

The expected verbose signal is that dirty memory files and sessions are prepared together and submitted as one source-wide provider batch until the file/request caps require a split, for example:

[memory] embeddings: source-wide batch prepare files=1328 sources=memory=833,sessions=495 maxFiles=2048 maxRequests=50000
[memory] embeddings: source-wide batch submit group=1 source=memory+sessions files=1328 chunks=1328 sources=memory=833,sessions=495 reason=end

This version also keeps oversized-file chunks in the same provider batch, reports OpenAI request_counts progress, uses exponential status-poll backoff, and treats transient status-poll failures such as 429/5xx/network errors as retryable until the configured batch timeout.

I also ran the reporter path gateway:watch:raw with isolated local gateway state; it reached http server listening and gateway ready without the prior launch-failure markers.

@clawsweeper (Bot) added the "proof: sufficient" label on Jun 1, 2026
@hartmark

hartmark commented Jun 2, 2026


Cool, I'll take a look as soon as possible.

Thanks for implementing this annoying blocker.

@mushuiyu886 mushuiyu886 requested a review from a team as a code owner June 2, 2026 05:18
@openclaw-barnacle (Bot) added the "channel: matrix", "gateway", "scripts", "commands", and "docker" labels on Jun 2, 2026
@hartmark

hartmark commented Jun 5, 2026


Still no full reindex.

memory status says batch is disabled, that's strange

Memory Search (main)
Provider: openai (requested: openai)
Model: mistral/mistral-embed
Sources: memory, sessions
Indexed: 1731/1731 files · 6706 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
By source:
  memory · 1395/1395 files · 5241 chunks
  sessions · 336/336 files · 1465 chunks
Vector store: unknown
Vector dims: 1024
FTS: ready
Embedding cache: enabled (7354 entries)
Cache cap: 50000
Batch: disabled (failures 0/2)
pnpm openclaw memory --help
...
  openclaw memory index --force
    Force a full reindex.

Running full reindex.

$ pnpm openclaw memory index --force
....
◒  Indexing memory sources (batch)... 0/1731 · elapsed 0:04.

The command sits on this line for several minutes and then prints:

Memory index updated (main).

Running memory status afterwards yields this response:

Memory Search (main)
Provider: openai (requested: openai)
Model: mistral/mistral-embed
Sources: memory, sessions
Indexed: 1731/1731 files · 6706 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Index identity: index metadata is missing
Vector search: paused until memory is rebuilt
Fix: Run: openclaw memory status --index --agent main
By source:
  memory · 1395/1395 files · 5241 chunks
  sessions · 336/336 files · 1465 chunks
Vector store: unknown
FTS: ready
Embedding cache: enabled (7354 entries)
Cache cap: 50000
Batch: disabled (failures 0/2)

Note the messages saying that the index metadata is missing and that vector search is paused until memory is rebuilt.

@mushuiyu886

Contributor Author

@clawsweeper re-review

Current head dd4a3ff558e1dbd27ae73fcb19375262095f20c8 now has current-head safe memory index --force proof in the PR body.

The new proof covers the reporter's latest failure mode directly: two consecutive forced reindexes on the same memory source both report Memory index updated (main), and the follow-up memory status shows Dirty: no without index metadata is missing or paused until memory is rebuilt.

The Real behavior proof workflow is passing again: https://github.com/openclaw/openclaw/actions/runs/27048357559/job/79838839784

@clawsweeper

clawsweeper Bot commented Jun 6, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@hartmark

hartmark commented Jun 6, 2026


@mushuiyu886 I can do the --force reindex and openclaw memory search foo works now.

The only remaining question: should --force do a complete rerun of the embeddings, or does it just rebuild the index using the previously saved embeddings?

@mushuiyu886

Contributor Author

@hartmark Yes — --force here means “force a full safe reindex/resync of the memory index,” not “discard all saved embeddings and regenerate every embedding from the provider.”

During the safe reindex path, the existing embedding cache is carried forward, so unchanged chunks with the same provider/model/cache key and content hash can reuse previously saved embeddings. Changed, missing, or cache-mismatched chunks are embedded again.

So your current result is expected: --force reruns indexing over the sources, but it is intentionally cache-preserving. A mode that wipes cached embeddings and recomputes everything would be a separate stronger option/flag.
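A rough sketch of the cache-preserving behavior described here, under stated assumptions: the cache is modeled as a set of (provider, model, content hash) keys, SHA-256 stands in for whatever hash the real cache uses, and `cache_key` and `plan_reindex` are hypothetical names.

```python
import hashlib


def cache_key(provider, model, text):
    """Key a chunk by provider, model, and a hash of its content."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return (provider, model, digest)


def plan_reindex(chunks, cache, provider, model):
    """Split chunks into those reusable from cache and those to embed again."""
    reused, to_embed = [], []
    for text in chunks:
        if cache_key(provider, model, text) in cache:
            reused.append(text)
        else:
            to_embed.append(text)
    return reused, to_embed


# Cache populated by an earlier run for two chunks.
cache = {cache_key("openai", "mistral/mistral-embed", t) for t in ["a", "b"]}
print(plan_reindex(["a", "b", "c"], cache, "openai", "mistral/mistral-embed"))
# (['a', 'b'], ['c'])
```

In this model a forced reindex walks every chunk again, but only new or changed content, or a change of provider or model, leads to a fresh embedding request.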

@hartmark

hartmark commented Jun 7, 2026


Cool, let's hope we get this merged soon

@hartmark

hartmark commented Jun 8, 2026


@clawsweeper re-review

@mushuiyu886

Contributor Author

@jalehman When will this PR be merged? I noticed that many new commits have been added. Is there anyone else who needs to be involved in the merge process?

@jalehman

jalehman commented Jun 9, 2026

Contributor

Merged via squash.

Thanks @mushuiyu886!

Labels

  • docs (Improvements or additions to documentation)
  • extensions: memory-core (Extension: memory-core)
  • extensions: openai
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.)
  • size: XL
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)

Development

Successfully merging this pull request may close these issues.

[Feature]: batched memory embedding should batch over files

4 participants