fix(memory-core): stream embedding cache seed during reindex#73067
fix(memory-core): stream embedding cache seed during reindex#73067parkertoddbrooks wants to merge 1 commit into
Conversation
Greptile SummarySwitches Confidence Score: 4/5Safe to merge; the streaming change is correct and the transaction/error-handling paths are preserved. Only P2 findings: a harmless empty-transaction regression from removing the early-return guard, and a test fragility note about relying on an implicit identity between manager.db and sourceDb. No logic errors or data-safety issues. No files require special attention. Prompt To Fix All With AIThis is a comment left during a code review.
Path: extensions/memory-core/src/memory/manager-sync-ops.ts
Line: 329-337
Comment:
**Empty-cache path now issues an unnecessary BEGIN/COMMIT**
Removing the `if (!rows.length) { return; }` guard means `this.db.exec("BEGIN")` and `this.db.exec("COMMIT")` are always executed, even when `embedding_cache` is empty. The old guard was there precisely to skip the transaction in that case. The behaviour is functionally harmless (an empty transaction completes fine), but it is a silent regression from the original intent and adds an unnecessary write-ahead log entry on every reindex for users who have the cache empty.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: extensions/memory-core/src/memory/index.test.ts
Line: 462-475
Comment:
**Spy targets the wrong database instance**
`db` is captured as `manager.db` (the original database) _before_ the sync call. During `safeReindex`, the manager temporarily sets `this.db = tempDb`, then calls `seedEmbeddingCache(originalDb)`. Inside `seedEmbeddingCache`, the SELECT goes through `sourceDb.prepare(...)` (`sourceDb === originalDb === db`) — so the spy _does_ intercept it. However, `this.db.prepare(INSERT ...)` executes against `tempDb`, which has no spy. If the manager ever reassigns `this.db` to a different object before the spy is installed (e.g., via a concurrent reindex or hot-reload path), the spy would silently fail to intercept the SELECT and the test would pass vacuously without detecting a regression back to `.all()`. A comment explicitly asserting `sourceDb === db` would make this invariant clear and prevent silent test drift.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(memory-core): stream embedding cache..." | Re-trigger Greptile |
| @@ -338,6 +335,9 @@ export abstract class MemoryManagerSyncOps { | |||
| updated_at=excluded.updated_at`, | |||
| ); | |||
| this.db.exec("BEGIN"); | |||
There was a problem hiding this comment.
Empty-cache path now issues an unnecessary BEGIN/COMMIT
Removing the if (!rows.length) { return; } guard means this.db.exec("BEGIN") and this.db.exec("COMMIT") are always executed, even when embedding_cache is empty. The old guard was there precisely to skip the transaction in that case. The behaviour is functionally harmless (an empty transaction completes fine), but it is a silent regression from the original intent and adds an unnecessary write-ahead log entry on every reindex for users who have the cache empty.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/memory-core/src/memory/manager-sync-ops.ts
Line: 329-337
Comment:
**Empty-cache path now issues an unnecessary BEGIN/COMMIT**
Removing the `if (!rows.length) { return; }` guard means `this.db.exec("BEGIN")` and `this.db.exec("COMMIT")` are always executed, even when `embedding_cache` is empty. The old guard was there precisely to skip the transaction in that case. The behaviour is functionally harmless (an empty transaction completes fine), but it is a silent regression from the original intent and adds an unnecessary write-ahead log entry on every reindex for users who have the cache empty.
How can I resolve this? If you propose a fix, please make it concise.| const db = ( | ||
| manager as unknown as { | ||
| db: { | ||
| prepare: (sql: string) => unknown; | ||
| }; | ||
| } | ||
| ).db; | ||
| const originalPrepare = db.prepare.bind(db); | ||
| const cachedRows = ( | ||
| originalPrepare( | ||
| "SELECT provider, model, provider_key, hash, embedding, dims, updated_at FROM embedding_cache", | ||
| ) as StatementWithAll | ||
| ).all(); | ||
| expect(cachedRows.length).toBeGreaterThan(0); |
There was a problem hiding this comment.
Spy targets the wrong database instance
db is captured as manager.db (the original database) before the sync call. During safeReindex, the manager temporarily sets this.db = tempDb, then calls seedEmbeddingCache(originalDb). Inside seedEmbeddingCache, the SELECT goes through sourceDb.prepare(...) (sourceDb === originalDb === db) — so the spy does intercept it. However, this.db.prepare(INSERT ...) executes against tempDb, which has no spy. If the manager ever reassigns this.db to a different object before the spy is installed (e.g., via a concurrent reindex or hot-reload path), the spy would silently fail to intercept the SELECT and the test would pass vacuously without detecting a regression back to .all(). A comment explicitly asserting sourceDb === db would make this invariant clear and prevent silent test drift.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/memory-core/src/memory/index.test.ts
Line: 462-475
Comment:
**Spy targets the wrong database instance**
`db` is captured as `manager.db` (the original database) _before_ the sync call. During `safeReindex`, the manager temporarily sets `this.db = tempDb`, then calls `seedEmbeddingCache(originalDb)`. Inside `seedEmbeddingCache`, the SELECT goes through `sourceDb.prepare(...)` (`sourceDb === originalDb === db`) — so the spy _does_ intercept it. However, `this.db.prepare(INSERT ...)` executes against `tempDb`, which has no spy. If the manager ever reassigns `this.db` to a different object before the spy is installed (e.g., via a concurrent reindex or hot-reload path), the spy would silently fail to intercept the SELECT and the test would pass vacuously without detecting a regression back to `.all()`. A comment explicitly asserting `sourceDb === db` would make this invariant clear and prevent silent test drift.
How can I resolve this? If you propose a fix, please make it concise.- stream safe-reindex embedding-cache seeding with SQLite iterate() - avoid no-op empty-cache transactions and keep regression coverage explicit - supersedes #73067 Thanks @parkertoddbrooks.
|
Landed this via maintainer PR #73118 because GitHub rejected maintainer pushes to this fork branch despite maintainer edits being enabled.
The changelog keeps the #73067 reference and credits @parkertoddbrooks. Thanks @parkertoddbrooks! |
- stream safe-reindex embedding-cache seeding with SQLite iterate() - avoid no-op empty-cache transactions and keep regression coverage explicit - supersedes openclaw#73067 Thanks @parkertoddbrooks.
- stream safe-reindex embedding-cache seeding with SQLite iterate() - avoid no-op empty-cache transactions and keep regression coverage explicit - supersedes openclaw#73067 Thanks @parkertoddbrooks.
- stream safe-reindex embedding-cache seeding with SQLite iterate() - avoid no-op empty-cache transactions and keep regression coverage explicit - supersedes openclaw#73067 Thanks @parkertoddbrooks.
- stream safe-reindex embedding-cache seeding with SQLite iterate() - avoid no-op empty-cache transactions and keep regression coverage explicit - supersedes openclaw#73067 Thanks @parkertoddbrooks.
- stream safe-reindex embedding-cache seeding with SQLite iterate() - avoid no-op empty-cache transactions and keep regression coverage explicit - supersedes openclaw#73067 Thanks @parkertoddbrooks.
- stream safe-reindex embedding-cache seeding with SQLite iterate() - avoid no-op empty-cache transactions and keep regression coverage explicit - supersedes openclaw#73067 Thanks @parkertoddbrooks.
Summary
embedding_cacherows during safe memory-core reindex instead of materializing the full table with.all().all()Why
Large local
embedding_cachetables can exceed the V8 heap whenseedEmbeddingCache()materializes every row before copying into the temp reindex database. Streaming fixes the heap spike, and cooperative yielding prevents long reindex passes from blocking gateway health/readiness checks.Tests
pnpm tsgopnpm test extensions/memory-core/src/memory/index.test.tspnpm test extensions/memory-core/src/memory/manager.sync-errors-do-not-crash.test.tspnpm test extensions/memory-core/src/memory