fix(memory): fail fast when embeddings provider is unavailable#90336
Conversation
|
Codex review: needs maintainer review before merge. Reviewed June 6, 2026, 7:50 AM ET / 11:50 UTC. Summary PR surface: Source +159, Tests +110, Docs +14. Total +283 across 14 files. Reproducibility: yes. Current main and v2026.6.1 source show providerless memory search can return keyword-only FTS results when FTS is available, and the linked issue has a real active-memory timeout trace; I did not run a live repro in this read-only review. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the internal required-provider guard after maintainer review accepts the fail-closed upgrade behavior, while preserving optional FTS behavior for unset, legacy auto, provider none, and local provider paths. Do we have a high-confidence way to reproduce the issue? Yes. Current main and v2026.6.1 source show providerless memory search can return keyword-only FTS results when FTS is available, and the linked issue has a real active-memory timeout trace; I did not run a live repro in this read-only review. Is this the best way to solve the issue? Yes. This is the narrow maintainable fix: it keeps the requirement internal, guards search and sync, preserves intended optional fallback paths, and leaves the related startup-loading work to #89652. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 74331f632b93. Label changesLabel justifications:
Evidence reviewedPR surface: Source +159, Tests +110, Docs +14. Total +283 across 14 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
ba821e5 to
be56782
Compare
|
Implementation-loop closeout for 2519517:
Verification:
Remaining maintainer decision: whether already-migrated legacy |
2519517 to
862bdb2
Compare
|
Maintainer decision for #90336: accept explicit non-local memory embedding providers as fail-closed when unavailable. That includes configs already migrated from legacy Docs are now aligned in @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Final verification update: rebased conflict-free on current main; PR head 76e0485 is mergeable/CLEAN. GitHub CI is green, local focused memory/docs/type/architecture checks passed, codex review is clean, and ClawSweeper re-review completed with no rank-up moves left. Remaining state is normal human maintainer review for the accepted fail-closed provider behavior. |
76e0485 to
8c1b428
Compare
a1abed5 to
e4f9e93
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Why this is careful about fallback: The old FTS-only fallback was useful because it kept memory search from going completely dark when embeddings were unavailable. We do not want to break workflows that were relying on the default or implicit behavior, especially local setups and cases where keyword recall is still better than nothing. But when a user explicitly configures a non-local embedding provider, silently falling back to FTS hides the real problem. The user thinks semantic recall is working, while the system is actually doing keyword-only search. That can make memory feel flaky and can waste time debugging the wrong layer. So this PR keeps fallback for unset/default/local/none-style paths, and only fails closed when an explicit remote provider is required and unavailable. That gives existing workflows a softer upgrade path while making intentional provider configuration honest and debuggable. |
|
Maintainer land-ready note: I am accepting the product behavior tradeoff here: explicit non-local memory embedding providers should fail closed when unavailable, while unset/default/local/none-style paths keep FTS fallback so existing workflows do not lose recall entirely. Current head: e4f9e93 Validation reviewed before merge:
Known CI gap accepted for this merge:
I am merging despite those unrelated baseline failures because the PR-specific behavior, docs, and regression coverage are ready, and the remaining risk is the maintainer-accepted fallback semantics described above. |
Opened on behalf of Onur Solmaz (
osolmaz).Summary
Memory search could silently fall back to keyword-only search when a configured embedding provider was unavailable.
That made
memory_searchlook healthy while semantic recall was actually broken.This change keeps keyword-only search only for intentional or local fallback cases, and fails fast when a concrete remote provider was explicitly configured but cannot be used.
AI-assisted change: implemented and reviewed with a local coding agent.
Fixes #89691
What Changed
The memory manager now derives an internal provider requirement from the existing memory-search config.
That requirement stays out of the public SDK
ResolvedMemorySearchConfigshape, but the manager uses it before search and sync so explicit remote provider failures do not degrade into FTS-only behavior.provider: "none"as explicit FTS-only.auto, and local-transport providers as optional fallback paths.openaias required.Testing
I tested the resolver, active-memory tool behavior, memory manager search/sync behavior, TypeScript compile path, and the guard baselines that current CI exercises.
The focused memory suite covers both the fail-fast case and recovery after provider setup becomes available again.
node scripts/run-vitest.mjs src/agents/memory-search.test.ts src/plugins/memory-runtime.test.ts extensions/memory-core/src/memory/index.test.ts extensions/memory-core/src/tools.test.ts extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/active-memory/index.test.ts extensions/memory-core/src/memory/manager-sync-ops.startup-catchup.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.ts extensions/memory-core/src/memory/manager-sync-ops.interval.test.ts extensions/memory-core/src/memory/manager-sync-ops.archive-delta-bypass.test.tsnode scripts/run-vitest.mjs src/agents/memory-search.test.ts extensions/memory-core/src/memory/index.test.ts extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.tsnode scripts/run-vitest.mjs test/scripts/lint-suppressions.test.tspnpm tsgopnpm check:architecturegit diff --checkcodex review --base origin/mainRisks
The intended behavior change is that explicit remote provider configuration now fails closed instead of pretending semantic memory is available.
That is safer for this bug, but it can expose existing broken configs that were previously hidden by keyword-only fallback.
provider: "auto"remains optional at resolver time, but configs that were already migrated by doctor intoprovider: "openai"are indistinguishable from manually explicit OpenAI configs. That upgrade behavior may still need a maintainer/product decision if preserving those already-migrated setups matters.