Skip to content

fix(memory): fail fast when embeddings provider is unavailable#90336

Merged
osolmaz merged 7 commits into
mainfrom
codex/89691-memory-search-embedding-provider
Jun 6, 2026
Merged

fix(memory): fail fast when embeddings provider is unavailable#90336
osolmaz merged 7 commits into
mainfrom
codex/89691-memory-search-embedding-provider

Conversation

@osolmaz

@osolmaz osolmaz commented Jun 4, 2026

Copy link
Copy Markdown
Member

Opened on behalf of Onur Solmaz (osolmaz).

Summary

Memory search could silently fall back to keyword-only search when a configured embedding provider was unavailable.
That made memory_search look healthy while semantic recall was actually broken.
This change keeps keyword-only search only for intentional or local fallback cases, and fails fast when a concrete remote provider was explicitly configured but cannot be used.

AI-assisted change: implemented and reviewed with a local coding agent.

Fixes #89691

What Changed

The memory manager now derives an internal provider requirement from the existing memory-search config.
That requirement stays out of the public SDK ResolvedMemorySearchConfig shape, but the manager uses it before search and sync so explicit remote provider failures do not degrade into FTS-only behavior.

  • Treat provider: "none" as explicit FTS-only.
  • Treat default, legacy auto, and local-transport providers as optional fallback paths.
  • Treat explicit concrete non-local providers such as openai as required.
  • Added memory manager guards so required-provider search and sync fail with a clear diagnostic when provider setup returns no provider.
  • Reset required-provider initialization after that failure so a later healthy provider setup can recover.
  • Updated active-memory and memory-core tests for the new unavailable-provider behavior.
  • Kept the provider requirement internal so external SDK consumers do not need to construct a new required config field.

Testing

I tested the resolver, active-memory tool behavior, memory manager search/sync behavior, TypeScript compile path, and the guard baselines that current CI exercises.
The focused memory suite covers both the fail-fast case and recovery after provider setup becomes available again.

  • node scripts/run-vitest.mjs src/agents/memory-search.test.ts src/plugins/memory-runtime.test.ts extensions/memory-core/src/memory/index.test.ts extensions/memory-core/src/tools.test.ts extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/active-memory/index.test.ts extensions/memory-core/src/memory/manager-sync-ops.startup-catchup.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.ts extensions/memory-core/src/memory/manager-sync-ops.interval.test.ts extensions/memory-core/src/memory/manager-sync-ops.archive-delta-bypass.test.ts
  • node scripts/run-vitest.mjs src/agents/memory-search.test.ts extensions/memory-core/src/memory/index.test.ts extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.ts
  • node scripts/run-vitest.mjs test/scripts/lint-suppressions.test.ts
  • pnpm tsgo
  • pnpm check:architecture
  • git diff --check
  • codex review --base origin/main

Risks

The intended behavior change is that explicit remote provider configuration now fails closed instead of pretending semantic memory is available.
That is safer for this bug, but it can expose existing broken configs that were previously hidden by keyword-only fallback.

  • Local embedding provider failures still degrade to keyword-only search because local model availability can be optional and environment-dependent.
  • Legacy provider: "auto" remains optional at resolver time, but configs that were already migrated by doctor into provider: "openai" are indistinguishable from manually explicit OpenAI configs. That upgrade behavior may still need a maintainer/product decision if preserving those already-migrated setups matters.

@openclaw-barnacle openclaw-barnacle Bot added extensions: memory-core Extension: memory-core agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels Jun 4, 2026
@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 6, 2026, 7:50 AM ET / 11:50 UTC.

Summary
The PR makes memory-core derive an internal provider requirement from memorySearch config, fail fast for unavailable explicit non-local embedding providers in search/sync, and updates memory docs and focused tests.

PR surface: Source +159, Tests +110, Docs +14. Total +283 across 14 files.

Reproducibility: yes. Current main and v2026.6.1 source show providerless memory search can return keyword-only FTS results when FTS is available, and the linked issue has a real active-memory timeout trace; I did not run a live repro in this read-only review.

Review metrics: 1 noteworthy metric.

  • Provider fallback behavior: 1 existing fallback behavior changed, 0 config keys added. Explicit remote memory embedding providers now fail closed, so maintainers should notice the upgrade effect even though no new config surface is added.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🌊 off-meta tidepool
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] Existing users with explicit non-local memorySearch.provider values and broken provider/auth setup will now get unavailable memory_search results instead of hidden FTS-only recall.
  • [P1] Configs already migrated from legacy provider: "auto" to provider: "openai" take the same fail-closed path as manually explicit OpenAI configs; the PR discussion records this as intentional, but it remains upgrade-sensitive.
  • [P2] This mitigates the hidden fallback and timeout symptom, but it does not prove the original provider registry visibility race cannot recur when provider setup is otherwise healthy.

Maintainer options:

  1. Land With Accepted Fail-Closed Behavior (recommended)
    Maintainers can land this once they are comfortable that explicit remote provider failures should surface as unavailable, including already-migrated provider:auto configs.
  2. Preserve Legacy-Auto Upgrades Separately
    If migrated auto configs need the old fallback semantics, add a migration marker or documented compatibility path before merge.

Next step before merge

  • [P2] Human maintainer review/landing remains because the PR has a protected maintainer label and deliberately changes compatibility/auth-provider fallback behavior; there is no narrow autonomous repair selected.

Security
Cleared: The diff changes memory provider control flow, docs, and tests without adding dependency, workflow, package, or secret-handling surfaces.

Review details

Best possible solution:

Land the internal required-provider guard after maintainer review accepts the fail-closed upgrade behavior, while preserving optional FTS behavior for unset, legacy auto, provider none, and local provider paths.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main and v2026.6.1 source show providerless memory search can return keyword-only FTS results when FTS is available, and the linked issue has a real active-memory timeout trace; I did not run a live repro in this read-only review.

Is this the best way to solve the issue?

Yes. This is the narrow maintainable fix: it keeps the requirement internal, guards search and sync, preserves intended optional fallback paths, and leaves the related startup-loading work to #89652.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 74331f632b93.

Label changes

Label justifications:

  • P1: The linked bug affects active-memory and memory_search provider routing, which can turn an agent recall workflow into hidden fallback or timeout behavior.
  • merge-risk: 🚨 compatibility: The PR intentionally changes existing explicit remote-provider configs from FTS-only fallback to unavailable errors when provider setup is broken.
  • merge-risk: 🚨 auth-provider: The changed behavior depends on embedding provider/auth availability and can surface missing OpenAI or other provider credentials as blocking memory_search failures.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor proof gate does not apply because this is a MEMBER-authored maintainer PR, and the body/comments record focused local validation instead.
Evidence reviewed

PR surface:

Source +159, Tests +110, Docs +14. Total +283 across 14 files.

View PR surface stats
Area Files Added Removed Net
Source 3 179 20 +159
Tests 9 113 3 +110
Docs 2 21 7 +14
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 14 313 30 +283

What I checked:

Likely related people:

  • Peter Steinberger: Introduced memory vector search and appears heavily in memory manager/provider-state refactor history around the affected search/provider architecture. (role: feature-history owner; confidence: high; commits: bf11a42c372b, 6e9382b5c8bb, ca27d932b4ff; files: src/agents/memory-search.ts, extensions/memory-core/src/memory/manager.ts, extensions/memory-core/src/memory/manager-provider-state.ts)
  • Vincent Koc: Recent commits and release baseline work touched the memory manager, sync ops, and memory-search config surface in the current history sample. (role: recent area contributor; confidence: medium; commits: f94e4f85f070, 2e08f0f4221f, ca26489fe882; files: extensions/memory-core/src/memory/manager.ts, extensions/memory-core/src/memory/manager-sync-ops.ts, src/agents/memory-search.ts)
  • Anonymous Amit: Authored the merged lexical fallback ranking change that is directly adjacent to the hidden FTS-only fallback behavior this PR changes. (role: fallback behavior contributor; confidence: high; commits: 42590106ab6e; files: extensions/memory-core/src/memory/manager.ts, docs/concepts/memory-search.md)
  • opriz: Authored the merged FTS-only memory search fix that established providerless indexing/search behavior now being narrowed for explicit remote providers. (role: FTS-only behavior contributor; confidence: medium; commits: 41c30f0c5901; files: extensions/memory-core/src/memory/manager.ts, extensions/memory-core/src/memory/manager-sync-ops.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. labels Jun 4, 2026
@osolmaz osolmaz force-pushed the codex/89691-memory-search-embedding-provider branch from ba821e5 to be56782 Compare June 4, 2026 13:23
@openclaw-barnacle openclaw-barnacle Bot added plugin: bonjour Plugin integration: bonjour size: M and removed size: S labels Jun 4, 2026
@osolmaz

osolmaz commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Implementation-loop closeout for 2519517:

  • Implemented the issue Active-memory embedded memory_search intermittently loses embedding provider and falls back to FTS-only #89691 mitigation: explicit concrete non-local memory embedding providers now fail fast as unavailable instead of silently using FTS-only search; provider: "none", unset/default, legacy auto, and local providers keep the intended optional/FTS behavior.
  • Kept the provider requirement internal to the memory manager so ResolvedMemorySearchConfig does not gain a required SDK field.
  • Updated active-memory and memory-core coverage for required-provider fail-fast, sync fail-fast, recovery, and terminal unavailable behavior.

Verification:

  • node scripts/run-vitest.mjs src/agents/memory-search.test.ts src/plugins/memory-runtime.test.ts extensions/memory-core/src/memory/index.test.ts extensions/memory-core/src/tools.test.ts extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/active-memory/index.test.ts extensions/memory-core/src/memory/manager-sync-ops.startup-catchup.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.ts extensions/memory-core/src/memory/manager-sync-ops.interval.test.ts extensions/memory-core/src/memory/manager-sync-ops.archive-delta-bypass.test.ts passed.
  • node scripts/run-vitest.mjs src/agents/memory-search.test.ts extensions/memory-core/src/memory/index.test.ts extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/memory-core/src/memory/manager-sync-yield.test.ts passed after the SDK-compatibility fix.
  • node scripts/run-vitest.mjs test/scripts/lint-suppressions.test.ts passed.
  • node scripts/run-vitest.mjs src/security/audit-config-include-perms.test.ts passed in isolation after the unrelated first CI failure.
  • pnpm tsgo passed.
  • pnpm check:architecture passed.
  • git diff --check passed.
  • codex review --base origin/main found no regressions after the SDK-compatibility fix.
  • PR CI is green after rerunning the failed checks-node-core-fast shard.

Remaining maintainer decision: whether already-migrated legacy provider: "auto" configs that now appear as explicit provider: "openai" should be accepted as fail-closed, or get a separate migration/marker.

@osolmaz osolmaz force-pushed the codex/89691-memory-search-embedding-provider branch from 2519517 to 862bdb2 Compare June 4, 2026 14:49
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation and removed plugin: bonjour Plugin integration: bonjour labels Jun 4, 2026
@osolmaz

osolmaz commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Maintainer decision for #90336: accept explicit non-local memory embedding providers as fail-closed when unavailable.

That includes configs already migrated from legacy provider: "auto" to provider: "openai"; the migrated config is intentionally treated the same as explicit OpenAI because the current runtime cannot distinguish those states without adding migration metadata.

Docs are now aligned in docs/concepts/memory-search.md and docs/reference/memory-config.md. Current head 76e048562cb4a415462c05f582a12fabdd815336 is conflict-free, CI is green, and codex review --base origin/main found no actionable regressions after the docs fix.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@osolmaz

osolmaz commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Final verification update: rebased conflict-free on current main; PR head 76e0485 is mergeable/CLEAN. GitHub CI is green, local focused memory/docs/type/architecture checks passed, codex review is clean, and ClawSweeper re-review completed with no rank-up moves left. Remaining state is normal human maintainer review for the accepted fail-closed provider behavior.

@osolmaz osolmaz force-pushed the codex/89691-memory-search-embedding-provider branch from 76e0485 to 8c1b428 Compare June 5, 2026 09:28
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels Jun 6, 2026
@osolmaz

osolmaz commented Jun 6, 2026

Copy link
Copy Markdown
Member Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@osolmaz

osolmaz commented Jun 6, 2026

Copy link
Copy Markdown
Member Author

Why this is careful about fallback:

The old FTS-only fallback was useful because it kept memory search from going completely dark when embeddings were unavailable. We do not want to break workflows that were relying on the default or implicit behavior, especially local setups and cases where keyword recall is still better than nothing.

But when a user explicitly configures a non-local embedding provider, silently falling back to FTS hides the real problem. The user thinks semantic recall is working, while the system is actually doing keyword-only search. That can make memory feel flaky and can waste time debugging the wrong layer.

So this PR keeps fallback for unset/default/local/none-style paths, and only fails closed when an explicit remote provider is required and unavailable. That gives existing workflows a softer upgrade path while making intentional provider configuration honest and debuggable.

@osolmaz

osolmaz commented Jun 6, 2026

Copy link
Copy Markdown
Member Author

Maintainer land-ready note:

I am accepting the product behavior tradeoff here: explicit non-local memory embedding providers should fail closed when unavailable, while unset/default/local/none-style paths keep FTS fallback so existing workflows do not lose recall entirely.

Current head: e4f9e93

Validation reviewed before merge:

  • ClawSweeper re-reviewed the current head and marked it ready for maintainer review with rating: diamond lobster.
  • The PR is mergeable and not draft.
  • Focused local proof passed for the memory and active-memory surfaces touched by this PR.
  • GitHub checks are green for the relevant memory, lint, type, dependency, security, and runtime shards.

Known CI gap accepted for this merge:

  • build-artifacts fails in core-support-boundary because current main has extension-test boundary offenders in extensions/google/google.live.test.ts and extensions/minimax/minimax.live.test.ts.
  • check-additional-boundaries-bcd fails for the same two files.
  • This PR does not touch those files, and the same boundary failure reproduced locally after rebasing onto current origin/main.

I am merging despite those unrelated baseline failures because the PR-specific behavior, docs, and regression coverage are ready, and the remaining risk is the maintainer-accepted fallback semantics described above.

@osolmaz osolmaz merged commit 0aea58a into main Jun 6, 2026
157 of 159 checks passed
@osolmaz osolmaz deleted the codex/89691-memory-search-embedding-provider branch June 6, 2026 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling docs Improvements or additions to documentation extensions: memory-core Extension: memory-core maintainer Maintainer-authored PR merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Active-memory embedded memory_search intermittently loses embedding provider and falls back to FTS-only

1 participant