Skip to content

feat(memory-core): consume generic embedding providers#84991

Closed
mbelinky wants to merge 3 commits into
mbelinky/84947-general-embedding-providers-basefrom
mbelinky/memory-generic-embedding-provider-bridge
Closed

feat(memory-core): consume generic embedding providers#84991
mbelinky wants to merge 3 commits into
mbelinky/84947-general-embedding-providers-basefrom
mbelinky/memory-generic-embedding-provider-bridge

Conversation

@mbelinky

@mbelinky mbelinky commented May 21, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Teach memory-core to resolve explicit embedding configs from the generic embeddingProviders contract added in feat(plugins): add embedding provider contract #84947.
  • Preserve the existing memory-specific adapter path and auto selection behavior, so generic providers are opt-in for memory until configured explicitly.
  • Add non-mocked bridge proof using a virtual plugin that declares contracts.embeddingProviders, registers a generic provider through the real plugin API, and is consumed by memory-core through explicit config.

Stacked on #84947 via mbelinky/84947-general-embedding-providers-base.

Verification

Behavior addressed: memory-core can consume a generic embedding provider by explicit provider id without changing memory auto-selection or overriding memory-specific providers.

Real environment tested: Blacksmith Testbox through Crabbox on Linux for both the focused test gate and a non-test temporary plugin/config proof. Provider: blacksmith-testbox. Proof ids: tbx_01ks5qfqmvttp0prg28e7p01ng (focused gate, Actions run https://github.com/openclaw/openclaw/actions/runs/26240537485) and tbx_01ks5s24ra7bcez8891q9050b3 (non-test temp plugin proof, Actions run https://github.com/openclaw/openclaw/actions/runs/26241975310). No live third-party endpoint was called in this PR.

Exact steps or command run after this patch: ran the focused Testbox gate for extensions/memory-core/src/memory/generic-embedding-provider.bridge.test.ts, extensions/memory-core/src/memory/embeddings.test.ts, src/plugins/contracts/embedding-provider.contract.test.ts, and src/plugins/embedding-provider-runtime.test.ts, plus extension test typecheck, oxlint, oxfmt, and git diff --check. Then ran a non-test node --import tsx proof that created a temporary plugin directory containing package.json, openclaw.plugin.json, and index.cjs, configured plugins.load.paths to that plugin root, declared contracts.embeddingProviders: ["proof-memory-generic"], registered api.registerEmbeddingProvider(...), and called memory-core createEmbeddingProvider(...) with provider: "proof-memory-generic".

Evidence after fix from the non-test proof:

{
  "pluginRoot": "<temp>/proof-memory-generic",
  "manifestContracts": [
    "proof-memory-generic"
  ],
  "configuredLoadPaths": [
    "<temp>/proof-memory-generic"
  ],
  "globalGenericProvidersAfterColdLoad": [],
  "globalMemoryProvidersAfterColdLoad": [],
  "requestedProvider": "proof-memory-generic",
  "provider": {
    "id": "proof-memory-generic",
    "model": "proof-model",
    "maxInputTokens": 128
  },
  "runtime": {
    "id": "proof-memory-generic",
    "cacheKeyData": {
      "provider": "proof-memory-generic",
      "model": "proof-model",
      "dimensions": 2
    },
    "inlineQueryTimeoutMs": 111,
    "inlineBatchTimeoutMs": 222
  },
  "query": [
    10,
    1
  ],
  "batch": [
    [
      5,
      0
    ],
    [
      5,
      1
    ]
  ],
  "structured": [
    [
      5,
      0
    ]
  ]
}

Observed result after fix: a manifest-owned generic embedding provider from a temporary plugin/config path is resolved by memory-core for explicit provider requests, maps outputDimensionality to generic dimensions, preserves runtime metadata, routes query calls as query embeddings, routes document batch and structured inputs as document embeddings, and does not register or rely on memory-specific providers. The focused Testbox gate also passed 4 Vitest files / 15 tests covering the virtual-plugin bridge, memory-core adapter behavior, generic embedding provider contract tests, and runtime lookup tests.

What was not tested: no live third-party embedding endpoint was called here; that belongs to the follow-up OpenAI-compatible generic provider PR.

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation extensions: memory-core Extension: memory-core size: M maintainer Maintainer-authored PR labels May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adapts memory-core explicit embedding provider selection to use generic embeddingProviders, updates plugin docs/deprecation wording and changelog, and adds bridge tests.

Reproducibility: yes. for the review finding by source inspection: current main retries models.providers.<id>.api for memory providers, while the PR's generic lookup only asks for the requested id. The feature itself is demonstrated by the PR body's temp-plugin terminal proof, not by a current-main bug reproduction.

PR rating
Overall: 🦐 gold shrimp
Proof: 🦞 diamond lobster
Patch quality: 🦐 gold shrimp
Summary: Strong real behavior proof and focused implementation, but the custom provider-id compatibility gap and stacked contract decision limit merge readiness.

Rank-up moves:

  • Preserve models.providers.<id>.api alias resolution for generic embedding providers used by memory-core.
  • Settle feat(plugins): add embedding provider contract #84947 before merging this stacked branch.
  • Rerun or preserve equivalent Testbox/temp-plugin proof after any rebase or API-shape change.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body includes after-fix terminal JSON from a temp-plugin proof plus Testbox ids showing memory-core resolving a manifest-owned generic provider and routing query/document embeddings.

Risk before merge

  • Existing setups that select memory embeddings through a custom provider id, such as models.providers.<id>.api, can fail after a backing adapter migrates to generic-only because the bridge exact-matches the custom id.
  • The branch is stacked on feat(plugins): add embedding provider contract #84947, so merging before that public SDK contract settles would couple memory-core and docs to an unsettled API shape.
  • Explicit memory remote config, including SecretInput-backed API keys and headers, is intentionally passed to manifest-owned generic providers; maintainers should approve that credential boundary before merge.

Maintainer options:

  1. Preserve custom provider aliases first (recommended)
    Update the bridge or generic runtime so memorySearch.provider custom ids still resolve through models.providers.<id>.api before this branch merges.
  2. Wait for the base contract
    Keep this branch draft or paused until feat(plugins): add embedding provider contract #84947 lands or the public SDK contract shape is otherwise approved.
  3. Approve explicit credential routing
    Maintainers can accept the security-boundary risk if explicit memory embedding remote config is allowed to flow to manifest-owned generic embedding providers.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Preserve memory custom provider alias semantics for generic embedding providers: when `memorySearch.provider` points at a configured `models.providers.<id>.api` value, generic lookup should retry the API owner id and include regression coverage for `createEmbeddingProvider` and `resolveEmbeddingProviderFallbackModel`.

Next step before merge
A focused repair can preserve custom provider alias lookup in the new generic bridge; broader base-contract and credential approval remain maintainer decisions.

Security
Cleared: No concrete security or supply-chain defect was found; the remaining security-sensitive point is maintainer approval for explicit credential routing to generic providers.

Review findings

  • [P2] Preserve custom memory provider ids — extensions/memory-core/src/memory/embeddings.ts:127-129
Review details

Best possible solution:

Settle #84947, preserve custom provider alias semantics for generic lookup and fallback model resolution, then merge the bridge with equivalent Testbox/temp-plugin proof and explicit credential-boundary approval.

Do we have a high-confidence way to reproduce the issue?

Yes for the review finding by source inspection: current main retries models.providers.<id>.api for memory providers, while the PR's generic lookup only asks for the requested id. The feature itself is demonstrated by the PR body's temp-plugin terminal proof, not by a current-main bug reproduction.

Is this the best way to solve the issue?

No, not as currently written. The generic bridge is the right direction, but it should preserve the existing custom provider alias contract and wait for the stacked base embedding-provider contract to settle.

Label justifications:

  • P2: This is a normal-priority memory-core/plugin-SDK feature with a concrete compatibility blocker and limited blast radius.
  • merge-risk: 🚨 compatibility: The PR can break existing custom memory embedding provider ids once those adapters become generic-only.
  • merge-risk: 🚨 security-boundary: The PR routes explicit memory remote credentials and headers into manifest-owned generic embedding providers.
  • rating: 🦐 gold shrimp: Current PR rating is 🦐 gold shrimp because proof is 🦞 diamond lobster, patch quality is 🦐 gold shrimp, and Strong real behavior proof and focused implementation, but the custom provider-id compatibility gap and stacked contract decision limit merge readiness.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (terminal): The PR body includes after-fix terminal JSON from a temp-plugin proof plus Testbox ids showing memory-core resolving a manifest-owned generic provider and routing query/document embeddings.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal JSON from a temp-plugin proof plus Testbox ids showing memory-core resolving a manifest-owned generic provider and routing query/document embeddings.

Full review comments:

  • [P2] Preserve custom memory provider ids — extensions/memory-core/src/memory/embeddings.ts:127-129
    Current main lets memorySearch.provider use a custom id whose models.providers.<id>.api points at the real adapter owner. This bridge only calls getEmbeddingProvider(id, config) after the legacy lookup misses, so a provider that migrates to generic-only will make existing custom ids throw instead of resolving through the configured API owner; retry the API id for generic lookup and fallback model resolution.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.89

Acceptance criteria:

  • node scripts/run-vitest.mjs extensions/memory-core/src/memory/embeddings.test.ts extensions/memory-core/src/memory/generic-embedding-provider.bridge.test.ts src/plugins/embedding-provider-runtime.test.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test-memory-generic-embedding-bridge.tsbuildinfo
  • node scripts/run-oxlint.mjs extensions/memory-core/src/memory/embeddings.ts extensions/memory-core/src/memory/embeddings.test.ts extensions/memory-core/src/memory/generic-embedding-provider.bridge.test.ts
  • git diff --check

What I checked:

  • Current custom provider alias contract: Current main resolves explicit memory provider ids through models.providers.<id>.api, so a custom memorySearch.provider id can map to the real memory embedding adapter owner. (src/plugins/memory-embedding-provider-runtime.ts:53, 7f4bd454febf)
  • User-facing docs support custom memory provider ids: The memory-search docs say memorySearch.provider can be a custom models.providers.<id> entry when that provider sets api to an embedding adapter owner. Public docs: docs/concepts/memory-search.md. (docs/concepts/memory-search.md:31, 7f4bd454febf)
  • PR generic lookup exact-matches the requested id: At the PR head, the bridge falls back from legacy memory lookup to getEmbeddingProvider(id, config) with the requested id only, so it does not mirror the current models.providers.<id>.api alias retry for generic-only providers. (extensions/memory-core/src/memory/embeddings.ts:127, bfa24945ee4e)
  • Generic runtime is also exact-id only: The stacked generic embedding runtime resolves embeddingProviders by the exact provider id and does not itself provide the memory-specific alias lookup. (src/plugins/embedding-provider-runtime.ts:32, bfa24945ee4e)
  • Bridge proof is real behavior proof: The PR body includes Blacksmith Testbox proof ids plus terminal JSON from a temporary plugin/config proof showing memory-core resolving a manifest-owned generic provider and routing query/document embeddings. (bfa24945ee4e)
  • Related stack is still open: The base generic embedding provider contract remains open, and the follow-up OpenAI-compatible embedding provider PR is also open and stacked on this bridge.

Likely related people:

  • steipete: Prior review history ties the custom memory embedding provider-id compatibility behavior and recent memory embedding/provider-plugin refactors to this account. (role: recent area contributor; confidence: high; commits: a0a0ab4d9e2a, 77e6e4cf87f7, 629baf5fa7f6; files: src/plugins/memory-embedding-provider-runtime.ts, extensions/memory-core/src/memory/embeddings.ts)
  • ngutman: Prior review history links this account to a merged recursive provider discovery fix in the same memory embedding provider registration area. (role: related bug-fix contributor; confidence: medium; commits: 004704817997; files: src/plugins/memory-embedding-providers.ts)
  • gumadeiras: Prior review history links this account to recent memory-search cold plugin load tests around the provider resolution boundary. (role: adjacent test contributor; confidence: medium; commits: d6c90b5af121; files: src/plugins/memory-embedding-provider-runtime.ts)
  • vincentkoc: Live PR metadata shows this account assigned and commit metadata lists them as committer for the current head stack, making them a practical routing candidate for this review cycle. (role: assigned follow-up owner; confidence: medium; commits: 7873c9a81471, 4babfc053805, bfa24945ee4e; files: extensions/memory-core/src/memory/embeddings.ts, extensions/memory-core/src/memory/embeddings.test.ts, extensions/memory-core/src/memory/generic-embedding-provider.bridge.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7f4bd454febf.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. labels May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress.

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@mbelinky

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 21, 2026
@mbelinky

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 21, 2026
@mbelinky mbelinky force-pushed the mbelinky/84947-general-embedding-providers-base branch from 17d00ae to a1e7a40 Compare May 21, 2026 18:38
@mbelinky mbelinky force-pushed the mbelinky/memory-generic-embedding-provider-bridge branch from a574850 to 3e858f0 Compare May 21, 2026 18:38
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@vincentkoc vincentkoc self-assigned this May 22, 2026
@vincentkoc vincentkoc force-pushed the mbelinky/84947-general-embedding-providers-base branch from a1e7a40 to e453696 Compare May 22, 2026 01:37
@vincentkoc vincentkoc force-pushed the mbelinky/memory-generic-embedding-provider-bridge branch from 3e858f0 to bfa2494 Compare May 22, 2026 01:37
@mbelinky

Copy link
Copy Markdown
Contributor Author

Closing this as superseded by #85269, which landed the generic embedding provider bridge directly in core.

#85269 includes the memory-core generic provider consumption, runtime lookup, docs, and focused bridge/integration tests that this stacked PR was carrying, then merged as 4d89e00.

No follow-up is needed on this branch; remaining cleanup should continue in the deprecation PR #85072.

@mbelinky mbelinky closed this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation extensions: memory-core Extension: memory-core maintainer Maintainer-authored PR merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. size: M status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants