Skip to content

feat(plugins): add OpenAI-compatible embedding provider#84998

Closed
mbelinky wants to merge 3 commits into
mbelinky/memory-embedding-provider-deprecationfrom
mbelinky/openai-compatible-generic-embeddings
Closed

feat(plugins): add OpenAI-compatible embedding provider#84998
mbelinky wants to merge 3 commits into
mbelinky/memory-embedding-provider-deprecationfrom
mbelinky/openai-compatible-generic-embeddings

Conversation

@mbelinky

@mbelinky mbelinky commented May 21, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add a bundled openai-compatible-embeddings plugin that registers the generic openai-compatible embedding provider from feat(plugins): add embedding provider contract #84947.
  • Keep it explicit-only: it requires a configured remote.baseUrl and model, skips warmup/preload, and does not inherit global OpenAI chat credentials.
  • Add memory-core integration coverage for both the generic bridge contract and the real OpenAI-compatible plugin entry resolving through memory-core into a local /v1/embeddings HTTP fixture.

Stacked on #85072 and #84991: #84991 adapts memory-core to consume generic embedding providers, and #85072 marks the old memory-specific embedding registration seam as deprecated compatibility.

Verification

Behavior addressed: OpenAI-compatible embedding endpoints can be exposed through the generic embeddingProviders contract and then used by memory-core through the generic-provider bridge.
Real environment tested: local macOS checkout with HTTP fixture servers; Blacksmith Testbox through Crabbox on Linux (tbx_01ks62sf86sqjq39v8f7qbs0fm, Actions run https://github.com/openclaw/openclaw/actions/runs/26250560922). Earlier broader Testbox proof also passed on tbx_01ks5qpe8h3jsf46xfk9hhejkd with Actions run https://github.com/openclaw/openclaw/actions/runs/26240732874. No live third-party endpoint was called.
Exact steps or command run after this patch: node scripts/run-vitest.mjs extensions/memory-core/src/memory/generic-embedding-provider.bridge.test.ts extensions/memory-core/src/memory/generic-embedding-provider.integration.test.ts extensions/openai-compatible-embeddings/src/embedding-provider.test.ts extensions/memory-core/src/memory/embeddings.test.ts src/plugins/contracts/embedding-provider.contract.test.ts src/plugins/embedding-provider-runtime.test.ts src/plugins/contracts/package-manifest.contract.test.ts src/plugins/contracts/extension-runtime-dependencies.contract.test.ts src/plugins/contracts/inventory/bundled-capability-metadata.test.ts test/plugin-npm-package-manifest.test.ts; node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test-openai-compatible-embeddings.tsbuildinfo; focused local rerun after restack: node scripts/run-vitest.mjs src/plugins/status.test.ts src/plugins/compat/registry.test.ts src/plugins/contracts/plugin-sdk-package-contract-guardrails.test.ts extensions/openai-compatible-embeddings/src/embedding-provider.test.ts extensions/memory-core/src/memory/generic-embedding-provider.integration.test.ts; Testbox focused rerun with the same 5-file command; focused oxlint/oxfmt checks; git diff --check.
Evidence after fix: earlier broader Testbox proof passed 454 Vitest tests; latest local and Testbox focused reruns passed 5 files / 66 tests at head 09d409a0bd; formatting and whitespace checks passed.
Observed result after fix: the plugin registers openai-compatible through api.registerEmbeddingProvider, memory-core resolves it via the generic registry for explicit provider requests, request payloads include model/input/dimensions without encoding_format, explicit bearer auth and non-secret routing headers are preserved, runtime cache metadata is sanitized, structured text inputs pass through, generic providers remain excluded from memory auto selection, and the old memory-specific seam now reports a deprecation compatibility warning for external/workspace plugins.
What was not tested: live provider-specific compatibility with an external OpenAI-compatible service; that should be tested with a real endpoint/API key before declaring broad provider coverage. A direct remote CLI-inspect proof was attempted but stopped because Testbox source build stayed long-running before the inspect step; the real CLI inspect warning proof passed locally.

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation size: L maintainer Maintainer-authored PR labels May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch adds a bundled openai-compatible-embeddings plugin, docs/inventory entries, memory-core generic-provider integration coverage, and generic embedding create options for input_type labels.

Reproducibility: not applicable. this is a feature PR, not a current-main bug report. The changed behavior is source-checkable on the branch and covered by fixture-backed tests, but there is no failing current-main path to reproduce.

PR rating
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Summary: The patch looks coherent and fixture-covered, but the missing real endpoint proof keeps overall readiness low for this provider feature.

Rank-up moves:

  • Add redacted terminal output, logs, screenshots, recordings, or linked artifacts from a real OpenAI-compatible embeddings endpoint, with private details removed.
  • Keep the PR stacked behind the generic embedding-provider contract, memory-core bridge, and deprecation PRs until those are approved.
  • Get maintainer approval for the explicit credential and user-supplied remote URL boundary before bundling.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The PR body provides fixture/Testbox proof and linked CI artifacts, but it explicitly says no live OpenAI-compatible embeddings endpoint was called, so real endpoint proof is still needed before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • The branch is stacked on open public SDK and memory-core PRs; merging it before the lower stack lands could leave the bundled provider depending on APIs that are not on main yet.
  • The proof exercises fixture HTTP servers and Testbox runs, but no real OpenAI-compatible embeddings endpoint, so provider-specific auth, payload, response, proxy, and error quirks remain unproven.
  • The bundled provider sends explicit bearer credentials and configured routing headers to a user-supplied remote.baseUrl; the code uses the SSRF guard and sanitizes cache metadata, but maintainers still need to approve that boundary before bundling.
  • The docs steer custom endpoints toward a new explicit provider id, so the final stack needs clear upgrade messaging while preserving the existing direct OpenAI memory adapter path.

Maintainer options:

  1. Land the stack with real endpoint proof (recommended)
    Wait for the lower generic-provider PRs to settle, then require redacted terminal/log/artifact proof from a real OpenAI-compatible embeddings endpoint before merging this bundled provider.
  2. Accept fixture-first provider coverage
    Maintainers can intentionally merge with fixture/Testbox coverage only if live provider quirks are accepted as follow-up compatibility risk.
  3. Defer bundled shipping
    If the public generic-provider API or credential boundary is not approved, pause this PR and revisit the provider as a later bundled or ClawHub-distributed plugin.

Next step before merge
Protected maintainer handling, open stacked dependencies, contributor-owned real endpoint proof, and auth/security-boundary approval make this a human review item rather than an automated repair candidate.

Security
Cleared: No concrete security or supply-chain defect was found; the remaining security-boundary concern is maintainer approval for sending explicit credentials to configured endpoints.

Review details

Best possible solution:

Land this only after the generic contract, memory bridge, and deprecation stack is approved, with redacted real endpoint proof for the request/auth/payload path; otherwise defer the bundled provider until that boundary is settled.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a feature PR, not a current-main bug report. The changed behavior is source-checkable on the branch and covered by fixture-backed tests, but there is no failing current-main path to reproduce.

Is this the best way to solve the issue?

Unclear until maintainer approval: the explicit generic-provider plugin is a maintainable shape if the lower stack is accepted, but bundling it should wait for the API, credential boundary, and real endpoint proof to settle.

Label justifications:

  • P2: This is a normal-priority provider feature for explicit memory embedding configurations, not an urgent regression.
  • merge-risk: 🚨 compatibility: The PR depends on an open public plugin SDK and memory bridge stack that is not yet on current main.
  • merge-risk: 🚨 auth-provider: The new provider intentionally changes credential routing by requiring explicit embedding credentials and avoiding inherited OpenAI chat auth.
  • merge-risk: 🚨 security-boundary: The provider sends credentials and optional routing headers to a user-configured remote URL, which needs maintainer approval despite the SSRF guard.
  • rating: 🦪 silver shellfish: Current PR rating is 🦪 silver shellfish because proof is 🦪 silver shellfish, patch quality is 🐚 platinum hermit, and The patch looks coherent and fixture-covered, but the missing real endpoint proof keeps overall readiness low for this provider feature.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body provides fixture/Testbox proof and linked CI artifacts, but it explicitly says no live OpenAI-compatible embeddings endpoint was called, so real endpoint proof is still needed before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

What I checked:

  • Bundled plugin registration: The PR head adds a bundled plugin entry openai-compatible-embeddings that registers the OpenAI-compatible adapter through api.registerEmbeddingProvider, which depends on the generic embedding-provider API from the lower stack. (extensions/openai-compatible-embeddings/index.ts:4, 49234272ab29)
  • Provider runtime boundary: The provider requires an explicit remote.baseUrl and model, normalizes explicit API-key input, sends requests through fetchWithSsrFGuard, and sanitizes sensitive headers out of runtime cache metadata. (extensions/openai-compatible-embeddings/src/embedding-provider.ts:199, 49234272ab29)
  • Memory integration coverage: The PR adds fixture-backed integration coverage proving memory-core can resolve the new plugin through the generic registry, preserve sanitized runtime cache data, and route query/document input labels. (extensions/memory-core/src/memory/generic-embedding-provider.integration.test.ts:173, 49234272ab29)
  • Stack dependency: Current main does not yet contain the generic embeddingProviders contract files; the branch is stacked on the open contract, memory bridge, and deprecation PRs named in the PR body and related context. (7f4bd454febf)
  • Proof gap from PR body: The PR body supplies local fixture/Testbox proof and linked Actions runs, but explicitly says no live third-party OpenAI-compatible endpoint was called. (49234272ab29)
  • Area history: History points to prior memory embedding/provider ownership around the provider-plugin split and runtime registry sharing, with Peter Steinberger, Vincent Koc, and glitch showing up in the relevant feature-history commands. (77e6e4cf87f7)

Likely related people:

  • Peter Steinberger: Heavy contributor in the sampled memory/provider area and author of the provider-plugin memory embedding refactor that shaped the current seam. (role: feature-history owner; confidence: medium; commits: 77e6e4cf87f7; files: extensions/memory-core/src/memory/embeddings.ts, extensions/openai/embedding-provider.ts, packages/memory-host-sdk/src/host/embeddings-remote-provider.ts)
  • Vincent Koc: Recent history shows memory-host SDK dependency inversion and substantial memory configuration/docs work around this surface. (role: adjacent memory/docs owner; confidence: medium; commits: 2a999bf9c967, a4b767c89bb9; files: packages/memory-host-sdk/src/host/embeddings.types.ts, docs/reference/memory-config.md, docs/plugins/memory-lancedb.md)
  • glitch: Authored the prior fix sharing memory embedding providers across plugin runtime splits, which is directly adjacent to this generic-provider registry stack. (role: runtime registry contributor; confidence: medium; commits: 3cec3bd48bc4; files: src/plugins/memory-embedding-providers.ts, extensions/memory-core/src/memory/embeddings.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7f4bd454febf.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. labels May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@openclaw-barnacle openclaw-barnacle Bot added extensions: memory-core Extension: memory-core size: XL and removed size: L labels May 21, 2026
@mbelinky mbelinky force-pushed the mbelinky/openai-compatible-generic-embeddings branch from dfdd641 to c1db010 Compare May 21, 2026 16:56
@mbelinky mbelinky force-pushed the mbelinky/memory-generic-embedding-provider-bridge branch from a574850 to 3e858f0 Compare May 21, 2026 18:38
@mbelinky mbelinky force-pushed the mbelinky/openai-compatible-generic-embeddings branch from 13bf584 to f185ecb Compare May 21, 2026 18:39
@socket-security

socket-security Bot commented May 21, 2026

Copy link
Copy Markdown

No dependency changes detected. Learn more about Socket for GitHub.

👍 No dependency changes detected in pull request

@mbelinky mbelinky force-pushed the mbelinky/openai-compatible-generic-embeddings branch from f185ecb to 63ce84a Compare May 21, 2026 19:07
@mbelinky mbelinky changed the base branch from mbelinky/memory-generic-embedding-provider-bridge to mbelinky/memory-embedding-provider-deprecation May 21, 2026 19:08
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 21, 2026
@mbelinky mbelinky force-pushed the mbelinky/memory-embedding-provider-deprecation branch from e03b9e8 to 55472fe Compare May 21, 2026 19:45
@mbelinky mbelinky force-pushed the mbelinky/openai-compatible-generic-embeddings branch from 63ce84a to 09d409a Compare May 21, 2026 19:45
@vincentkoc vincentkoc self-assigned this May 22, 2026
@vincentkoc vincentkoc force-pushed the mbelinky/memory-embedding-provider-deprecation branch from 55472fe to b25a168 Compare May 22, 2026 01:37
@mbelinky

Copy link
Copy Markdown
Contributor Author

Closing this as superseded by #85269.

The direction changed during maintainer review: openai-compatible should live in core, not as a separate bundled plugin. #85269 landed that core provider at src/plugins/openai-compatible-embedding-provider.ts, plus the memory-core integration, docs, doctor/config coverage, and tests, in 4d89e00.

Thanks to this branch for proving the shape; the implementation now lives in core instead.

@mbelinky mbelinky closed this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation extensions: memory-core Extension: memory-core maintainer Maintainer-authored PR merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P2 Normal backlog priority with limited blast radius. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: XL status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants