feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI) #80479

Closed
yaanfpv wants to merge 2 commits into openclaw:main from yaanfpv:feat/openai-compatible-embeddings-provider

@yaanfpv

@yaanfpv yaanfpv commented May 11, 2026

Contributor

Summary

  • Problem: operators running a self-hosted OpenAI-compatible embeddings server (llama.cpp's llama-server, Ollama via its /v1 surface, vLLM, TGI, LocalAI, llamafile, or any reverse-proxied internal instance) have no clean adapter for it. Pointing the bundled lmstudio adapter at it triggers an LMStudio-only "load model" warmup that hangs against generic servers and stalls the gateway event loop for ~30 seconds per memory-lancedb embedding-provider rebuild. Pointing the bundled openai adapter at it works, but inherits global OpenAI headers/attribution/api-key resolution, and if the embedding.baseUrl line is ever removed, requests silently fall back to api.openai.com, which leaks embedded text to the cloud.
  • Why it matters: the symptom is gateway freezes that show up as multi-second sessions.list backlogs and a flooded gateway log. Operators spend hours diagnosing what is actually a UX gap: the bundled adapters do not include a generic local-server option, and the existing in-process local adapter (node-llama-cpp on a .gguf file) does not cover operators who run their embeddings server as a separate HTTP process.
  • What changes: adds a new bundled extension extensions/openai-compatible-embeddings/ that registers an openai-compatible memory embedding provider. The adapter has no warmup, no global config inheritance, fails fast on a missing embedding.baseUrl or embedding.model, and does not auto-select (the operator must explicitly opt in with embedding.provider: "openai-compatible").
  • What did NOT change (scope boundary): no existing adapter touched. lmstudio, openai, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local adapters all behave byte-identically. The Plugin SDK surface is unchanged; the new adapter consumes the same public exports the other bundled adapters do. No protocol change, no schema change, no migration, no telemetry. The existing in-process local adapter stays as-is for operators who load .gguf files in-process via node-llama-cpp; the two adapters are complementary, not redundant.
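
A minimal sketch of the fail-fast behavior described above. The helper `resolveEmbeddingConfig` and both type names are hypothetical; only the field names (`baseUrl`, `model`, `apiKey`) come from this PR, and the real adapter's error wording may differ.

```typescript
// Hypothetical shape of the per-plugin `embedding` block.
interface EmbeddingConfigInput {
  provider?: string;
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

interface ResolvedEmbeddingConfig {
  baseUrl: string;
  model: string;
  apiKey?: string;
}

// Validates the per-plugin block only; it never reads a global provider block.
function resolveEmbeddingConfig(input: EmbeddingConfigInput): ResolvedEmbeddingConfig {
  const baseUrl = input.baseUrl?.trim();
  const model = input.model?.trim();
  // No default URL: a missing baseUrl is an error, never a fallback to a cloud host.
  if (!baseUrl) {
    throw new Error("openai-compatible: embedding.baseUrl is required");
  }
  if (!model) {
    throw new Error("openai-compatible: embedding.model is required");
  }
  return {
    // Strip trailing slashes so `${baseUrl}/embeddings` is well formed.
    baseUrl: baseUrl.replace(/\/+$/, ""),
    model,
    ...(input.apiKey ? { apiKey: input.apiKey } : {}),
  };
}
```

Throwing at resolution time is what turns an accidentally deleted `baseUrl` line into a visible startup error rather than a silent reroute.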

Change Type

  • Feature
  • Docs

Scope

  • Memory / storage
  • Integrations

Linked Issue/PR

Real behavior proof

  • Behavior or issue addressed: an operator running llama-server (llama.cpp) with the BGE-M3 embedding model on http://localhost:8081/v1 had memory-lancedb captures triggering ~30-second event-loop stalls every time the embedding provider rebuilt, because the lmstudio adapter's ensureLmstudioModelLoaded warmup hangs against llama.cpp's OpenAI-compatible server (which does not expose LMStudio's load-model endpoint). The new openai-compatible adapter routes through the same generic createRemoteEmbeddingProvider factory the other adapters use, just without the warmup phase. Embeddings work end-to-end on the first call, no preload required.

  • Real environment tested: macOS 26.4.1 on Apple Silicon (arm64). llama-server from llama.cpp serving bge-m3-Q8_0.gguf (605 MB, 1024 dimensions) on http://127.0.0.1:8081, with --ngl 24 -c 32768 -np 4 -b 512 -ub 512 --mmap --mlock --cont-batching --api-key <set>. Live ~/.openclaw/ with memory-lancedb enabled.

  • Exact steps or command run after this patch: ran pnpm test extensions/openai-compatible-embeddings to validate the adapter posture and the no-warmup invariant. Then invoked the new factory directly through node --import tsx against the live llama-server, capturing the round-trip latency for both embedQuery and embedBatch. Independently verified the same llama-server endpoint with curl -H "Authorization: Bearer ..." http://localhost:8081/v1/embeddings returns 1024-dim vectors with the same model name.

  • Evidence after fix:

    Live invocation of the new adapter from a small Node script (node --import tsx):

    $ node --import tsx /tmp/proof-openai-compatible-embeddings.mjs
    [proof] target  : http://localhost:8081/v1
    [proof] model   : text-embedding-bge-m3
    [proof] apiKey  : <set>
    [proof] factory : 1ms (no warmup, just client construction)
    [proof] client  : baseUrl=http://localhost:8081/v1 model=text-embedding-bge-m3
    [proof] embed   : 124ms, dims=1024, head=[-0.0392, 0.0370, -0.0289, 0.0161, ...]
    [proof] batch   : 25ms, count=4, dims=1024
    [proof] OK. openai-compatible embeddings adapter wired end-to-end against llama.cpp.
    

    Notice the factory took 1 ms (the lmstudio adapter would have taken up to 120 s here against the same server), and the actual embedding round-trip is 124 ms with the expected 1024-dim BGE-M3 output.

    Independent confirmation of the same endpoint via curl, showing the local server answers OpenAI-shaped requests without any vendor-specific preamble:

    $ curl -sS -m 5 -H "Authorization: Bearer ..." -H "Content-Type: application/json" \
        -d '{"model":"text-embedding-bge-m3","input":"hello"}' \
        -w "\nHTTP %{http_code} time=%{time_total}s\n" \
        http://localhost:8081/v1/embeddings | tail -c 200
    ...,0.05932944267988205],"index":0,"object":"embedding"}]}
    HTTP 200 time=0.069077s
    

    Targeted regression test for the adapter posture and the no-warmup invariant:

    $ pnpm test extensions/openai-compatible-embeddings
     Test Files  1 passed (1)
          Tests  4 passed (4)
       Duration  4.52s
    
  • Observed result after fix: provider construction takes 1 ms (no warmup network call). The adapter holds the per-plugin baseUrl/model exactly as configured, with no fallback to any global config block. Embeddings round-trip in well under 200 ms against the live local server. Existing adapters (openai, lmstudio, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local) are untouched.

  • What was not tested: did not run the new adapter inside an actual openclaw gateway process end-to-end, because the dist bundle does not include the new extension yet (the source-only invocation above is the closest equivalent without a release-tagged build). Did not run pnpm check:changed in Testbox; targeted pnpm test extensions/openai-compatible-embeddings plus targeted npx oxlint extensions/openai-compatible-embeddings/ plus targeted pnpm tsgo:prod are all clean on the touched files.

  • Before evidence: not applicable for a feature add. The "before" is "this provider did not exist," and the operator pain it addresses is documented in the Summary above and in linked issue [Feature]: bundled openai-compatible embedding provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI) #80476.

Root Cause

N/A. Feature addition, not a regression fix. (For the underlying operator pain that motivated the addition, see linked issue #80476.)

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: extensions/openai-compatible-embeddings/memory-embedding-adapter.test.ts (new), with the factory in extensions/openai-compatible-embeddings/embedding-provider.ts.
  • Scenario the test should lock in:
    • The adapter's posture: id: "openai-compatible", transport: "remote", no autoSelectPriority, no authProviderId, allowExplicitWhenConfiguredAuto: true, no shouldContinueAutoSelection. This is what stops the adapter from accidentally being auto-selected over an unrelated cloud provider whose key happens to be configured.
    • No warmup or preload during create. The adapter must produce exactly one factory invocation per create call; nothing else.
    • The cache key includes the per-plugin baseUrl and model exactly as supplied, so two different local servers do not share a cache entry.
    • The Authorization header is stripped from the cache key so a rotated bearer does not invalidate cached embeddings.
  • Why this is the smallest reliable guardrail: the adapter is a thin facade over the generic createRemoteEmbeddingProvider. The risk surface is the posture (auto-select / auth / fallback) and the absence of any pre-call side effect. Both are testable in pure-TS with a mocked factory; no live server needed for the unit tests.
  • Existing test that already covers this (if any): no. No bundled adapter today behaves the way openai-compatible needs to (no auth provider, no auto-select, fully self-contained config, no vendor-specific warmup).
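
The two cache-key scenarios above can be illustrated with a small sketch. `buildCacheKey` and `CacheKeyInput` are hypothetical names, and the JSON-string key format is an assumption for illustration, not the adapter's actual key layout.

```typescript
interface CacheKeyInput {
  baseUrl: string;
  model: string;
  headers?: Record<string, string>;
}

// Builds a deterministic key from baseUrl, model, and non-secret headers.
function buildCacheKey(input: CacheKeyInput): string {
  const headers = Object.entries(input.headers ?? {})
    // Header names are case-insensitive, so normalize before comparing.
    .map(([name, value]): [string, string] => [name.toLowerCase(), value])
    // Drop the bearer so rotating it does not invalidate cached embeddings.
    .filter(([name]) => name !== "authorization")
    // Sort so insertion order does not change the key.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify({
    provider: "openai-compatible",
    baseUrl: input.baseUrl,
    model: input.model,
    headers,
  });
}
```

With this shape, two servers on different ports get different keys, while the same server with a rotated bearer keeps its key.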

User-visible / Behavior Changes

Operators who configure embedding.provider: "openai-compatible" plus embedding.baseUrl and embedding.model under plugins.entries.memory-lancedb.config.embedding get a working embeddings flow against any OpenAI-compatible local server. No behavior change for any operator who has not opted in. Existing lmstudio/openai/local/etc. adapters keep doing exactly what they do today.
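
For orientation, the opt-in block can be written out as a TypeScript object literal that mirrors the JSON nesting of `~/.openclaw/openclaw.json`. The values are the ones from the author's test environment in this PR; the constant name `openclawConfig` is only illustrative.

```typescript
// Mirrors the JSON shape described in this PR; replace the values with your own.
const openclawConfig = {
  plugins: {
    entries: {
      "memory-lancedb": {
        config: {
          embedding: {
            provider: "openai-compatible", // explicit opt-in; never auto-selected
            baseUrl: "http://localhost:8081/v1", // required
            model: "text-embedding-bge-m3", // required
            apiKey: "<bearer>", // optional
            dimensions: 1024,
          },
        },
      },
    },
  },
} as const;
```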

Diagram

Before:
  memory-lancedb capture
    -> embedding provider rebuild
    -> lmstudio adapter create()
       -> ensureLmstudioModelLoaded(timeoutMs: 120_000)
          -> POST <local-server>/api/v0/load-model  (LMStudio-only)
          -> server replies 404 / hangs / returns unexpected shape
          -> ~30s event-loop stall before failure log
    -> /v1/embeddings call finally fires (works)

After (with embedding.provider: "openai-compatible"):
  memory-lancedb capture
    -> embedding provider rebuild
    -> openai-compatible adapter create()
       -> client construction only (~1ms)
    -> /v1/embeddings call fires immediately
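
The difference between the two flows can be sketched with an injected transport that records calls. Everything here (`Transport`, `createWithWarmup`, `createWithoutWarmup`, the `probeUrl` parameter) is a hypothetical stand-in, not the actual lmstudio or openai-compatible adapter code.

```typescript
// Injected so a test can count network calls without a live server.
type Transport = (url: string) => Promise<void>;

interface Provider {
  embedUrl: string;
}

// Warmup-style create: issues a vendor-specific probe before returning.
async function createWithWarmup(
  baseUrl: string,
  probeUrl: string,
  transport: Transport,
): Promise<Provider> {
  await transport(probeUrl); // create() cannot resolve until this settles
  return { embedUrl: `${baseUrl}/embeddings` };
}

// No-warmup create: pure client construction, no I/O at all.
function createWithoutWarmup(baseUrl: string): Provider {
  return { embedUrl: `${baseUrl}/embeddings` };
}
```

A unit test can then assert that the recorded call list stays empty after `createWithoutWarmup`, which is the no-warmup invariant in its simplest form.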

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No. The optional apiKey is a per-plugin config field, treated identically to existing adapters' apiKey handling. Cache key strips the Authorization header.
  • New/changed network calls? Only when explicitly configured by the operator. No call goes out at startup.
  • Command/tool execution surface changed? No
  • Data access scope changed? No. The adapter is fully self-contained and does not consult any global models.providers.* block, so it cannot leak embedded text to a cloud provider on a stale config.

Repro + Verification

Environment

  • OS: macOS 26.4.1 (arm64)
  • Runtime/container: Node 26.0.0
  • Model/provider: BGE-M3 (Q8_0 GGUF) via llama.cpp llama-server on localhost:8081
  • Integration/channel (if any): N/A (memory plugin)
  • Relevant config (redacted): plugins.entries.memory-lancedb.config.embedding.provider: "openai-compatible", baseUrl: "http://localhost:8081/v1", model: "text-embedding-bge-m3", apiKey: "<bearer>", dimensions: 1024

Steps

  1. Start any OpenAI-compatible local embedding server. For llama.cpp: llama-server -m <bge-m3.gguf> -a text-embedding-bge-m3 --embedding --host 127.0.0.1 --port 8081 --api-key <bearer>.
  2. In ~/.openclaw/openclaw.json set memory-lancedb's embedding block to provider: "openai-compatible" plus baseUrl, model, optional apiKey/headers.
  3. Restart the gateway. memory-lancedb captures and recalls now go through the local server with no warmup stall.

Expected

provider.embedQuery("hello") returns a 1024-dim vector in well under 200 ms. No event-loop stalls. No warmup warnings in the gateway log.
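
As a rough illustration of what such a call involves on the wire, the sketch below builds an OpenAI-shaped `/embeddings` request and parses the response. The function and type names are hypothetical, and this is not the shared `createRemoteEmbeddingProvider` implementation; only the request body (`model`, `input`) and response layout (`data[].embedding`, `data[].index`) follow the OpenAI embeddings format.

```typescript
interface RemoteConfig {
  baseUrl: string;
  model: string;
  apiKey?: string;
}

interface EmbeddingsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it, so it can be inspected or tested offline.
function buildEmbeddingsRequest(cfg: RemoteConfig, input: string | string[]): EmbeddingsRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (cfg.apiKey) {
    headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  }
  return {
    url: `${cfg.baseUrl}/embeddings`,
    method: "POST",
    headers,
    body: JSON.stringify({ model: cfg.model, input }),
  };
}

// Extracts vectors in input order and optionally checks their dimensionality.
function parseEmbeddingsResponse(payload: unknown, expectedDims?: number): number[][] {
  const data = (payload as { data?: unknown } | null)?.data;
  if (!Array.isArray(data)) {
    throw new Error("embeddings response has no data array");
  }
  const rows = data as Array<{ index: number; embedding: number[] }>;
  const vectors = [...rows].sort((a, b) => a.index - b.index).map((row) => row.embedding);
  for (const vector of vectors) {
    if (expectedDims !== undefined && vector.length !== expectedDims) {
      throw new Error(`expected ${expectedDims} dimensions, got ${vector.length}`);
    }
  }
  return vectors;
}
```

Passing `1024` as `expectedDims` would reject a response from a server that was started with a different model than the one configured.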

Actual

Matches expected. Verified end-to-end against llama.cpp serving BGE-M3 (terminal output included in Real behavior proof).

Evidence

  • Failing test/log before + passing after (terminal output in Real behavior proof above; the "before" is "this provider did not exist")
  • Trace/log snippets (proof script output, curl output, vitest output)

Human Verification

  • Verified scenarios: ran the new factory against live llama.cpp serving BGE-M3 on macOS 26.4.1 / Apple Silicon. Confirmed embedQuery and embedBatch return correct-dimensionality vectors. Confirmed factory construction completes in 1 ms with no network call (vs lmstudio adapter's ~30s warmup hang against the same server). Verified the cache key contains the per-plugin baseUrl and model, with Authorization stripped. Verified pnpm test extensions/openai-compatible-embeddings (4/4 pass), npx oxlint extensions/openai-compatible-embeddings/ (0 errors), and pnpm tsgo:prod (clean on touched files).
  • Edge cases checked: missing baseUrl throws a clear error rather than silently falling back. Missing model does the same. Adapter has no autoSelectPriority, so it cannot be picked automatically when the operator has another adapter's credentials configured. Headers passed through embedding.headers get attached to every request alongside the Authorization Bearer.
  • What you did not verify: did not run the new adapter inside an actual openclaw gateway process end-to-end (dist bundle does not include the new extension yet). Did not run pnpm check:changed in Testbox.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes. Pure addition.
  • Config/env changes? No required changes. Operators who want the new provider opt in by setting embedding.provider: "openai-compatible" and baseUrl/model.
  • Migration needed? No. Operators who currently work around the gap by pointing lmstudio or openai at a local server can switch when convenient. Their existing setup keeps working.

Risks and Mitigations

  • Risk: an operator confuses the new HTTP-based openai-compatible provider with the existing in-process local provider.
    • Mitigation: the docs example calls out the distinction explicitly, listing the deployment shapes each one targets. The provider id openai-compatible reads as "any server that speaks the OpenAI HTTP API," which is the term llama.cpp / Ollama / vLLM / TGI / LocalAI all use to describe themselves; the existing local id keeps the semantic of "local in-process model file."
  • Risk: an operator removes their embedding.baseUrl line by mistake while the openai-compatible provider is configured.
    • Mitigation: the adapter throws a clear error at create time pointing to the missing field. No fallback to a default URL.

@openclaw-barnacle openclaw-barnacle Bot added the docs (Improvements or additions to documentation), size: M, and proof: supplied (External PR includes structured after-fix real behavior proof.) labels May 11, 2026
@clawsweeper

clawsweeper Bot commented May 11, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds a bundled openai-compatible memory embedding provider plugin, docs/changelog, labeler routing, tests, and workspace lockfile metadata for self-hosted OpenAI-compatible embedding servers.

Reproducibility: not applicable, as this is a feature PR rather than a current-main bug report. The PR body supplies credible after-change terminal proof against llama.cpp, and current main lacks an openai-compatible memory embedding provider.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: The patch looks technically sound with strong terminal proof, while the remaining uncertainty is maintainer scope and merge validation rather than a code defect.

Rank-up moves:

  • Get an explicit maintainer decision on bundled core versus ClawHub publication.
  • Before merge, run the normal changed/broad gate and review the workspace lockfile/importer delta.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body includes after-change terminal output from the new adapter against llama.cpp, a curl confirmation, and targeted test output, which is sufficient real behavior proof for this non-visual feature.

Risk before merge

  • Maintainers still need to explicitly decide whether this generic local-server embedding provider should be bundled in OpenClaw core or published externally through ClawHub.
  • The diff adds a new workspace package/importer and lockfile entry, so the dependency-related change should be intentionally accepted even though it only uses a workspace devDependency.
  • The PR body reports targeted tests and live adapter proof, but it also says a full gateway process end-to-end run and broad changed gate were not run after the feature implementation.

Maintainer options:

  1. Decide the mitigation before merge
    If maintainers want this as an official bundled provider, land the narrow no-warmup adapter with the labeler/docs/package metadata and require normal dependency/CI proof; otherwise ask for the same plugin to be published through ClawHub without adding core inventory.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
The remaining action is maintainer scope review and merge validation, not a narrow repair that ClawSweeper should implement automatically.

Security
Cleared: No concrete security or supply-chain regression found; the new network path is explicit, uses the existing remote embedding SSRF helper, and adds only a workspace devDependency/importer.

Review details

Best possible solution:

If maintainers want this as an official bundled provider, land the narrow no-warmup adapter with the labeler/docs/package metadata and require normal dependency/CI proof; otherwise ask for the same plugin to be published through ClawHub without adding core inventory.

Do we have a high-confidence way to reproduce the issue?

Not applicable, as this is a feature PR rather than a current-main bug report. The PR body supplies credible after-change terminal proof against llama.cpp, and current main lacks an openai-compatible memory embedding provider.

Is this the best way to solve the issue?

Unclear as a product decision. The code uses the existing memory embedding provider seam cleanly, but the best final path depends on whether maintainers want this bundled or published through ClawHub.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-change terminal output from the new adapter against llama.cpp, a curl confirmation, and targeted test output, which is sufficient real behavior proof for this non-visual feature.
  • add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and the patch looks technically sound with strong terminal proof, while the remaining uncertainty is maintainer scope and merge validation rather than a code defect.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-change terminal output from the new adapter against llama.cpp, a curl confirmation, and targeted test output, which is sufficient real behavior proof for this non-visual feature.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: ⏳ waiting on author: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P2: This is a normal-priority memory/provider feature with useful proof and limited blast radius, but it still needs maintainer scope approval before merge.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and the patch looks technically sound with strong terminal proof, while the remaining uncertainty is maintainer scope and merge validation rather than a code defect.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-change terminal output from the new adapter against llama.cpp, a curl confirmation, and targeted test output, which is sufficient real behavior proof for this non-visual feature.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-change terminal output from the new adapter against llama.cpp, a curl confirmation, and targeted test output, which is sufficient real behavior proof for this non-visual feature.

What I checked:

  • PR diff adds explicit provider without warmup: The patch adds extensions/openai-compatible-embeddings/embedding-provider.ts and memory-embedding-adapter.ts; the provider requires per-plugin baseUrl/model, builds headers from per-plugin input, calls createRemoteEmbeddingProvider, and the adapter intentionally has no autoSelectPriority or authProviderId. (extensions/openai-compatible-embeddings/embedding-provider.ts, 229abb496db2)
  • Current memory-lancedb seam passes per-plugin remote settings: Current main constructs provider adapter options from embedding.apiKey and embedding.baseUrl, passes the requested provider id, model, and dimensions, and fails only if the provider registry cannot resolve the id. (extensions/memory-lancedb/index.ts:409, cbf72e5e26ee)
  • Current capability lookup supports manifest-owned embedding providers: Current main resolves memory embedding providers from registered adapters and manifest capability providers keyed by memoryEmbeddingProviders, which matches the new plugin's manifest contract. (src/plugins/memory-embedding-provider-runtime.ts:62, cbf72e5e26ee)
  • Dependency contract uses the existing remote embedding transport: The shared remote embedding provider posts to <baseUrl>/embeddings through fetchRemoteEmbeddingVectors, and the remote HTTP helper builds the same SSRF policy used by other remote memory providers. (packages/memory-host-sdk/src/host/embeddings-remote-provider.ts:17, cbf72e5e26ee)
  • Scope policy favors plugins but sets a high bar for bundled optional plugins: VISION.md says optional capability should usually ship as plugins, ClawHub owns plugin promotion/provenance, and the bar for adding optional plugins to core is intentionally high. (VISION.md:54, cbf72e5e26ee)
  • Related discussion and supplied proof are substantial: The provided PR context includes a live llama.cpp terminal proof, curl confirmation, targeted test output, and author follow-up saying the unsupported headers docs claim and missing labeler route were fixed in the latest head. (229abb496db2)

Likely related people:

  • steipete: Recent GitHub path history shows work on the memory embedding provider registry, provider-plugin extraction, custom provider id resolution, and LM Studio helper surface. (role: recent area contributor; confidence: high; commits: 77e6e4cf87f7, a0a0ab4d9e2a, 1cac6f48f0bd; files: src/plugins/memory-embedding-provider-runtime.ts, extensions/lmstudio/src/embedding-provider.ts, extensions/ollama/src/memory-embedding-adapter.ts)
  • vyctorbrzezowski: Recent GitHub path history for extensions/memory-lancedb/index.ts includes a focused memory-lancedb behavior fix. (role: recent memory-lancedb contributor; confidence: medium; commits: 4d2e70872640; files: extensions/memory-lancedb/index.ts)
  • gumadeiras: Recent GitHub path history shows test work around memory-search cold plugin loads, which is adjacent to the plugin-capability resolution path this PR relies on. (role: adjacent test/runtime contributor; confidence: medium; commits: d6c90b5af121; files: src/plugins/memory-embedding-provider-runtime.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against cbf72e5e26ee.

…al OpenAI-compatible HTTP server

What an operator hits today:

Operators running a self-hosted OpenAI-compatible embeddings server
(llama.cpp's `llama-server`, Ollama via its `/v1` surface, vLLM, TGI,
LocalAI, llamafile, or any reverse-proxied internal instance) have two
inconvenient choices:

1. Point the bundled `lmstudio` adapter at it. Works for the
   `/v1/embeddings` call, but the adapter's `ensureLmstudioModelLoaded`
   warmup calls an LMStudio-only "load model" endpoint that hangs
   against generic servers. The hang blocks the gateway event loop for
   ~30s per memory-lancedb embedding-provider rebuild.

2. Point the bundled `openai` adapter at it. Works (per-plugin baseUrl
   overrides global), but the adapter inherits global OpenAI headers,
   attribution, and api-key resolution; if `embedding.baseUrl` ever
   gets removed the requests fall back to api.openai.com, leaking
   embedded text to the cloud.

What changes:

Adds a new bundled extension `extensions/openai-compatible-embeddings/`
that registers an `openai-compatible` memory embedding provider. The
adapter:

- Has no warmup / preload / model-load probe. The first
  /v1/embeddings call loads the model lazily, which every server in
  this family already does.
- Reads only from the per-plugin `embedding` config block. Does not
  consult any global `models.providers.*` block. Cannot accidentally
  route to a vendor cloud.
- Fails fast with a clear error message when `embedding.baseUrl` or
  `embedding.model` is missing.
- Does not auto-select. Operators must opt in explicitly with
  `embedding.provider: "openai-compatible"`.

Naming:

`openai-compatible` is the term llama.cpp, Ollama, vLLM, TGI, LocalAI,
and llamafile all use to describe their HTTP API. Distinct from the
existing `local` adapter
(extensions/memory-core/src/memory/provider-adapters.ts), which is
`transport: "local"` for in-process node-llama-cpp on a `.gguf` file.
Both stay supported; they target different deployment shapes.

Tests:

`extensions/openai-compatible-embeddings/memory-embedding-adapter.test.ts`
covers the no-auto-select / no-auth-dependency posture, the no-warmup
invariant during create, the per-plugin baseUrl/model in cache key, and
the Authorization-header strip from the cache key. 4/4 pass.

Docs:

`docs/plugins/memory-lancedb.md` updated with the new provider example,
the safety note about why `openai-compatible` is preferred over
`openai` when the operator also has cloud providers configured for
chat models, and the disambiguation note about `local` vs
`openai-compatible`.
@yaanfpv yaanfpv force-pushed the feat/openai-compatible-embeddings-provider branch from 1fd293a to 29037e9 Compare May 20, 2026 09:41
@github-actions

Contributor

Dependency Changes Detected

This PR changes dependency-related files. Maintainers should confirm these changes are intentional.

Changed files:

  • extensions/openai-compatible-embeddings/package.json
  • pnpm-lock.yaml

Maintainer follow-up:

  • Review whether the dependency changes are intentional.
  • Inspect resolved package deltas when lockfile or workspace dependency policy changes are present.
  • Run pnpm deps:changes:report -- --base-ref origin/main --markdown /tmp/dependency-changes.md --json /tmp/dependency-changes.json locally for detailed release-style evidence.

@github-actions github-actions Bot added the dependencies-changed PR changes dependency-related files label May 20, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 20, 2026
@yaanfpv

yaanfpv commented May 20, 2026

Contributor Author

Rebased onto latest main and addressed the clawsweeper P2 by dropping the unsupported headers claim from the changelog entry.

P2 fix (option b, strictly-scoped): the changelog now lists only apiKey as the optional per-plugin embedding config field, matching what memory-lancedb/config.ts actually forwards into remote today (apiKey/baseUrl). Widening the lancedb embedding schema to pass headers through (option a) is a separate change that belongs in a follow-up PR alongside the lancedb config-test addition; keeping this PR strictly scoped to the new adapter avoids touching the memory-lancedb surface here.

The adapter's internal headers parameter (the extra HTTP headers wired through OpenAICompatibleEmbeddingClient) is unchanged, so when the lancedb pass-through lands the headers will flow end-to-end without further code change in this extension.

On the 2 CI failures: both checks-node-core-fast and checks-node-core were failing on a single unrelated test in src/acp/control-plane/manager.test.ts (Cannot read properties of undefined (reading 'size') on cleans actor-tail bookkeeping after session turns complete). This diff touches no src/acp/** code; clawsweeper independently noted the failure looks unrelated. The post-rebase CI run should make this obvious one way or the other; if it persists I'll comment again with the exact post-rebase signal.

Scope question: clawsweeper flagged whether this belongs bundled vs published on ClawHub. The reasoning for bundling: every other open-weight serving stack we already bundle (openai, lmstudio, local) covers the same operator persona, and the failure modes documented in the changelog entry (silent fallback to api.openai.com from the bundled openai adapter; LMStudio-only warmup hang from the bundled lmstudio adapter) are real production traps the bundled tier should cover symmetrically rather than route through ClawHub for a fresh-install operator. Happy to move this to a ClawHub release if maintainers prefer.

@clawsweeper clawsweeper Bot added the proof: sufficient (ClawSweeper judged the real behavior proof convincing.), rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.), status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.), and P2 (Normal backlog priority with limited blast radius.) labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Mossy Diff Drake

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: stacks clean commits.
Image traits: location diff observatory; accessory proof snapshot camera; palette sunrise gold and clean white; mood determined; pose peeking out from the egg shell; shell frosted glass shell; lighting soft underwater shimmer; background soft code-shaped tiles.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.


yaanfpv commented May 20, 2026

Contributor Author

Addressed the ClawSweeper P2 and P3 from the latest review.

P2 (extensions/openai-compatible-embeddings/embedding-provider.ts:28): removed the headers line from the JSDoc Config-optional block. The adapter source no longer claims headers is a user-configurable per-plugin field, matching what memory-lancedb actually forwards today. The internal headers parameter on OpenAICompatibleEmbeddingClient is unchanged so the lancedb pass-through can land in a follow-up PR without further code churn here.
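As an illustration of what "internal headers parameter" means here, the following is a minimal sketch of a client option that accepts extra headers and merges them into the request headers. EmbeddingClientOptions, buildRequestHeaders, the X-Example-Tenant header, and the choice to let extra headers override defaults are all assumptions for illustration; the real OpenAICompatibleEmbeddingClient may be shaped differently.

```typescript
// Hypothetical sketch: option names and merge order are assumptions,
// not the actual OpenAICompatibleEmbeddingClient code.
type EmbeddingClientOptions = {
  baseUrl: string;
  model: string;
  apiKey?: string;
  // Internal parameter: accepted by the client, but not advertised as
  // user-facing config until the caller actually forwards it.
  headers?: Record<string, string>;
};

function buildRequestHeaders(options: EmbeddingClientOptions): Record<string, string> {
  const defaults: Record<string, string> = { "Content-Type": "application/json" };
  if (options.apiKey) {
    defaults["Authorization"] = `Bearer ${options.apiKey}`;
  }
  // Extra headers are spread last so they win over the defaults (assumed policy).
  return { ...defaults, ...(options.headers ?? {}) };
}

// Today: the caller forwards no extra headers, so only defaults are sent.
const withoutExtras = buildRequestHeaders({
  baseUrl: "http://127.0.0.1:8080/v1",
  model: "example-embedding-model",
});

// After a pass-through lands: the same function carries them end-to-end.
const withExtras = buildRequestHeaders({
  baseUrl: "http://127.0.0.1:8080/v1",
  model: "example-embedding-model",
  headers: { "X-Example-Tenant": "team-a" },
});

console.log(Object.keys(withoutExtras), Object.keys(withExtras));
```

Because the optional field already exists on the client, only the caller needs to change when forwarding is added, which is why the documentation fix requires no change to the client itself.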

P3 (.github/labeler.yml): added an extensions: openai-compatible-embeddings route mapped to extensions/openai-compatible-embeddings/**, slotted after the existing extensions: openai entry. Future fixes to this plugin will now get the component label automatically.

The diff is two files, four insertions, one deletion. The branch is now at 229abb496d on top of the rebased base. I don't expect a CI re-run to change the results, and no test changes were needed for these two fixes.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 20, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 20, 2026

osolmaz commented May 21, 2026

Member

Closing in favor of #84930, thank you! You will be credited.


Labels

  • dependencies-changed (PR changes dependency-related files)
  • docs (Improvements or additions to documentation)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: M
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

Development

Successfully merging this pull request may close these issues.

[Feature]: bundled openai-compatible embedding provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)

2 participants