
Commit 727c853

dutifulbob and yaanfpv committed
feat(memory): add openai-compatible embeddings provider
Co-authored-by: Soham Patankar <102520430+yaanfpv@users.noreply.github.com>
1 parent 88c49f9 commit 727c853

16 files changed

Lines changed: 958 additions & 213 deletions

.github/labeler.yml

Lines changed: 4 additions & 0 deletions
@@ -383,6 +383,10 @@
   - changed-files:
       - any-glob-to-any-file:
           - "extensions/openai/**"
+"extensions: openai-compatible-embeddings":
+  - changed-files:
+      - any-glob-to-any-file:
+          - "extensions/openai-compatible-embeddings/**"
 "extensions: codex":
   - changed-files:
       - any-glob-to-any-file:

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ Docs: https://docs.openclaw.ai
 ### Changes

 - Dependencies: refresh provider, plugin, UI, and tooling packages, update `protobufjs` to 8.4.0 to clear the current npm advisory, and carry the Claude ACP completion patch forward to `@agentclientprotocol/claude-agent-acp` 0.36.1.
+- Memory/embeddings: add an explicit `openai-compatible` provider for self-hosted `/v1/embeddings` servers such as Ollama, llama.cpp `llama-server`, vLLM, TGI, LocalAI, and llamafile, with no warmup call, no cloud credential inheritance, sanitized cache keys, and direct HTTP contract coverage. Thanks @yaanfpv.
 - Tests/perf: isolate doctor core health check unit coverage from real skills/workspace discovery so `doctor-core-checks` no longer dominates unit perf while keeping one real skills-readiness smoke. (#84493) Thanks @frankekn.

 ### Fixes

docs/plugins/memory-lancedb.md

Lines changed: 47 additions & 4 deletions
@@ -176,10 +176,53 @@ parameter, while others ignore it and always return `number[]` vectors.
 `memory-lancedb` therefore omits `encoding_format` on embedding requests and
 accepts either float-array responses or base64-encoded float32 responses.

-If you have a raw OpenAI-compatible embeddings endpoint that does not have a
-bundled provider adapter, omit `embedding.provider` (or leave it as `openai`) and
-set `embedding.apiKey` plus `embedding.baseUrl`. This preserves the direct
-OpenAI-compatible client path.
+For self-hosted OpenAI-compatible embedding servers (llama.cpp's
+`llama-server`, Ollama via its `/v1` surface, vLLM, TGI, LocalAI,
+llamafile, or any reverse-proxied internal instance) use the bundled
+`openai-compatible` provider:
+
+```json5
+{
+  plugins: {
+    entries: {
+      "memory-lancedb": {
+        enabled: true,
+        config: {
+          embedding: {
+            provider: "openai-compatible",
+            baseUrl: "http://localhost:8081/v1",
+            model: "text-embedding-bge-m3",
+            apiKey: "${LLAMA_API_TOKEN}",
+            dimensions: 1024,
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+The `openai-compatible` provider is fully self-contained: it does not
+inherit headers, auth, or fallback URLs from any global
+`models.providers.*` block, and it fails-fast with a clear error when
+`embedding.baseUrl` or `embedding.model` is missing. Use it when your
+operator setup also has cloud providers (real OpenAI, etc.) configured
+for chat models, so an accidentally-removed `baseUrl` line cannot route
+embeddings to the cloud. Omit `apiKey` for servers that do not require
+auth (e.g. a default `ollama serve`).
+
+The `openai-compatible` provider is distinct from the in-process `local`
+provider, which uses `node-llama-cpp` to load a `.gguf` file directly
+into the gateway process. Choose `openai-compatible` when your
+embeddings live behind an HTTP server you run separately (most
+self-hosted setups). Choose `local` when you want the model to live in
+the gateway process itself.
+
+If your vendor's OpenAI-compatible embeddings endpoint differs from the
+local-server family above, omit `embedding.provider` (or leave it as
+`openai`) and set `embedding.apiKey` plus `embedding.baseUrl`. This
+preserves the direct OpenAI-compatible client path through the bundled
+`openai` adapter.

 Set `embedding.dimensions` for providers whose model dimensions are not built
 in. For example, ZhiPu `embedding-3` uses `2048` dimensions:
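
To make the behaviour documented in this hunk concrete, here is a minimal sketch of how a client could turn the `embedding` config into a `/v1/embeddings` request. It is not the provider's implementation: `EmbeddingConfig`, `buildEmbeddingRequest`, and the error wording are hypothetical names chosen for illustration. Only the fail-fast keys, the optional `Authorization` header, the omitted `encoding_format`, and the request shape are taken from the docs and tests in this commit.

```typescript
// Hypothetical types and function names; a sketch, not the shipped provider.
type EmbeddingConfig = {
  baseUrl?: string;
  model?: string;
  apiKey?: string;
  dimensions?: number;
};

type EmbeddingRequest = {
  url: string;
  headers: Record<string, string>;
  body: { model: string; input: string[]; dimensions?: number };
};

function buildEmbeddingRequest(config: EmbeddingConfig, input: string[]): EmbeddingRequest {
  const baseUrl = config.baseUrl?.trim() ?? "";
  const model = config.model?.trim() ?? "";
  // Fail fast and name the missing key so the operator knows what to fix.
  if (!baseUrl) throw new Error("openai-compatible embeddings require embedding.baseUrl");
  if (!model) throw new Error("openai-compatible embeddings require embedding.model");

  const headers: Record<string, string> = {
    "content-type": "application/json",
    accept: "application/json",
  };
  // Send Authorization only when a key is configured (omit it for unauthenticated servers).
  const apiKey = config.apiKey?.trim();
  if (apiKey) headers.authorization = `Bearer ${apiKey}`;

  // `encoding_format` is never set; `dimensions` is sent only when configured.
  const body: EmbeddingRequest["body"] = { model, input };
  if (config.dimensions !== undefined) body.dimensions = config.dimensions;

  // Strip trailing slashes so "…/v1/" and "…/v1" resolve to the same endpoint.
  return { url: `${baseUrl.replace(/\/+$/, "")}/embeddings`, headers, body };
}
```

Because nothing here falls back to a default URL or a globally configured key, deleting `baseUrl` from the config produces an error rather than a request to a cloud endpoint, which is the property the paragraph above describes.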

docs/plugins/plugin-inventory.md

Lines changed: 89 additions & 88 deletions
Large diffs are not rendered by default.

docs/plugins/reference.md

Lines changed: 122 additions & 121 deletions
Large diffs are not rendered by default.
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+---
+summary: "Adds memory embedding provider support."
+read_when:
+  - You are installing, configuring, or auditing the openai-compatible-embeddings plugin
+title: "OpenAI Compatible Embeddings plugin"
+---
+
+# OpenAI Compatible Embeddings plugin
+
+Adds memory embedding provider support.
+
+## Distribution
+
+- Package: `@openclaw/openai-compatible-embeddings-provider`
+- Install route: included in OpenClaw
+
+## Surface
+
+contracts: memoryEmbeddingProviders
Lines changed: 294 additions & 0 deletions
@@ -0,0 +1,294 @@
+import { createServer, type IncomingMessage, type ServerResponse } from "node:http";
+import type { AddressInfo } from "node:net";
+import type { MemoryEmbeddingProviderCreateOptions } from "openclaw/plugin-sdk/memory-core-host-engine-embeddings";
+import { afterEach, describe, expect, it } from "vitest";
+import { createOpenAICompatibleEmbeddingProvider } from "./embedding-provider.js";
+
+type CapturedRequest = {
+  method: string | undefined;
+  url: string | undefined;
+  headers: IncomingMessage["headers"];
+  body: Record<string, unknown>;
+};
+
+type FixtureResponse = {
+  object: "list";
+  data: Array<{
+    object?: "embedding";
+    embedding: number[];
+    index: number;
+  }>;
+  model?: string;
+  usage?: {
+    prompt_tokens?: number;
+    total_tokens?: number;
+  };
+};
+
+const servers: Array<{ close: () => Promise<void> }> = [];
+
+function createOptions(
+  overrides: Partial<MemoryEmbeddingProviderCreateOptions> = {},
+): MemoryEmbeddingProviderCreateOptions {
+  return {
+    config: {} as MemoryEmbeddingProviderCreateOptions["config"],
+    provider: "openai-compatible",
+    model: "text-embedding-bge-m3",
+    fallback: "none",
+    ...overrides,
+  };
+}
+
+async function readJsonBody(req: IncomingMessage): Promise<Record<string, unknown>> {
+  const chunks: Buffer[] = [];
+  for await (const chunk of req) {
+    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
+  }
+  const text = Buffer.concat(chunks).toString("utf8");
+  return JSON.parse(text) as Record<string, unknown>;
+}
+
+async function startEmbeddingServer(params?: {
+  token?: string;
+  respond?: (request: CapturedRequest) => FixtureResponse | Record<string, unknown>;
+  status?: number;
+}): Promise<{ baseUrl: string; requests: CapturedRequest[] }> {
+  const requests: CapturedRequest[] = [];
+  const server = createServer(async (req: IncomingMessage, res: ServerResponse) => {
+    try {
+      const body = await readJsonBody(req);
+      const captured: CapturedRequest = {
+        method: req.method,
+        url: req.url,
+        headers: req.headers,
+        body,
+      };
+      requests.push(captured);
+
+      if (params?.token) {
+        expect(req.headers.authorization).toBe(`Bearer ${params.token}`);
+      } else {
+        expect(req.headers.authorization).toBeUndefined();
+      }
+
+      res.writeHead(params?.status ?? 200, { "content-type": "application/json" });
+      res.end(
+        JSON.stringify(
+          params?.respond?.(captured) ?? {
+            object: "list",
+            data: [{ object: "embedding", embedding: [0.1, 0.2, 0.3], index: 0 }],
+            model: body.model,
+          },
+        ),
+      );
+    } catch (error) {
+      res.writeHead(500, { "content-type": "application/json" });
+      res.end(JSON.stringify({ error: error instanceof Error ? error.message : String(error) }));
+    }
+  });
+
+  await new Promise<void>((resolve, reject) => {
+    server.once("error", reject);
+    server.listen(0, "127.0.0.1", () => {
+      server.off("error", reject);
+      resolve();
+    });
+  });
+
+  servers.push({
+    close: () =>
+      new Promise<void>((resolve, reject) => {
+        server.close((error) => (error ? reject(error) : resolve()));
+      }),
+  });
+
+  const address = server.address() as AddressInfo;
+  return {
+    baseUrl: `http://127.0.0.1:${address.port}/v1`,
+    requests,
+  };
+}
+
+afterEach(async () => {
+  const pending = servers.splice(0);
+  await Promise.all(pending.map((server) => server.close()));
+});
+
+describe("openai-compatible embedding provider", () => {
+  it("posts OpenAI-compatible embedding requests without warming up during create", async () => {
+    const token = "local-test-token";
+    const server = await startEmbeddingServer({
+      token,
+      respond: ({ body }) => {
+        const input = body.input;
+        const texts = Array.isArray(input) ? input : [input];
+        return {
+          object: "list",
+          data: texts.map((text, index) => ({
+            object: "embedding",
+            embedding: [String(text).length, index + 0.25, 1],
+            index,
+          })),
+          model: String(body.model),
+          usage: { prompt_tokens: texts.length, total_tokens: texts.length },
+        };
+      },
+    });
+
+    const { provider, client } = await createOpenAICompatibleEmbeddingProvider(
+      createOptions({
+        model: "text-embedding-bge-m3",
+        outputDimensionality: 1024,
+        remote: {
+          baseUrl: ` ${server.baseUrl} `,
+          apiKey: ` ${token} `,
+          headers: { "x-local-runtime": "ollama" },
+        },
+      }),
+    );
+
+    expect(provider.id).toBe("openai-compatible");
+    expect(provider.model).toBe("text-embedding-bge-m3");
+    expect(client.baseUrl).toBe(server.baseUrl);
+    expect(client.headers.authorization).toBe(`Bearer ${token}`);
+    expect(server.requests).toHaveLength(0);
+
+    await expect(provider.embedQuery("hello")).resolves.toEqual([5, 0.25, 1]);
+    await expect(provider.embedBatch(["a", "abcd"])).resolves.toEqual([
+      [1, 0.25, 1],
+      [4, 1.25, 1],
+    ]);
+
+    expect(server.requests).toHaveLength(2);
+    expect(server.requests[0]).toMatchObject({
+      method: "POST",
+      url: "/v1/embeddings",
+      body: {
+        model: "text-embedding-bge-m3",
+        input: ["hello"],
+        dimensions: 1024,
+      },
+    });
+    expect(server.requests[0]?.body).not.toHaveProperty("encoding_format");
+    expect(server.requests[0]?.headers["content-type"]).toContain("application/json");
+    expect(server.requests[0]?.headers.accept).toBe("application/json");
+    expect(server.requests[0]?.headers["x-local-runtime"]).toBe("ollama");
+    expect(server.requests[1]?.body).toEqual({
+      model: "text-embedding-bge-m3",
+      input: ["a", "abcd"],
+      dimensions: 1024,
+    });
+  });
+
183+
it("omits Authorization when no apiKey is configured", async () => {
184+
const server = await startEmbeddingServer();
185+
const { provider, client } = await createOpenAICompatibleEmbeddingProvider(
186+
createOptions({
187+
model: "nomic-embed-text",
188+
remote: { baseUrl: server.baseUrl },
189+
}),
190+
);
191+
192+
expect(client.headers).not.toHaveProperty("authorization");
193+
194+
await expect(provider.embedQuery("hello")).resolves.toEqual([0.1, 0.2, 0.3]);
195+
expect(server.requests[0]?.headers.authorization).toBeUndefined();
196+
});
197+
198+
it.each([
199+
{
200+
runtime: "Ollama",
201+
response: {
202+
object: "list",
203+
data: [{ object: "embedding", embedding: [0.11, 0.12], index: 0 }],
204+
model: "nomic-embed-text",
205+
usage: { prompt_tokens: 1, total_tokens: 1 },
206+
},
207+
},
208+
{
209+
runtime: "llama.cpp llama-server",
210+
response: {
211+
object: "list",
212+
data: [{ object: "embedding", embedding: [0.21, 0.22], index: 0 }],
213+
model: "bge-small-en-v1.5",
214+
},
215+
},
216+
{
217+
runtime: "vLLM",
218+
response: {
219+
object: "list",
220+
data: [{ object: "embedding", embedding: [0.31, 0.32], index: 0 }],
221+
model: "intfloat/e5-small-v2",
222+
},
223+
},
224+
{
225+
runtime: "LocalAI",
226+
response: {
227+
object: "list",
228+
data: [{ object: "embedding", embedding: [0.41, 0.42], index: 0 }],
229+
model: "text-embedding-ada-002",
230+
},
231+
},
232+
{
233+
runtime: "TGI-compatible server",
234+
response: {
235+
object: "list",
236+
data: [{ object: "embedding", embedding: [0.51, 0.52], index: 0 }],
237+
model: "tei-bge-small",
238+
},
239+
},
240+
{
241+
runtime: "llamafile",
242+
response: {
243+
object: "list",
244+
data: [{ object: "embedding", embedding: [0.61, 0.62], index: 0 }],
245+
model: "all-MiniLM-L6-v2",
246+
},
247+
},
248+
] satisfies Array<{ runtime: string; response: FixtureResponse }>)(
249+
"parses $runtime OpenAI-compatible embedding responses through the same path",
250+
async ({ response }) => {
251+
const server = await startEmbeddingServer({ respond: () => response });
252+
const { provider } = await createOpenAICompatibleEmbeddingProvider(
253+
createOptions({
254+
model: response.model ?? "embedding-model",
255+
remote: { baseUrl: server.baseUrl },
256+
}),
257+
);
258+
259+
await expect(provider.embedQuery("hello")).resolves.toEqual(response.data[0]?.embedding);
260+
expect(server.requests[0]?.url).toBe("/v1/embeddings");
261+
expect(server.requests[0]?.body).toEqual({
262+
model: response.model ?? "embedding-model",
263+
input: ["hello"],
264+
});
265+
},
266+
);
267+
268+
it("reports missing required config with actionable keys", async () => {
269+
await expect(
270+
createOpenAICompatibleEmbeddingProvider(
271+
createOptions({ remote: { baseUrl: " " }, model: "text-embedding-bge-m3" }),
272+
),
273+
).rejects.toThrow("embedding.baseUrl");
274+
await expect(
275+
createOpenAICompatibleEmbeddingProvider(
276+
createOptions({ remote: { baseUrl: "http://127.0.0.1:11434/v1" }, model: " " }),
277+
),
278+
).rejects.toThrow("embedding.model");
279+
});
280+
281+
it("keeps remote parser failures behind the provider-specific error prefix", async () => {
282+
const server = await startEmbeddingServer({ respond: () => ({ data: [] }) });
283+
const { provider } = await createOpenAICompatibleEmbeddingProvider(
284+
createOptions({
285+
model: "text-embedding-bge-m3",
286+
remote: { baseUrl: server.baseUrl },
287+
}),
288+
);
289+
290+
await expect(provider.embedQuery("hello")).rejects.toThrow(
291+
"openai-compatible embeddings failed: malformed JSON response",
292+
);
293+
});
294+
});
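
The response-side contract these tests exercise can be sketched as a small parser. This is an assumption-laden illustration, not the code under test: `parseEmbeddings`, `EmbeddingsPayload`, and the detail text after the prefix are hypothetical, and the sketch handles float-array responses only. What it takes from the tests is the `data[].embedding` / `data[].index` shape shared by all six runtime fixtures and the `openai-compatible embeddings failed` error prefix.

```typescript
// Hypothetical sketch: names and error details are illustrative, not the provider's code.
type EmbeddingsPayload = {
  data?: Array<{ embedding?: unknown; index?: unknown } | null>;
};

const ERROR_PREFIX = "openai-compatible embeddings failed";

function parseEmbeddings(payload: unknown, expectedCount: number): number[][] {
  const data = (payload as EmbeddingsPayload | null)?.data;
  // An empty or short `data` array means the server did not embed every input.
  if (!Array.isArray(data) || data.length !== expectedCount) {
    throw new Error(`${ERROR_PREFIX}: expected ${expectedCount} embedding(s)`);
  }

  const vectors: Array<number[] | undefined> = Array.from(
    { length: expectedCount },
    () => undefined,
  );
  data.forEach((item, position) => {
    // Prefer the server-reported index; fall back to array position when it is absent.
    const rawIndex = item?.index;
    const index = typeof rawIndex === "number" ? rawIndex : position;
    const embedding = item?.embedding;
    const isVector = Array.isArray(embedding) && embedding.every((v) => typeof v === "number");
    if (!isVector || !Number.isInteger(index) || index < 0 || index >= expectedCount) {
      throw new Error(`${ERROR_PREFIX}: invalid embedding at position ${position}`);
    }
    vectors[index] = embedding as number[];
  });

  // Every input slot must be filled exactly; duplicates leave a hole and are rejected here.
  return vectors.map((vector, index) => {
    if (!vector) throw new Error(`${ERROR_PREFIX}: missing embedding for input ${index}`);
    return vector;
  });
}
```

Placing vectors by `index` rather than by arrival order keeps batch results aligned with their inputs even if a server returns entries out of order, and routing every failure through one prefix matches what the last test asserts.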
