
Commit 07900ae

Merge pull request #104 from ENTERPILOT/me/gom-47-embedding-endpoints-anthropic-and-openai
feat(gom-47): implement embeddings
2 parents 110d746 + 56de25f commit 07900ae

32 files changed

Lines changed: 1256 additions & 162 deletions

DEVELOPMENT.md

Lines changed: 19 additions & 8 deletions

````diff
@@ -1,5 +1,23 @@
 # Development
 
+## Prerequisites
+
+Install all required development tools in one step:
+
+```bash
+make install-tools
+```
+
+This installs:
+- [golangci-lint v2](https://golangci-lint.run/welcome/install/) — required for `make lint`
+- [pre-commit](https://pre-commit.com/) — required for git hook setup
+
+After installing tools, set up the pre-commit hooks:
+
+```bash
+pre-commit install
+```
+
 ## Testing
 
 ```bash
@@ -10,7 +28,7 @@ make test-all # All tests
 
 ## Linting
 
-Requires [golangci-lint](https://golangci-lint.run/welcome/install/).
+Requires [golangci-lint v2](https://golangci-lint.run/welcome/install/)
 
 ```bash
 make lint # Check code quality
@@ -42,10 +60,3 @@ Override the auto-detection with `LOG_FORMAT`:
 LOG_FORMAT=text make run # force text output
 LOG_FORMAT=json make run # force JSON output
 ```
-
-## Pre-commit
-
-```bash
-pip install pre-commit
-pre-commit install
-```
````

GETTING_STARTED.md

Lines changed: 89 additions & 34 deletions

```diff
@@ -114,11 +114,13 @@ providers:
 
 **Effective resilience per provider:**
 
-| Provider | max_retries | failure_threshold | cb timeout |
-|-----------|-------------|-------------------|------------|
-| openai | 2 (global) | 3 (global) | 15s (global) |
-| anthropic | **5** (override) | 3 (global) | 15s (global) |
-| ollama | 2 (global) | **10** (override) | **5s** (override) |
+
+| Provider  | max_retries      | failure_threshold | cb timeout        |
+| --------- | ---------------- | ----------------- | ----------------- |
+| openai    | 2 (global)       | 3 (global)        | 15s (global)      |
+| anthropic | **5** (override) | 3 (global)        | 15s (global)      |
+| ollama    | 2 (global)       | **10** (override) | **5s** (override) |
+
 
 Only fields that are explicitly listed under a provider's `resilience:` block are overridden. Everything else silently inherits from the global section.
```
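The override semantics in this hunk can be pictured in config form. A minimal sketch with illustrative key names (only the `resilience:` block itself is confirmed by the text; the exact field names in the real YAML may differ):

```yaml
# Hypothetical sketch: global defaults plus per-provider overrides.
resilience:                  # global section
  max_retries: 2
  circuit_breaker:
    failure_threshold: 3
    timeout: 15s

providers:
  anthropic:
    resilience:
      max_retries: 5         # overrides the global 2; thresholds still inherit
  ollama:
    resilience:
      circuit_breaker:
        failure_threshold: 10
        timeout: 5s          # openai lists nothing, so it inherits everything
```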

```diff
@@ -161,33 +163,37 @@ GROQ_API_KEY=gsk_...
 
 All resilience settings can be overridden at runtime via env vars. Env vars always beat both code defaults and YAML values.
 
-| Variable | Type | Default | Description |
-|---|---|---|---|
-| `RETRY_MAX_RETRIES` | int | `3` | Maximum retry attempts per request |
-| `RETRY_INITIAL_BACKOFF` | duration | `1s` | First retry wait (e.g. `500ms`, `2s`) |
-| `RETRY_MAX_BACKOFF` | duration | `30s` | Upper cap on retry wait |
-| `RETRY_BACKOFF_FACTOR` | float | `2.0` | Exponential multiplier between retries |
-| `RETRY_JITTER_FACTOR` | float | `0.1` | Random jitter as a fraction of the backoff |
-| `CIRCUIT_BREAKER_FAILURE_THRESHOLD` | int | `5` | Consecutive failures before opening |
-| `CIRCUIT_BREAKER_SUCCESS_THRESHOLD` | int | `2` | Consecutive successes to close again |
-| `CIRCUIT_BREAKER_TIMEOUT` | duration | `30s` | How long the circuit stays open |
-| `LOG_FORMAT` | string | _(unset)_ | Auto-detects based on environment: colorized text on a TTY, JSON otherwise. Set to `text` to force human-readable output (no colors if not a TTY), or `json` to force structured JSON even on a TTY (recommended for production, CloudWatch, Datadog, GCP). |
+
+| Variable | Type | Default | Description |
+| --- | --- | --- | --- |
+| `RETRY_MAX_RETRIES` | int | `3` | Maximum retry attempts per request |
+| `RETRY_INITIAL_BACKOFF` | duration | `1s` | First retry wait (e.g. `500ms`, `2s`) |
+| `RETRY_MAX_BACKOFF` | duration | `30s` | Upper cap on retry wait |
+| `RETRY_BACKOFF_FACTOR` | float | `2.0` | Exponential multiplier between retries |
+| `RETRY_JITTER_FACTOR` | float | `0.1` | Random jitter as a fraction of the backoff |
+| `CIRCUIT_BREAKER_FAILURE_THRESHOLD` | int | `5` | Consecutive failures before opening |
+| `CIRCUIT_BREAKER_SUCCESS_THRESHOLD` | int | `2` | Consecutive successes to close again |
+| `CIRCUIT_BREAKER_TIMEOUT` | duration | `30s` | How long the circuit stays open |
+| `LOG_FORMAT` | string | *(unset)* | Auto-detects based on environment: colorized text on a TTY, JSON otherwise. Set to `text` to force human-readable output (no colors if not a TTY), or `json` to force structured JSON even on a TTY (recommended for production, CloudWatch, Datadog, GCP). |
+
 
 Provider credentials:
 
-| Variable | Provider |
-|---|---|
-| `OPENAI_API_KEY` | OpenAI |
-| `OPENAI_BASE_URL` | OpenAI (custom endpoint) |
-| `ANTHROPIC_API_KEY` | Anthropic |
-| `ANTHROPIC_BASE_URL` | Anthropic (custom endpoint) |
-| `GEMINI_API_KEY` | Google Gemini |
-| `GEMINI_BASE_URL` | Gemini (custom endpoint) |
-| `XAI_API_KEY` | xAI / Grok |
-| `XAI_BASE_URL` | xAI (custom endpoint) |
-| `GROQ_API_KEY` | Groq |
-| `GROQ_BASE_URL` | Groq (custom endpoint) |
-| `OLLAMA_BASE_URL` | Ollama (default: `http://localhost:11434/v1`) |
+
+| Variable | Provider |
+| --- | --- |
+| `OPENAI_API_KEY` | OpenAI |
+| `OPENAI_BASE_URL` | OpenAI (custom endpoint) |
+| `ANTHROPIC_API_KEY` | Anthropic |
+| `ANTHROPIC_BASE_URL` | Anthropic (custom endpoint) |
+| `GEMINI_API_KEY` | Google Gemini |
+| `GEMINI_BASE_URL` | Gemini (custom endpoint) |
+| `XAI_API_KEY` | xAI / Grok |
+| `XAI_BASE_URL` | xAI (custom endpoint) |
+| `GROQ_API_KEY` | Groq |
+| `GROQ_BASE_URL` | Groq (custom endpoint) |
+| `OLLAMA_BASE_URL` | Ollama (default: `http://localhost:11434/v1`) |
+
 
 See `.env.template` for the full list of all configurable environment variables.
```

````diff
@@ -409,6 +415,44 @@ curl http://localhost:8080/v1/responses \
   }'
 ```
 
+### Embeddings
+
+#### Basic Embedding
+
+```bash
+curl http://localhost:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "text-embedding-3-small",
+    "input": "The quick brown fox jumps over the lazy dog."
+  }'
+```
+
+#### Batch Embedding (multiple inputs)
+
+```bash
+curl http://localhost:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "text-embedding-3-small",
+    "input": ["First sentence", "Second sentence", "Third sentence"]
+  }'
+```
+
+#### With Custom Dimensions
+
+```bash
+curl http://localhost:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "text-embedding-3-large",
+    "input": "Hello world",
+    "dimensions": 512
+  }'
+```
+
+Supported by: OpenAI, Gemini, Groq, xAI, Ollama. Anthropic does not support embeddings natively.
+
 ### List Available Models
 
 ```bash
````
````diff
@@ -480,6 +524,13 @@ stream = client.chat.completions.create(
 for chunk in stream:
     if chunk.choices[0].delta.content:
         print(chunk.choices[0].delta.content, end="")
+
+# Embeddings
+embedding = client.embeddings.create(
+    model="text-embedding-3-small",
+    input="Hello world"
+)
+print(embedding.data[0].embedding[:5])  # first 5 dimensions
 ```
 
 ### Node.js
@@ -510,6 +561,13 @@ for await (const chunk of stream) {
     process.stdout.write(chunk.choices[0].delta.content);
   }
 }
+
+// Embeddings
+const embedding = await client.embeddings.create({
+  model: "text-embedding-3-small",
+  input: "Hello world",
+});
+console.log(embedding.data[0].embedding.slice(0, 5)); // first 5 dimensions
 ```
 
 ---
````
```diff
@@ -554,13 +612,10 @@ for await (const chunk of stream) {
 ## Tips
 
 1. **Model routing**: The gateway automatically routes requests to the correct provider based on the model name — no configuration needed. Just use any model name from the list above.
-
 2. **API compatibility**: The gateway exposes an OpenAI-compatible API. Existing OpenAI client libraries work unchanged for all providers.
-
 3. **Streaming**: All providers support streaming. The gateway normalises provider-specific formats to OpenAI's SSE format.
-
 4. **System messages**: Anthropic's system message format is handled automatically. Gemini uses Google's OpenAI-compatible endpoint, which also handles system messages natively.
-
 5. **Max tokens**: Anthropic requires `max_tokens` to be set. If not provided, the gateway defaults to 4096. OpenAI and Gemini treat it as optional.
-
 6. **Responses API**: The `/v1/responses` endpoint provides a unified interface across all providers. Providers that do not natively support the Responses API convert requests internally.
+7. **Embeddings**: The `/v1/embeddings` endpoint is supported by OpenAI, Gemini, Groq, xAI, and Ollama. Anthropic does not offer embeddings natively.
+
```

Makefile

Lines changed: 8 additions & 1 deletion

```diff
@@ -1,4 +1,6 @@
-.PHONY: build run clean tidy test test-e2e test-integration test-contract test-all lint lint-fix record-api swagger
+.PHONY: all build run clean tidy test test-e2e test-integration test-contract test-all lint lint-fix record-api swagger install-tools
+
+all: build
 
 # Get version info
 VERSION ?= $(shell git describe --tags --always --dirty)
@@ -10,6 +12,11 @@ LDFLAGS := -X "gomodel/internal/version.Version=$(VERSION)" \
 	-X "gomodel/internal/version.Commit=$(COMMIT)" \
 	-X "gomodel/internal/version.Date=$(DATE)"
 
+install-tools:
+	@command -v golangci-lint > /dev/null 2>&1 || (echo "Installing golangci-lint..." && go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.10.1)
+	@command -v pre-commit > /dev/null 2>&1 || (echo "Installing pre-commit..." && pip install pre-commit==4.5.1)
+	@echo "All tools are ready"
+
 build:
 	go build -ldflags '$(LDFLAGS)' -o bin/gomodel ./cmd/gomodel
 # Run the application
```
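The `install-tools` target relies on a common shell idiom: probe for the binary with `command -v` and run the installer only when it is missing, which makes the target safe to re-run. A standalone sketch of the same guard (the `maybe_install` helper is hypothetical and echoes instead of installing):

```shell
# Run the "installer" only when the tool is missing; re-running is a no-op.
maybe_install() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "$1 already installed"
  else
    echo "would install $1"
  fi
}

maybe_install sh                     # present on any POSIX system
maybe_install no-such-tool-123456    # triggers the install branch
```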

README.md

Lines changed: 8 additions & 7 deletions

```diff
@@ -70,41 +70,41 @@ Example model identifiers are illustrative and subject to change; consult provid
 <td>OpenAI</td>
 <td><code>OPENAI_API_KEY</code></td>
 <td><code>gpt&#8209;4o&#8209;mini</code></td>
-<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td>
+<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td></td><td>🚧</td>
 </tr>
 <tr>
 <td>Anthropic</td>
 <td><code>ANTHROPIC_API_KEY</code></td>
 <td><code>claude&#8209;sonnet&#8209;4&#8209;20250514</code></td>
-<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td>
+<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td></td><td>🚧</td>
 </tr>
 <tr>
 <td>Google&nbsp;Gemini</td>
 <td><code>GEMINI_API_KEY</code></td>
 <td><code>gemini&#8209;2.5&#8209;flash</code></td>
-<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td>
+<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td></td><td>🚧</td>
 </tr>
 <tr>
 <td>Groq</td>
 <td><code>GROQ_API_KEY</code></td>
 <td><code>llama&#8209;3.3&#8209;70b&#8209;versatile</code></td>
-<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td>
+<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td></td><td>🚧</td>
 </tr>
 <tr>
 <td>xAI&nbsp;(Grok)</td>
 <td><code>XAI_API_KEY</code></td>
 <td><code>grok&#8209;2</code></td>
-<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td>
+<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td></td><td>🚧</td>
 </tr>
 <tr>
 <td>Ollama</td>
 <td><code>OLLAMA_BASE_URL</code></td>
 <td><code>llama3.2</code></td>
-<td>✅</td><td>🚧</td><td>🚧</td><td></td><td></td><td>🚧</td><td>🚧</td><td>🚧</td>
+<td>✅</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td>🚧</td><td></td><td>🚧</td>
 </tr>
 </table>
 
-✅ Supported 🚧 Coming soon — Not applicable
+✅ Supported 🚧 Coming soon ❌ Unsupported
 
 ---
 
@@ -159,6 +159,7 @@ docker run --rm -p 8080:8080 --env-file .env gomodel
 |----------|--------|-------------|
 | `/v1/chat/completions` | POST | Chat completions (streaming supported) |
 | `/v1/responses` | POST | OpenAI Responses API |
+| `/v1/embeddings` | POST | Text embeddings |
 | `/v1/models` | GET | List available models |
 | `/health` | GET | Health check |
 | `/metrics` | GET | Prometheus metrics (when enabled) |
```

internal/admin/handler_test.go

Lines changed: 4 additions & 0 deletions

```diff
@@ -78,6 +78,10 @@ func (m *handlerMockProvider) StreamResponses(_ context.Context, _ *core.Respons
 	return nil, nil
 }
 
+func (m *handlerMockProvider) Embeddings(_ context.Context, _ *core.EmbeddingRequest) (*core.EmbeddingResponse, error) {
+	return nil, core.NewInvalidRequestError("not supported", nil)
+}
+
 func newHandlerContext(path string) (echo.Context, *httptest.ResponseRecorder) {
 	e := echo.New()
 	req := httptest.NewRequest(http.MethodGet, path, nil)
```

internal/auditlog/stream_wrapper.go

Lines changed: 1 addition & 0 deletions

```diff
@@ -426,6 +426,7 @@ func IsModelInteractionPath(path string) bool {
 	modelPaths := []string{
 		"/v1/chat/completions",
 		"/v1/responses",
+		"/v1/embeddings",
 	}
 	for _, p := range modelPaths {
 		if strings.HasPrefix(path, p) {
```
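The prefix matching in this hunk is easy to verify in isolation. A standalone sketch that copies the same logic locally (same path list, same `strings.HasPrefix` loop):

```go
package main

import (
	"fmt"
	"strings"
)

// isModelInteractionPath mirrors the function above: a request path counts
// as a model interaction when it starts with one of the known API prefixes.
func isModelInteractionPath(path string) bool {
	modelPaths := []string{
		"/v1/chat/completions",
		"/v1/responses",
		"/v1/embeddings",
	}
	for _, p := range modelPaths {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isModelInteractionPath("/v1/embeddings")) // true
	fmt.Println(isModelInteractionPath("/v1/models"))     // false
}
```

Because this is prefix matching, sub-paths such as `/v1/embeddings/anything` also count as model interactions.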

internal/core/interfaces.go

Lines changed: 3 additions & 4 deletions

```diff
@@ -22,6 +22,9 @@ type Provider interface {
 
 	// StreamResponses returns a raw SSE stream for Responses API (caller must close)
 	StreamResponses(ctx context.Context, req *ResponsesRequest) (io.ReadCloser, error)
+
+	// Embeddings sends an embeddings request to the provider
+	Embeddings(ctx context.Context, req *EmbeddingRequest) (*EmbeddingResponse, error)
 }
 
 // RoutableProvider extends Provider with routing capability.
@@ -30,11 +33,7 @@ type Provider interface {
 type RoutableProvider interface {
 	Provider
 
-	// Supports returns true if the provider can handle the given model
 	Supports(model string) bool
-
-	// GetProviderType returns the provider type string for the given model.
-	// Returns empty string if the model is not found.
 	GetProviderType(model string) string
 }
 
```

internal/core/types.go

Lines changed: 33 additions & 0 deletions

```diff
@@ -1,5 +1,7 @@
 package core
 
+import "encoding/json"
+
 // StreamOptions controls streaming behavior options.
 // This is used to request usage data in streaming responses.
 type StreamOptions struct {
@@ -212,3 +214,34 @@ type ModelsResponse struct {
 	Object string  `json:"object"`
 	Data   []Model `json:"data"`
 }
+
+// EmbeddingRequest represents the incoming embeddings request (OpenAI-compatible).
+type EmbeddingRequest struct {
+	Model          string `json:"model"`
+	Input          any    `json:"input"`
+	EncodingFormat string `json:"encoding_format,omitempty"`
+	Dimensions     *int   `json:"dimensions,omitempty"`
+}
+
+// EmbeddingResponse represents the embeddings response (OpenAI-compatible).
+type EmbeddingResponse struct {
+	Object   string          `json:"object"`
+	Data     []EmbeddingData `json:"data"`
+	Model    string          `json:"model"`
+	Provider string          `json:"provider"`
+	Usage    EmbeddingUsage  `json:"usage"`
+}
+
+// EmbeddingData represents a single embedding data point.
+// Embedding is json.RawMessage to support both float arrays and base64-encoded strings.
+type EmbeddingData struct {
+	Object    string          `json:"object"`
+	Embedding json.RawMessage `json:"embedding"`
+	Index     int             `json:"index"`
+}
+
+// EmbeddingUsage represents token usage information for embeddings.
+type EmbeddingUsage struct {
+	PromptTokens int `json:"prompt_tokens"`
+	TotalTokens  int `json:"total_tokens"`
+}
```

internal/guardrails/provider.go

Lines changed: 5 additions & 0 deletions

```diff
@@ -59,6 +59,11 @@ func (g *GuardedProvider) ListModels(ctx context.Context) (*core.ModelsResponse,
 	return g.inner.ListModels(ctx)
 }
 
+// Embeddings delegates directly to the inner provider (no guardrails needed for embeddings).
+func (g *GuardedProvider) Embeddings(ctx context.Context, req *core.EmbeddingRequest) (*core.EmbeddingResponse, error) {
+	return g.inner.Embeddings(ctx, req)
+}
+
 // Responses extracts messages, applies guardrails, then routes the request.
 func (g *GuardedProvider) Responses(ctx context.Context, req *core.ResponsesRequest) (*core.ResponsesResponse, error) {
 	modified, err := g.processResponses(ctx, req)
```
