Allow configurable embedding model name + custom base URL for self-hosted embeddings

## Summary

Honcho's embedding client (`src/embedding_client.py`) hardcodes both the **model name** and, for two of three providers, the **base URL**. That makes it impossible to use a self-hosted embedding endpoint (e.g. Ollama with `bge-m3`, llama.cpp, TEI, Infinity) without forking the repo or deploying a translation proxy.

## Current state

Three providers are supported:

| Provider | Model (hardcoded) | Base URL |
|---|---|---|
| `openai` | `text-embedding-3-small` | `api.openai.com` (no override) |
| `gemini` | `gemini-embedding-001` | Google (no override) |
| `openrouter` | `openai/text-embedding-3-small` | honors `LLM_OPENAI_COMPATIBLE_BASE_URL` |

The `openrouter` path is the closest to self-hosted-friendly because it accepts a custom base URL, but it still sends a fixed model string (`openai/text-embedding-3-small`) that a local embedder typically doesn't recognize.

## Why this matters

1. **Sovereignty / cost** — operators running Honcho alongside their own LLM stack (e.g. Ollama on a Jetson) want to keep embedding workloads local for both cost and data-residency reasons.
2. **Model flexibility** — some domains need specific embedders (multilingual, code, long-context). `text-embedding-3-small` and `gemini-embedding-001` are reasonable defaults but not universally optimal.
3. **Consistency with chat provider config** — Honcho already lets users set `DERIVER_PROVIDER`, `DERIVER_MODEL`, `DIALECTIC_PROVIDER`, `DIALECTIC_MODEL`, `DREAM_PROVIDER`, `DREAM_MODEL`, including a `custom` provider with `LLM_OPENAI_COMPATIBLE_BASE_URL` for chat. Embeddings are the odd one out.

## Proposed

Add two env vars (matching the chat-side naming convention) and one new provider value:

- `LLM_EMBEDDING_MODEL`: free-form model string; defaults to the current hardcoded value for each provider
- `LLM_EMBEDDING_BASE_URL`: custom base URL override; applies regardless of provider when set
- `custom` as a new `EMBEDDING_PROVIDER` value (mirroring the chat-side `custom` provider), which uses `LLM_EMBEDDING_BASE_URL` and `LLM_EMBEDDING_MODEL` and reuses `LLM_OPENAI_COMPATIBLE_API_KEY` for auth

The code change is minimal: in `_EmbeddingClient.__init__`, thread `api_key`, `base_url`, and `model` through the existing provider branches rather than hardcoding them.
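To make the intended precedence concrete, here is a rough sketch of how the settings could be resolved. It is not taken from Honcho's code: `EmbeddingSettings` and `resolve_embedding_settings` are hypothetical names, the fallback to `openai` when `EMBEDDING_PROVIDER` is unset is an assumption, and API key handling is left out.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Per-provider defaults, copied from the "Current state" table.
DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "gemini": "gemini-embedding-001",
    "openrouter": "openai/text-embedding-3-small",
}


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical container for the resolved embedding configuration."""

    provider: str
    model: str
    base_url: Optional[str]  # None means "use the provider SDK's default"


def resolve_embedding_settings(env: Mapping[str, str]) -> EmbeddingSettings:
    # Assumption: fall back to "openai" when no provider is configured.
    provider = env.get("EMBEDDING_PROVIDER", "openai")
    if provider != "custom" and provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown EMBEDDING_PROVIDER: {provider!r}")

    # An explicit model wins; otherwise keep today's hardcoded value.
    model = env.get("LLM_EMBEDDING_MODEL") or DEFAULT_MODELS.get(provider)

    # An explicit embedding base URL wins for every provider.
    base_url = env.get("LLM_EMBEDDING_BASE_URL") or None
    if base_url is None and provider == "openrouter":
        # Preserve the existing openrouter behaviour.
        base_url = env.get("LLM_OPENAI_COMPATIBLE_BASE_URL") or None

    # "custom" has no defaults, so both values must be supplied.
    if provider == "custom" and not (model and base_url):
        raise ValueError(
            "custom provider needs LLM_EMBEDDING_MODEL and LLM_EMBEDDING_BASE_URL"
        )
    return EmbeddingSettings(provider=provider, model=model, base_url=base_url)
```

Calling `resolve_embedding_settings(os.environ)` once in `__init__` would leave existing deployments unchanged, since every unset variable falls back to the current value.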

## Workaround today

Operators are left writing a proxy service that accepts the OpenAI embeddings shape, rewrites the hardcoded model name, and forwards to their embedder. Happy to contribute a PR if the design above is acceptable.
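For reference, the proxy workaround looks roughly like the sketch below, using only the Python standard library. The upstream URL, port, and model name are placeholder assumptions for a local OpenAI-compatible embedder, and error handling for upstream failures is omitted.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumptions: a local OpenAI-compatible embeddings endpoint and the model it serves.
UPSTREAM_URL = "http://localhost:11434/v1/embeddings"
UPSTREAM_MODEL = "bge-m3"


def rewrite_embedding_request(body: bytes, model: str) -> bytes:
    """Return the JSON request body with its "model" field replaced."""
    payload = json.loads(body)
    payload["model"] = model
    return json.dumps(payload).encode("utf-8")


class EmbeddingProxy(BaseHTTPRequestHandler):
    """Accepts OpenAI-shaped embedding requests and forwards them upstream."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = rewrite_embedding_request(self.rfile.read(length), UPSTREAM_MODEL)
        request = urllib.request.Request(
            UPSTREAM_URL,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        # Forward the rewritten request and relay the upstream response as-is.
        with urllib.request.urlopen(request) as upstream:
            data = upstream.read()
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Run the proxy on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), EmbeddingProxy).serve_forever()
```

Pointing `LLM_OPENAI_COMPATIBLE_BASE_URL` at this proxy also redirects chat traffic that shares the variable, which is part of why a dedicated embedding setting would be cleaner.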

## Environment

Honcho **3.0.6** (image `ghcr.io/plastic-labs/honcho:latest`), self-hosted in K3s, with `LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.x.ai/v1` already set for the chat providers.

Allow configurable embedding model name + custom base URL for self-hosted embeddings #578

Summary

Current state

Why this matters

Proposed

Workaround today

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Allow configurable embedding model name + custom base URL for self-hosted embeddings #578

Description

Summary

Current state

Why this matters

Proposed

Workaround today

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions